The Python Oracle

How to convert a string to utf-8 in Python

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Hypnotic Orient Looping

--

Chapters
00:00 Question
00:33 Accepted answer (Score 307)
01:16 Answer 2 (Score 83)
01:34 Answer 3 (Score 24)
01:56 Answer 4 (Score 16)
02:16 Thank you

--

Full question
https://stackoverflow.com/questions/4182...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #python27 #unicode #utf8

#avk47



ACCEPTED ANSWER

Score 315


In Python 2

>>> plain_string = "Hi!"
>>> unicode_string = u"Hi!"
>>> type(plain_string), type(unicode_string)
(<type 'str'>, <type 'unicode'>)

^ This is the difference between a byte string (plain_string) and a unicode string.

>>> s = "Hello!"
>>> u = unicode(s, "utf-8")

^ Converting to unicode and specifying the encoding.

In Python 3

All strings are unicode. The unicode function does not exist anymore. See answer from @Noumenon




ANSWER 2

Score 84


If the methods above don't work, you can also tell Python to ignore portions of a string that it can't convert to utf-8:

stringnamehere.decode('utf-8', 'ignore')



ANSWER 3

Score 25


Might be a bit overkill, but when I work with ascii and unicode in same files, repeating decode can be a pain, this is what I use:

def make_unicode(inp):
    if type(inp) != unicode:
        inp =  inp.decode('utf-8')
    return inp



ANSWER 4

Score 16


Adding the following line to the top of your .py file:

# -*- coding: utf-8 -*-

allows you to encode strings directly in your script, like this:

utfstr = "γƒœγƒΌγƒ«γƒˆ"