How to convert a string to utf-8 in Python
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Breezy Bay
--
Chapters
00:00 How To Convert A String To Utf-8 In Python
00:28 Accepted Answer Score 315
01:03 Answer 2 Score 84
01:15 Answer 3 Score 25
01:28 Answer 4 Score 16
01:45 Answer 5 Score 15
01:52 Thank you
--
Full question
https://stackoverflow.com/questions/4182...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #python27 #unicode #utf8
#avk47
ACCEPTED ANSWER
Score 315
In Python 2
>>> plain_string = "Hi!"
>>> unicode_string = u"Hi!"
>>> type(plain_string), type(unicode_string)
(<type 'str'>, <type 'unicode'>)
^ This is the difference between a byte string (plain_string) and a unicode string.
>>> s = "Hello!"
>>> u = unicode(s, "utf-8")
^ Converting to unicode and specifying the encoding.
In Python 3
All strings are Unicode, and the unicode function no longer exists. To get UTF-8 bytes from a string, call its encode('utf-8') method. See the answer from @Noumenon.
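As a minimal Python 3 sketch of that point (the variable names are illustrative and not taken from the answer), str.encode produces UTF-8 bytes and bytes.decode turns them back into text:

```python
# Python 3: str holds Unicode text, bytes holds encoded data.
text = "Hi! \u00e9"                # the escape stands for e with an acute accent
data = text.encode("utf-8")        # str -> bytes
restored = data.decode("utf-8")    # bytes -> str

print(type(text).__name__, type(data).__name__)
print(data)
print(restored == text)
```

The non-ASCII character takes two bytes once encoded, which is why the bytes object is longer than the string.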
ANSWER 2
Score 84
If the methods above don't work, you can also tell Python to skip any bytes in the string that it can't decode as UTF-8:
stringnamehere.decode('utf-8', 'ignore')
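For comparison, here is a small Python 3 sketch of the error handlers that decode accepts. The sample bytes are made up for illustration; 'ignore' silently drops undecodable bytes, so 'replace' may be preferable when you want to see where data was lost.

```python
# \xff can never appear in valid UTF-8, so strict decoding fails on it.
raw = b"caf\xc3\xa9 \xff ok"

strict_error = None
try:
    raw.decode("utf-8")                     # default handler is 'strict'
except UnicodeDecodeError as exc:
    strict_error = exc

ignored = raw.decode("utf-8", "ignore")     # drops the bad byte
replaced = raw.decode("utf-8", "replace")   # inserts U+FFFD in its place

print(repr(ignored))
print(repr(replaced))
```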
ANSWER 3
Score 25
This might be overkill, but when I work with ASCII and Unicode in the same files, calling decode repeatedly can be a pain. This is what I use:
def make_unicode(inp):
    # Python 2: decode byte strings, pass unicode objects through unchanged.
    if not isinstance(inp, unicode):
        inp = inp.decode('utf-8')
    return inp
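The helper above relies on the Python 2 unicode type. A rough Python 3 counterpart might look like the following sketch; the name to_text and its encoding parameter are hypothetical, not part of the original answer.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes-like."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return value  # already text, pass through unchanged


print(to_text(b"abc"))
print(to_text("abc"))
```

Calling it on a value that is already a str is a no-op, so it is safe to apply at every boundary where the input type is uncertain.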
ANSWER 4
Score 16
Add the following line to the top of your .py file:
# -*- coding: utf-8 -*-
It tells Python that the source file is UTF-8 encoded, so you can write non-ASCII string literals directly in your script, like this:
utfstr = "ボールト"
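In Python 3, source files are read as UTF-8 by default, so the declaration is optional there. The sketch below uses a sample character chosen purely for illustration and shows that a literal typed directly equals the same character written as an escape, and that it becomes three bytes in UTF-8:

```python
# -*- coding: utf-8 -*-
direct = "ル"            # typed directly; relies on the file being saved as UTF-8
escaped = "\u30eb"       # the same character written as an ASCII-only escape

encoded = direct.encode("utf-8")

print(direct == escaped)
print(len(direct), len(encoded))
```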