python: unicode problem
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Dream Voyager Looping
--
Chapters
00:00 Question
01:22 Accepted answer (Score 19)
01:57 Answer 2 (Score 10)
02:14 Answer 3 (Score 3)
02:38 Thank you
--
Full question
https://stackoverflow.com/questions/4735...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #unicode
#avk47
ACCEPTED ANSWER
Score 20
This looks like UTF-16 data. So try
data[0].rstrip("\n").decode("utf-16")
Edit (for your update): Try to decode the whole file at once, that is
data = open(...).read()
data.decode("utf-16")
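The problem is that a line break in UTF-16-LE is the two-byte sequence "\n\x00", but readlines() on a file opened in byte mode splits right after the "\n" byte. The "\x00" byte is left at the start of the next line, so every line after the first is misaligned by one byte.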
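The split can be reproduced without the original file. The following is a minimal sketch in Python 3 (the answers above use Python 2 syntax); the sample text and the variable names are made up for illustration and are not taken from the question.

```python
import codecs
import io

# Hypothetical sample data: two short lines encoded as UTF-16-LE with a BOM.
text = "first\nsecond\n"
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Reading lines in byte mode splits after every b"\n" byte,
# which is only the first half of the two-byte line break.
byte_lines = io.BytesIO(raw).readlines()
print(byte_lines[0])  # ends right after b"\n", so its length is odd
print(byte_lines[1])  # begins with the stray b"\x00" byte

# Decoding the whole buffer first keeps every two-byte unit intact;
# splitting into lines is then safe.
decoded_lines = raw.decode("utf-16").splitlines()
print(decoded_lines)
```

Decoding first and splitting second is the same order of operations the edit above recommends.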
ANSWER 2
Score 11
This file is a UTF-16-LE encoded file, with an initial BOM.
import codecs
fp = codecs.open("a", "r", "utf-16")
lines = fp.readlines()
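For readers on Python 3, the built-in open() accepts an encoding argument and can be used in the same way. This is a sketch under that assumption; the temporary file and its contents are hypothetical stand-ins for the file in the question.

```python
import codecs
import os
import tempfile

# Hypothetical sample file: UTF-16-LE with a BOM, written as raw bytes.
text = "first\nsecond\n"
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as out:
    out.write(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))

# Text mode with an explicit encoding decodes the bytes before
# splitting them into lines; the "utf-16" codec consumes the BOM.
with open(path, "r", encoding="utf-16") as fp:
    lines = fp.readlines()

os.remove(path)
print(lines)
```

Because decoding happens inside the file object, the line breaks are recognised as whole characters rather than as single bytes.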
ANSWER 3
Score 3
EDIT
Since you posted that you are using Python 2.7, this is the 2.7 solution, replacing undecodable characters:
file = open("./Downloads/lamp-post.csv", "r")
data = [line.decode("utf-16", "replace") for line in file]
Ignoring undecodable characters:
file = open("./Downloads/lamp-post.csv", "r")
data = [line.decode("utf-16", "ignore") for line in file]
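To see what the two error handlers do with bytes that cannot be decoded, here is a small Python 3 sketch. The input is a hypothetical three-byte string: the letter "A" in UTF-16-LE followed by one stray byte, similar to what is left over when a line is cut in the middle of a two-byte unit.

```python
# Hypothetical input: "A" in UTF-16-LE followed by one stray byte.
broken = b"A\x00Z"

# "replace" substitutes U+FFFD for the undecodable tail.
replaced = broken.decode("utf-16-le", "replace")

# "ignore" silently drops the undecodable tail.
ignored = broken.decode("utf-16-le", "ignore")

print(repr(replaced))
print(repr(ignored))
```

In both cases the stray byte does not survive, so these handlers hide damaged input rather than repair it.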