python: unicode problem
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: A Thousand Exotic Places Looping v001
--
Chapters
00:00 Python: Unicode Problem
01:14 Answer 1 Score 3
01:31 Accepted Answer Score 20
01:59 Answer 3 Score 11
02:10 Thank you
--
Full question
https://stackoverflow.com/questions/4735...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #unicode
#avk47
    Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: A Thousand Exotic Places Looping v001
--
Chapters
00:00 Python: Unicode Problem
01:14 Answer 1 Score 3
01:31 Accepted Answer Score 20
01:59 Answer 3 Score 11
02:10 Thank you
--
Full question
https://stackoverflow.com/questions/4735...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #unicode
#avk47
ACCEPTED ANSWER
Score 20
This looks like UTF-16 data. So try
data[0].rstrip("\n").decode("utf-16")
Edit (for your update): Try to decode the whole file at once, that is
data = open(...).read()
data.decode("utf-16")
The problem is that the line breaks in UTF-16 are "\n\x00", but using readlines() will split at the "\n", leaving the "\x00" character for the next line.
ANSWER 2
Score 11
This file is a UTF-16-LE encoded file, with an initial BOM.
import codecs
fp= codecs.open("a", "r", "utf-16")
lines= fp.readlines()
ANSWER 3
Score 3
EDIT
Since you posted 2.7 this is the 2.7 solution:
file = open("./Downloads/lamp-post.csv", "r")
data = [line.decode("utf-16", "replace") for line in file]
Ignoring undecodeable characters:
file = open("./Downloads/lamp-post.csv", "r")
data = [line.decode("utf-16", "ignore") for line in file]