The Python Oracle

Inconsistency in saving and loading pandas dataframe with lists as values

--

Chapters
00:00 Question
01:00 Accepted answer (Score 6)
02:21 Thank you

--

Full question
https://stackoverflow.com/questions/1898...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas


ACCEPTED ANSWER

Score 7


No, it is not a bug. CSV files do not have datatype information. When you load the file, all read_csv has to go on is the text. When it sees [1, 2] in the file, it does not assume that it should process the contents as a list. (This is proper; a CSV file might contain text in that format that should not be a list.)
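To see the effect described above, here is a minimal sketch of a CSV round trip. The column name 'c' comes from the answer; the column 'a', the sample values, and the use of an in-memory buffer instead of a file on disk are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data: column 'c' holds Python lists.
df = pd.DataFrame({"a": [1, 2], "c": [[1, 2], [3, 4]]})

# Write to an in-memory CSV buffer and read it straight back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# Before saving, each cell is a list; after loading, it is plain text.
original_type = type(df["c"][0])
loaded_type = type(loaded["c"][0])
print(original_type, loaded_type)
```

The loaded cells look like lists when printed, but they are strings, so indexing or iterating over them works on characters rather than elements.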

Direct Answer: To turn the column back into lists, run df['c'] = df['c'].map(ast.literal_eval) after importing ast. You could also pass that function to read_csv as a "converter" so the conversion happens on loading -- see the converters argument in the read_csv documentation.
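A short sketch of both options, assuming a CSV whose column 'c' contains list-like text. The sample CSV text and the variable names are made up for illustration; ast.literal_eval and the converters argument of read_csv are the real pieces.

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content, as it would look after saving lists with to_csv.
csv_text = 'a,c\n1,"[1, 2]"\n2,"[3, 4]"\n'

# Option 1: load as text, then parse each cell back into a list.
df = pd.read_csv(io.StringIO(csv_text))
df["c"] = df["c"].map(ast.literal_eval)

# Option 2: parse while loading, via a per-column converter.
df_conv = pd.read_csv(
    io.StringIO(csv_text),
    converters={"c": ast.literal_eval},
)

print(df["c"].tolist())
print(df_conv["c"].tolist())
```

ast.literal_eval only evaluates Python literals (lists, numbers, strings, and so on), which makes it a safer choice than eval for text read from a file. It raises an error on cells that are not valid literals, so a column with missing or malformed values would need extra handling.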

Better Approach: Save your data as something other than a CSV so that the datatypes can be saved and recovered on loading. The simplest way to do this is to save as a binary file: df.to_pickle('test.df').
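A minimal sketch of the pickle round trip, assuming the same hypothetical sample data as above. The file name test.df is from the answer; writing it into a temporary directory is an assumption so the example cleans up after itself.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: column 'c' holds Python lists.
df = pd.DataFrame({"a": [1, 2], "c": [[1, 2], [3, 4]]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.df")
    df.to_pickle(path)                # binary format, keeps Python objects
    restored = pd.read_pickle(path)   # lists come back as lists

restored_type = type(restored["c"][0])
print(restored_type)
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.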

Big Picture: DataFrames or Series containing lists are unidiomatic: they aren't very convenient to deal with, and they don't make available most of pandas's nice tools for handling data. Think again about whether you really need your data as lists. (Maybe you do, but it should be a last resort.)