Inconsistency in saving and loading pandas dataframe with lists as values

Full question
https://stackoverflow.com/questions/1898...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas

ACCEPTED ANSWER

Score 7

No, it is not a bug. CSV files do not have datatype information. When you load the file, all read_csv has to go on is the text. When it sees [1, 2] in the file, it does not assume that it should process the contents as a list. (This is proper; a CSV file might contain text in that format that should not be a list.)
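
To make the point concrete, here is a minimal sketch of the round trip. The column name c follows the answer; the sample values and the in-memory buffer (used instead of a file on disk) are assumptions for illustration only.

```python
import io

import pandas as pd

# A small frame whose single column holds Python lists (sample data, assumed).
df = pd.DataFrame({"c": [[1, 2], [3, 4]]})

# Write to an in-memory CSV and read it straight back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# Before saving the cell is a list; after loading it is only text.
original_type = type(df["c"].iloc[0])
loaded_type = type(loaded["c"].iloc[0])
loaded_value = loaded["c"].iloc[0]
print(original_type, loaded_type, repr(loaded_value))
```

The loaded cell is the string form of the list, which is all the CSV file ever contained.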

Direct Answer: If you want to turn the column back into a list, do df['c'] = df['c'].map(ast.literal_eval). (You must first import ast, of course.) You can also do the conversion at load time by passing the function through the converters argument of read_csv; see the read_csv documentation.
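
Both variants might look roughly like this. It is a sketch, assuming a column named c and a small inline CSV string standing in for the saved file.

```python
import ast
import io

import pandas as pd

# Inline CSV text standing in for the saved file (assumed sample data).
csv_text = 'c\n"[1, 2]"\n"[3, 4]"\n'

# Option 1: load as text, then parse each cell back into a list.
df = pd.read_csv(io.StringIO(csv_text))
df["c"] = df["c"].map(ast.literal_eval)

# Option 2: parse while loading, using the converters argument of read_csv.
df2 = pd.read_csv(io.StringIO(csv_text), converters={"c": ast.literal_eval})

print(df["c"].tolist())
print(df2["c"].tolist())
```

ast.literal_eval only evaluates Python literals (lists, numbers, strings and so on), so it is a safer choice than eval for text read from a file.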

Better Approach: Save your data as something other than a CSV so that the datatypes can be saved and recovered on loading. The simplest way to do this is to save as a binary file with df.to_pickle('test.df') and read it back with pd.read_pickle('test.df').
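
A minimal sketch of the pickle round trip follows. The file name test.df comes from the answer; the temporary directory and the sample data are assumptions so that the example cleans up after itself.

```python
import os
import tempfile

import pandas as pd

# Sample frame with list-valued cells (assumed data).
df = pd.DataFrame({"c": [[1, 2], [3, 4]]})

# Save and reload inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.df")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# The cells come back as real lists, with no parsing step needed.
restored_type = type(restored["c"].iloc[0])
print(restored_type, restored["c"].tolist())
```

Pickle files can execute code when loaded, so this approach fits best when the files come from a source you trust.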

Big Picture: DataFrames or Series containing lists are unidiomatic: they aren't very convenient to deal with, and they don't make available most of pandas's nice tools for handling data. Think again about whether you really need your data as lists. (Maybe you do, but it should be a last resort.)