The Python Oracle

Truly deep copying Pandas DataFrames

Full question
https://stackoverflow.com/questions/6631...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe #deepcopy



ACCEPTED ANSWER

Score 2


One way is to convert df_in to a Python dictionary, which copy.deepcopy copies recursively, and then build a new DataFrame from the result:

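import copy
import pandas as pd

def pop(df_in):
    # Deep-copy the plain dict so the sets inside are copied too.
    df = pd.DataFrame(copy.deepcopy(df_in.to_dict()))
    print(df['sets'].apply(lambda x: {x.pop()}))

for _ in range(3):  # df is the DataFrame from the question
    pop(df)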

Output:

0    {1}
Name: sets, dtype: object
0    {1}
Name: sets, dtype: object
0    {1}
Name: sets, dtype: object
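
The repeated {1} above suggests the original sets are never modified. A minimal sketch to check that directly, assuming a one-row frame like the one in the question (the names df_copy, original_len and copy_len are illustrative, not from the answer):

```python
import copy
import pandas as pd

# Assumed example frame: one object column holding a mutable set.
df = pd.DataFrame({"sets": [{1, 2}]})

# Round-trip through a plain dict so copy.deepcopy reaches the sets.
df_copy = pd.DataFrame(copy.deepcopy(df.to_dict()))

# Mutate the set held by the copy only.
df_copy["sets"].iloc[0].pop()

original_len = len(df["sets"].iloc[0])
copy_len = len(df_copy["sets"].iloc[0])
print(original_len, copy_len)
```

If the copy is truly independent, the original set keeps both elements while the copied one loses one.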



ANSWER 2

Score 1


The problem is that the objects stored in your DataFrame are sets, which are mutable. The documentation explicitly calls out this behavior with a warning (emphasis my own):

When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object.

As always with references to mutable objects, a change made through one reference is visible through every other. We can see for ourselves that, despite the deep copy, the objects have the same ID:

import pandas as pd
df = pd.DataFrame({'sets': [{1,2}]}, index=[0])
df1 = df.copy(deep=True)

id(df['sets'].iloc[0])
#4592957024

id(df1['sets'].iloc[0])
#4592957024
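
Since copy(deep=True) shares the set objects, one possible workaround is to deep-copy the elements of each object column after copying the frame. This is only a sketch: deep_copy_objects is a hypothetical helper, not a pandas function, and it assumes unique column names.

```python
import copy
import pandas as pd

def deep_copy_objects(df):
    """Hypothetical helper: copy df, then deep-copy every element
    of each object-dtype column so mutable values are not shared."""
    out = df.copy(deep=True)
    object_cols = [c for c in out.columns if out[c].dtype == object]
    for col in object_cols:
        out[col] = out[col].map(copy.deepcopy)
    return out

# Assumed example frame, similar to the one above plus a numeric column.
df = pd.DataFrame({"sets": [{1, 2}], "n": [1]}, index=[0])
df2 = deep_copy_objects(df)

# Mutating the copy's set should leave the original untouched.
df2["sets"].iloc[0].pop()
print(len(df["sets"].iloc[0]), len(df2["sets"].iloc[0]))
```

This avoids the round trip through to_dict, at the cost of an element-by-element copy of every object column.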