The Python Oracle

Python pandas: flag duplicate rows

Full question
https://stackoverflow.com/questions/4455...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #duplicates



ACCEPTED ANSWER

Score 13


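As per the docs, pass keep=False to DataFrame.duplicated so that every occurrence of a duplicated row is flagged. The default is keep='first', which leaves the first occurrence unflagged and marks only the later ones.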

import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAB', 'AAB', 'AAC']})
df['duplicate'] = df.duplicated(keep=False)

print(df)

  Column_A  duplicate
0      AAA      False
1      AAB       True
2      AAB       True
3      AAC      False
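
To make the effect of the keep argument concrete, here is a minimal sketch comparing its three documented values on the same frame. The variable names (first, last, every) and the extra Column_B are illustrative assumptions and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAB', 'AAB', 'AAC']})

# keep='first' (the default): the first occurrence is left unflagged
first = df.duplicated(keep='first').tolist()
# keep='last': the last occurrence is left unflagged
last = df.duplicated(keep='last').tolist()
# keep=False: every row that has a duplicate is flagged
every = df.duplicated(keep=False).tolist()

print(first)  # [False, False, True, False]
print(last)   # [False, True, False, False]
print(every)  # [False, True, True, False]

# Hypothetical second column: whole rows are now unique, so restrict the
# comparison to Column_A with the subset argument.
df2 = df.assign(Column_B=[1, 2, 3, 4])
whole_row = df2.duplicated(keep=False).tolist()
by_column = df2.duplicated(subset=['Column_A'], keep=False).tolist()

print(whole_row)  # [False, False, False, False]
print(by_column)  # [False, True, True, False]
```

If the frame has several columns and only some of them define what counts as a duplicate, subset is probably the argument to reach for.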



ANSWER 2

Score 5


I imagine myself lost in the wilderness, and all I have to survive is pd.factorize and np.bincount.
Please don't accept this answer.

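import numpy as np
import pandas as pd
f, u = pd.factorize(df.Column_A.values)  # f: integer code per row, u: the unique values
df.assign(duplicate=np.bincount(f)[f] > 1)  # count each code, then map the counts back to rows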

  Column_A  duplicate
0      AAA      False
1      ABC       True
2      ABC       True
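
For anyone who wants to see why the factorize/bincount trick works, here is a self-contained sketch that spells out the intermediate values. The three-row frame is an assumption reconstructed from the output shown above, and the names codes, uniques, counts and per_row are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed input, inferred from the printed output in the answer
df = pd.DataFrame({'Column_A': ['AAA', 'ABC', 'ABC']})

# factorize gives each distinct value an integer code, in order of first appearance
codes, uniques = pd.factorize(df['Column_A'])
print(codes.tolist())    # [0, 1, 1]
print(list(uniques))     # ['AAA', 'ABC']

# bincount counts how many times each code occurs
counts = np.bincount(codes)
print(counts.tolist())   # [1, 2]

# indexing the counts by the codes maps each count back onto its row
per_row = counts[codes]
print(per_row.tolist())  # [1, 2, 2]

# a row is a duplicate when its value occurs more than once
result = df.assign(duplicate=per_row > 1)
print(result)

# the flags agree with the built-in approach from the accepted answer
builtin = df.duplicated(keep=False).tolist()
print(result['duplicate'].tolist() == builtin)  # True
```

One caveat with this approach: by default pd.factorize encodes missing values as -1, and np.bincount raises an error on negative input, so a column containing NaN would need handling first. df.duplicated(keep=False) has no such restriction.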