The Python Oracle

Python pandas: flag duplicate rows

Full question
https://stackoverflow.com/questions/4455...

Accepted answer links:
[docs]: https://pandas.pydata.org/pandas-docs/st...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #duplicates



ACCEPTED ANSWER

Score 13


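As per the docs, pass keep=False to duplicated so that every occurrence of a duplicated row is flagged True. The argument defaults to 'first', which leaves the first occurrence of each duplicate unflagged.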

import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAB', 'AAB', 'AAC']})
df['duplicate'] = df.duplicated(keep=False)

print(df)

  Column_A  duplicate
0      AAA      False
1      AAB       True
2      AAB       True
3      AAC      False
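
To see what the keep argument changes, here is a small sketch that runs duplicated with each of its three documented values on the same frame. The name flags and its column labels are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAB', 'AAB', 'AAC']})

# Compare the three documented values of keep side by side.
flags = pd.DataFrame({
    'keep_first': df.duplicated(keep='first'),  # first occurrence stays False
    'keep_last': df.duplicated(keep='last'),    # last occurrence stays False
    'keep_False': df.duplicated(keep=False),    # every occurrence is True
})
print(flags)
```

Only keep=False flags both 'AAB' rows, which is what the question asks for.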



ANSWER 2

Score 5


I imagine myself lost in the wilderness, and all I have to survive is pd.factorize and np.bincount.
Please don't accept this answer.

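import numpy as np
import pandas as pd
f, u = pd.factorize(df.Column_A.values)  # f: integer code per row; u: unique values (unused)
df.assign(duplicate=np.bincount(f)[f] > 1)  # count each code, then look up each row's count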

  Column_A  duplicate
0      AAA      False
1      ABC       True
2      ABC       True
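
The one-liner above packs three steps together, so here is a sketch that spells them out. The frame is an assumption reconstructed from the printed output, and the names codes, uniques, counts and per_row are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the output shown above.
df = pd.DataFrame({'Column_A': ['AAA', 'ABC', 'ABC']})

# factorize maps each distinct value to an integer code,
# numbered in order of first appearance.
codes, uniques = pd.factorize(df['Column_A'].values)  # codes: [0, 1, 1]

# bincount counts how many times each code occurs.
counts = np.bincount(codes)                           # counts: [1, 2]

# Indexing counts by codes gives every row the count of its own value.
per_row = counts[codes]                               # per_row: [1, 2, 2]

result = df.assign(duplicate=per_row > 1)
print(result)
```

Two caveats, as far as I can tell: this looks only at Column_A, whereas duplicated compares whole rows by default, and factorize assigns missing values the code -1 by default, which np.bincount rejects, so a column containing NaN would need handling first.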