Python pandas: flag duplicate rows
--
Music by Eric Matyas
https://www.soundimage.org
Track title: The Builders
--
Chapters
00:00 Question
00:34 Accepted answer (Score 13)
00:55 Answer 2 (Score 5)
01:19 Thank you
--
Full question
https://stackoverflow.com/questions/4455...
Accepted answer links:
[docs]: https://pandas.pydata.org/pandas-docs/st...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #duplicates
#avk47
ACCEPTED ANSWER
Score 13
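As per the docs, pass the keep argument to duplicated and set it to False, so that every occurrence of a repeated row is flagged. The default is keep='first', which leaves the first occurrence unflagged and marks only the later ones.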
import pandas as pd
df = pd.DataFrame({'Column_A': ['AAA', 'AAB', 'AAB', 'AAC']})
df['duplicate'] = df.duplicated(keep=False)
print(df)
  Column_A  duplicate
0      AAA      False
1      AAB       True
2      AAB       True
3      AAC      False
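To make the difference between the keep settings concrete, here is a small sketch that runs all three documented values on the same frame. The column names keep_first, keep_last and keep_false are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAB', 'AAB', 'AAC']})

# Compare the three documented values of `keep` side by side.
# The result column names are illustrative, not part of pandas.
comparison = pd.DataFrame({
    'Column_A': df['Column_A'],
    'keep_first': df.duplicated(keep='first'),  # default: first occurrence is not flagged
    'keep_last': df.duplicated(keep='last'),    # last occurrence is not flagged
    'keep_false': df.duplicated(keep=False),    # every occurrence is flagged
})
print(comparison)
```

If the frame has more columns and only some of them should decide what counts as a duplicate, duplicated also accepts a subset argument, for example df.duplicated(subset=['Column_A'], keep=False).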
ANSWER 2
Score 5
I imagine myself lost in the wilderness, and all I have to survive is pd.factorize and np.bincount.
Please don't accept this answer.
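import numpy as np
# f is an integer code per row; np.bincount(f)[f] is how often each row's value occurs
f, u = pd.factorize(df.Column_A.values)
df.assign(duplicate=np.bincount(f)[f] > 1)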
Column_A duplicate
0 AAA False
1 ABC True
2 ABC True
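The one-liner above is easier to follow when split into steps. The sketch below does that on the sample frame from the accepted answer (an assumption, since this answer does not show its input) and uses illustrative variable names. It also assumes the column has no missing values.

```python
import numpy as np
import pandas as pd

# Sample data borrowed from the accepted answer (assumed input).
df = pd.DataFrame({'Column_A': ['AAA', 'AAB', 'AAB', 'AAC']})

# Step 1: map each distinct value to an integer code, in order of first appearance.
codes, uniques = pd.factorize(df['Column_A'].values)

# Step 2: count how many times each code occurs.
counts = np.bincount(codes)

# Step 3: index the counts by the codes to get one count per row.
per_row = counts[codes]

# Step 4: a row is a duplicate when its value occurs more than once.
result = df.assign(duplicate=per_row > 1)
print(result)
```

On this data the flags match df.duplicated(keep=False). With missing values the approach needs care: pd.factorize assigns NaN the code -1 by default, and np.bincount rejects negative input.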