Python pandas: flag duplicate rows
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Hypnotic Puzzle2
--
Chapters
00:00 Python Pandas: Flag Duplicate Rows
00:25 Accepted Answer Score 13
00:41 Answer 2 Score 5
00:57 Thank you
--
Full question
https://stackoverflow.com/questions/4455...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #duplicates
#avk47
ACCEPTED ANSWER
Score 13
As per the docs, use the keep argument of DataFrame.duplicated and set it to False. It defaults to 'first', which leaves the first occurrence of each duplicated row unflagged.
import pandas as pd
df = pd.DataFrame({'Column_A': ['AAA', 'AAB', 'AAB', 'AAC']})
df['duplicate'] = df.duplicated(keep=False)
print(df)
  Column_A  duplicate
0      AAA      False
1      AAB       True
2      AAB       True
3      AAC      False
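To make the effect of keep concrete, here is a small sketch that compares its three possible values on the same frame as above. The variable names first, last and every are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAB', 'AAB', 'AAC']})

# keep='first' (the default): the first occurrence in each group is not flagged
first = df.duplicated(keep='first').tolist()
# keep='last': the last occurrence in each group is not flagged
last = df.duplicated(keep='last').tolist()
# keep=False: every row that has a duplicate is flagged
every = df.duplicated(keep=False).tolist()

print(first)  # [False, False, True, False]
print(last)   # [False, True, False, False]
print(every)  # [False, True, True, False]
```

Only keep=False marks both 'AAB' rows, which is what flagging all duplicate rows requires.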
ANSWER 2
Score 5
I imagine myself lost in the wilderness, and all I have to survive is pd.factorize and np.bincount.
Please, don't accept this answer.
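import numpy as np
import pandas as pd
f, u = pd.factorize(df.Column_A.values)  # f: integer code per row, u: the unique values
df.assign(duplicate=np.bincount(f)[f] > 1)  # count each code, flag rows whose code occurs more than once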
  Column_A  duplicate
0      AAA      False
1      ABC       True
2      ABC       True
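For anyone curious why the factorize/bincount trick works, the sketch below breaks it into steps. It assumes the four-row sample frame from the accepted answer rather than the data behind the output above, and the names codes, uniques, counts and flags are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed sample data, borrowed from the accepted answer
df = pd.DataFrame({'Column_A': ['AAA', 'AAB', 'AAB', 'AAC']})

# factorize maps each distinct value to an integer code, in order of first appearance
codes, uniques = pd.factorize(df['Column_A'].values)
# bincount counts how many times each code occurs
counts = np.bincount(codes)
# indexing counts by codes gives every row the size of its own group
flags = counts[codes] > 1

print(codes.tolist())   # [0, 1, 1, 2]
print(counts.tolist())  # [1, 2, 1]
print(flags.tolist())   # [False, True, True, False]
```

One caveat: by default pd.factorize gives missing values the code -1, and np.bincount rejects negative input, so this approach probably needs extra handling when the column contains NaN.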