How do I get a list of all the duplicate items using pandas in python?
--------------------------------------------------
Hire the world's top talent on demand or become one of them at Toptal: https://topt.al/25cXVn
and get a $2,000 discount on your first invoice
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Ancient Construction
--
Chapters
00:00 How Do I Get A List Of All The Duplicate Items Using Pandas In Python?
01:33 Accepted Answer Score 303
02:27 Answer 2 Score 286
02:48 Answer 3 Score 270
03:17 Answer 4 Score 19
03:23 Thank you
--
Full question
https://stackoverflow.com/questions/1465...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #duplicates
#avk47
ACCEPTED ANSWER
Score 303
Method #1: print all rows where the ID is one of the IDs in duplicated:
>>> import pandas as pd
>>> df = pd.read_csv("dup.csv")
>>> ids = df["ID"]
>>> df[ids.isin(ids[ids.duplicated()])].sort_values("ID")
       ID ENROLLMENT_DATE        TRAINER_MANAGING        TRAINER_OPERATOR FIRST_VISIT_DATE
24  11795       27-Feb-12      0643D38-Hanover NH      0643D38-Hanover NH        19-Jun-12
6   11795        3-Jul-12  0649597-White River VT  0649597-White River VT        30-Mar-12
18   8096       19-Dec-11  0649597-White River VT  0649597-White River VT         9-Apr-12
2    8096        8-Aug-12      0643D38-Hanover NH      0643D38-Hanover NH        25-Jun-12
12   A036       30-Nov-11     063B208-Randolph VT     063B208-Randolph VT              NaN
3    A036        1-Apr-12      06CB8CF-Hanover NH      06CB8CF-Hanover NH         9-Aug-12
26   A036       11-Aug-12      06D3206-Hanover NH                     NaN        19-Jun-12
But I couldn't think of a nice way to prevent repeating the IDs so many times. I prefer method #2: groupby on the ID.
>>> pd.concat(g for _, g in df.groupby("ID") if len(g) > 1)
       ID ENROLLMENT_DATE        TRAINER_MANAGING        TRAINER_OPERATOR FIRST_VISIT_DATE
6   11795        3-Jul-12  0649597-White River VT  0649597-White River VT        30-Mar-12
24  11795       27-Feb-12      0643D38-Hanover NH      0643D38-Hanover NH        19-Jun-12
2    8096        8-Aug-12      0643D38-Hanover NH      0643D38-Hanover NH        25-Jun-12
18   8096       19-Dec-11  0649597-White River VT  0649597-White River VT         9-Apr-12
3    A036        1-Apr-12      06CB8CF-Hanover NH      06CB8CF-Hanover NH         9-Aug-12
12   A036       30-Nov-11     063B208-Randolph VT     063B208-Randolph VT              NaN
26   A036       11-Aug-12      06D3206-Hanover NH                     NaN        19-Jun-12
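A side note on method #2: the same idea can also be written with groupby().filter(), which skips the generator-plus-concat step. A minimal sketch, assuming the same "ID" column; the tiny frame below is made up to stand in for dup.csv:

import pandas as pd

# Made-up stand-in for dup.csv, for illustration only
df = pd.DataFrame({
    "ID": ["11795", "8096", "A036", "11795", "8096", "A036", "C123"],
    "ENROLLMENT_DATE": ["3-Jul-12", "8-Aug-12", "1-Apr-12",
                        "27-Feb-12", "19-Dec-11", "30-Nov-11", "5-May-12"],
})

# Keep only the groups that contain more than one row, same result as the concat trick
dupes = df.groupby("ID").filter(lambda g: len(g) > 1)
print(dupes.sort_values("ID"))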
ANSWER 2
Score 286
With pandas version 0.17, you can set keep=False in the duplicated function to get all the duplicate items.
In [1]: import pandas as pd
In [2]: df = pd.DataFrame(['a','b','c','d','a','b'])
In [3]: df
Out[3]:
   0
0  a
1  b
2  c
3  d
4  a
5  b

In [4]: df[df.duplicated(keep=False)]
Out[4]:
   0
0  a
1  b
4  a
5  b
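The same keep=False flag also exists on Series.duplicated, so a single column can be filtered directly. A small sketch on the same toy data (the Series here mirrors the example above and is not from the original question):

import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd', 'a', 'b'])

# keep=False marks every occurrence of a repeated value, not just the later ones
print(s[s.duplicated(keep=False)])
# 0    a
# 1    b
# 4    a
# 5    b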
ANSWER 3
Score 270
df[df.duplicated(['ID'], keep=False)]
This will return all duplicated rows back to you.
According to the documentation:
keep : {'first', 'last', False}, default 'first'
- 'first' : Mark duplicates as True except for the first occurrence.
- 'last' : Mark duplicates as True except for the last occurrence.
- False : Mark all duplicates as True.
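To see how the three keep settings differ in practice, here is a short sketch; the four-row frame is invented purely for illustration:

import pandas as pd

df = pd.DataFrame({"ID": ["A036", "A036", "8096", "11795"]})

# 'first' flags later occurrences, 'last' flags earlier ones, False flags them all
print(df["ID"].duplicated(keep="first").tolist())  # [False, True, False, False]
print(df["ID"].duplicated(keep="last").tolist())   # [True, False, False, False]
print(df["ID"].duplicated(keep=False).tolist())    # [True, True, False, False]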
ANSWER 4
Score 19
df[df['ID'].duplicated() == True]
This worked for me.
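Since the question asks for a list of the duplicate items, the mask can also be reduced to the repeated ID values themselves. A minimal sketch with invented data; only the column name "ID" is taken from the question:

import pandas as pd

# Illustrative frame; the real data lives in the question's dup.csv
df = pd.DataFrame({"ID": ["11795", "8096", "A036", "11795", "8096", "C123"]})

# Values of ID that occur more than once, as a plain Python list
dup_ids = df.loc[df["ID"].duplicated(keep=False), "ID"].unique().tolist()
print(dup_ids)  # ['11795', '8096']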