pandas - keep only True values after groupby a DataFrame
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Droplet of life
--
Chapters
00:00 Pandas - Keep Only True Values After Groupby A Dataframe
01:17 Accepted Answer Score 14
01:57 Answer 2 Score 1
02:15 Thank you
--
Full question
https://stackoverflow.com/questions/2885...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
ACCEPTED ANSWER
Score 14
Assign the result of df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1) to a variable. This is a boolean Series indexed by User_ID, so you can use boolean indexing to keep only the True entries, then pass the index of that result to isin to filter your original df:
In [366]:
users = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1)
users
Out[366]:
User_ID
189757330    False
222583401    False
287280509    False
329757763    False
414673119     True
624921653    False
Name: Datetime, dtype: bool
In [367]:
users[users]
Out[367]:
User_ID
414673119    True
Name: Datetime, dtype: bool
In [368]:
users[users].index
Out[368]:
Int64Index([414673119], dtype='int64')
In [361]:
df[df['User_ID'].isin(users[users].index)]
Out[361]:
     User_ID   Latitude  Longitude            Datetime
5  414673119  41.555014   2.096583 2014-02-24 20:15:30
6  414673119  41.555014   2.097583 2014-02-24 20:16:30
7  414673119  41.555014   2.098583 2014-02-24 20:17:30
You can then call to_csv on the filtered result as usual.
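For reference, here is a self-contained sketch of the steps above. Only the three rows for user 414673119 come from the output shown; the other rows are made-up sample data, and the names repeat_ids, filtered and csv_text are just illustrative.

```python
import pandas as pd

# Hypothetical sample data: only the last three rows match the output above,
# the single-row users are invented for illustration.
df = pd.DataFrame(
    {
        "User_ID": [189757330, 222583401, 287280509, 329757763, 624921653,
                    414673119, 414673119, 414673119],
        "Latitude": [41.1, 41.2, 41.3, 41.4, 41.5,
                     41.555014, 41.555014, 41.555014],
        "Longitude": [2.01, 2.02, 2.03, 2.04, 2.05,
                      2.096583, 2.097583, 2.098583],
        "Datetime": pd.to_datetime([
            "2014-02-21 10:00:00", "2014-02-21 11:00:00",
            "2014-02-22 12:00:00", "2014-02-22 13:00:00",
            "2014-02-23 14:00:00", "2014-02-24 20:15:30",
            "2014-02-24 20:16:30", "2014-02-24 20:17:30",
        ]),
    }
)

# Boolean Series indexed by User_ID: True where the user has more than one row.
users = df.groupby("User_ID")["Datetime"].apply(lambda g: len(g) > 1)

# Boolean indexing keeps only the True entries; their index holds the IDs.
repeat_ids = users[users].index

# Keep the original rows whose User_ID is one of those IDs.
filtered = df[df["User_ID"].isin(repeat_ids)]

# With no path, to_csv returns the CSV text; pass a file path to write a file.
csv_text = filtered.to_csv(index=False)
print(filtered)
```

On recent pandas versions the index will print as Index([...], dtype='int64') rather than Int64Index, but the filtering works the same way.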
ANSWER 2
Score 1
First, make sure you have no duplicate entries:
df = df.drop_duplicates()
Then count the entries for each user:
counts = df.groupby('User_ID').Datetime.count()
Finally, keep the rows whose User_ID is among the users with more than one entry:
df[df.User_ID.isin(counts[counts > 1].index)]
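As a rough sketch, the three steps above can be run end to end on a small made-up DataFrame (the data and the variable names are hypothetical). For comparison it also shows two alternatives that are not part of either answer: a mask built with groupby(...).transform('size'), and groupby(...).filter. Note that count() skips missing values while size does not, so the results only agree when Datetime has no missing entries, as assumed here.

```python
import pandas as pd

# Hypothetical data: user 2 has one exact duplicate row, user 3 has three
# distinct rows, user 1 has a single row.
df = pd.DataFrame(
    {
        "User_ID": [1, 2, 2, 3, 3, 3],
        "Datetime": pd.to_datetime([
            "2014-02-24 20:00:00",
            "2014-02-24 20:05:00", "2014-02-24 20:05:00",
            "2014-02-24 20:15:30", "2014-02-24 20:16:30",
            "2014-02-24 20:17:30",
        ]),
    }
)

# Steps from the answer: drop exact duplicates, count per user, filter.
deduped = df.drop_duplicates()
counts = deduped.groupby("User_ID").Datetime.count()
via_isin = deduped[deduped.User_ID.isin(counts[counts > 1].index)]

# Alternative 1 (assumption, not from the answer): broadcast each group's
# size back onto its rows and use it directly as a row mask.
mask = deduped.groupby("User_ID")["Datetime"].transform("size") > 1
via_transform = deduped[mask]

# Alternative 2 (assumption, not from the answer): let groupby drop whole
# groups that fail the condition.
via_filter = deduped.groupby("User_ID").filter(lambda g: len(g) > 1)

print(via_isin)
```

Because the duplicate row for user 2 is dropped first, that user ends up with a single entry and is excluded; only the rows for user 3 remain in all three results.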