The Python Oracle

pandas - keep only True values after groupby a DataFrame

--




Full question
https://stackoverflow.com/questions/2885...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas



ACCEPTED ANSWER

Score 14


Assign the result of df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1) to a variable. This gives a boolean Series indexed by User_ID, so you can index it with itself to keep only the True entries, then pass the resulting index to isin to filter your original DataFrame:

In [366]:

users = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1)
users

Out[366]:
User_ID
189757330    False
222583401    False
287280509    False
329757763    False
414673119     True
624921653    False
Name: Datetime, dtype: bool

In [367]:   
users[users]

Out[367]:
User_ID
414673119    True
Name: Datetime, dtype: bool

In [368]:
users[users].index

Out[368]:
Int64Index([414673119], dtype='int64')

In [361]:
df[df['User_ID'].isin(users[users].index)]

Out[361]:
     User_ID   Latitude  Longitude            Datetime
5  414673119  41.555014   2.096583 2014-02-24 20:15:30
6  414673119  41.555014   2.097583 2014-02-24 20:16:30
7  414673119  41.555014   2.098583 2014-02-24 20:17:30

You can then call to_csv on the filtered DataFrame as normal.
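For anyone who wants to run the steps above end to end, here is a minimal sketch. The sample rows for every user except 414673119 are made up for illustration, and the names repeat_ids, filtered, mask and filtered_alt are not from the original answer. It uses size() instead of the lambda, which should give the same counts here, and also shows a transform-based variant as one possible alternative that skips the isin step.

```python
import pandas as pd

# Hypothetical sample data: only the three 414673119 timestamps come from the
# output shown above; the other rows are invented so the example is runnable.
df = pd.DataFrame(
    {
        "User_ID": [
            189757330, 222583401, 287280509, 329757763, 624921653,
            414673119, 414673119, 414673119,
        ],
        "Datetime": pd.to_datetime(
            [
                "2014-02-24 20:10:00", "2014-02-24 20:11:00",
                "2014-02-24 20:12:00", "2014-02-24 20:13:00",
                "2014-02-24 20:14:00", "2014-02-24 20:15:30",
                "2014-02-24 20:16:30", "2014-02-24 20:17:30",
            ]
        ),
    }
)

# Boolean Series indexed by User_ID: True where the user has more than one row.
users = df.groupby("User_ID")["Datetime"].size() > 1

# Indexing the Series with itself keeps only the True entries;
# their index holds the User_ID values we want.
repeat_ids = users[users].index

# Keep the original rows whose User_ID is in that index.
filtered = df[df["User_ID"].isin(repeat_ids)]

# Alternative sketch: broadcast the group size back onto every row,
# which produces a row-aligned mask directly.
mask = df.groupby("User_ID")["Datetime"].transform("size") > 1
filtered_alt = df[mask]

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = filtered.to_csv(index=False)
print(csv_text)
```

The transform variant keeps everything aligned to the original row index, which can be convenient when you want to combine the mask with other row-level conditions.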




ANSWER 2

Score 1


First, make sure you have no duplicate entries:

df = df.drop_duplicates()

Then count the rows for each user:

counts = df.groupby('User_ID').Datetime.count()

Finally, keep the rows whose User_ID appears in the index of the counts greater than 1:

df[df.User_ID.isin(counts[counts > 1].index)]
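The sketch below illustrates why the drop_duplicates step can change the result. The data is entirely hypothetical (small integer IDs stand in for real User_ID values), and the names deduped, result and naive_result are introduced here for illustration only.

```python
import pandas as pd

# Hypothetical data: user 1 appears twice, but both rows are identical;
# user 2 has two genuinely different rows; user 3 has a single row.
df = pd.DataFrame(
    {
        "User_ID": [1, 1, 2, 2, 3],
        "Datetime": pd.to_datetime(
            [
                "2014-02-24 20:15:30",
                "2014-02-24 20:15:30",
                "2014-02-24 20:15:30",
                "2014-02-24 20:16:30",
                "2014-02-24 20:17:30",
            ]
        ),
    }
)

# Step 1: drop exact duplicate rows so repeated copies are not counted twice.
deduped = df.drop_duplicates()

# Step 2: count the non-null Datetime values per user.
counts = deduped.groupby("User_ID")["Datetime"].count()

# Step 3: keep rows whose User_ID has more than one entry.
result = deduped[deduped["User_ID"].isin(counts[counts > 1].index)]

# For comparison: skipping step 1 would also keep user 1,
# whose two rows are the same record repeated.
naive_counts = df.groupby("User_ID")["Datetime"].count()
naive_result = df[df["User_ID"].isin(naive_counts[naive_counts > 1].index)]

print(result)
print(naive_result)
```

Note that count() ignores missing values, whereas len(g) in the accepted answer counts every row, so the two approaches may differ if the Datetime column contains NaT.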