pandas - keep only True values after groupby a DataFrame
This video explains how to keep only the True values after a groupby on a pandas DataFrame.
--
Become part of the top 3% of the developers by applying to Toptal
https://topt.al/25cXVn
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Lost Civilization
--
Chapters
00:00 Question
01:35 Accepted answer (Score 14)
02:38 Answer 2 (Score 1)
03:04 Thank you
--
Full question
https://stackoverflow.com/questions/2885...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
ACCEPTED ANSWER
Score 14
Assign the result of df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1) to a variable so that you can use it for boolean indexing, then pass the index of the True entries to isin to filter your original DataFrame:
In [366]:
users = df.groupby('User_ID')['Datetime'].apply(lambda g: len(g)>1)
users
Out[366]:
User_ID
189757330    False
222583401    False
287280509    False
329757763    False
414673119     True
624921653    False
Name: Datetime, dtype: bool
In [367]:
users[users]
Out[367]:
User_ID
414673119    True
Name: Datetime, dtype: bool
In [368]:
users[users].index
Out[368]:
Int64Index([414673119], dtype='int64')
In [361]:
df[df['User_ID'].isin(users[users].index)]
Out[361]:
     User_ID   Latitude  Longitude            Datetime
5  414673119  41.555014   2.096583 2014-02-24 20:15:30
6  414673119  41.555014   2.097583 2014-02-24 20:16:30
7  414673119  41.555014   2.098583 2014-02-24 20:17:30
You can then call to_csv on the filtered result as usual.
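For reference, the steps of this answer can be put together as a small self-contained script. This is only a sketch: the question's full data is not shown here, so the sample rows are invented for illustration (only the three rows for user 414673119 mirror the output above), and the names keep_ids, filtered and csv_text are hypothetical.

```python
import pandas as pd

# Invented sample data. Only the rows for user 414673119 (index 5-7)
# mirror the output shown in the answer; the other rows are made up.
df = pd.DataFrame({
    "User_ID": [189757330, 222583401, 287280509, 329757763, 624921653,
                414673119, 414673119, 414673119],
    "Latitude": [41.50, 41.51, 41.52, 41.53, 41.54,
                 41.555014, 41.555014, 41.555014],
    "Longitude": [2.01, 2.02, 2.03, 2.04, 2.05,
                  2.096583, 2.097583, 2.098583],
    "Datetime": pd.to_datetime([
        "2014-02-24 20:10:30", "2014-02-24 20:11:30", "2014-02-24 20:12:30",
        "2014-02-24 20:13:30", "2014-02-24 20:14:30", "2014-02-24 20:15:30",
        "2014-02-24 20:16:30", "2014-02-24 20:17:30",
    ]),
})

# One boolean per user: True when the user has more than one row.
users = df.groupby("User_ID")["Datetime"].apply(lambda g: len(g) > 1)

# users[users] keeps only the True entries; its index holds the User_IDs to keep.
keep_ids = users[users].index

# Keep the original rows whose User_ID is among those IDs.
filtered = df[df["User_ID"].isin(keep_ids)]

# Called without a path, to_csv returns the CSV text instead of writing a file.
csv_text = filtered.to_csv(index=False)
print(filtered)
```

Passing a file path to to_csv instead would write the filtered rows to disk.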
ANSWER 2
Score 1
First, make sure you have no duplicate entries:
df = df.drop_duplicates()
Then, count the rows for each user:
counts = df.groupby('User_ID').Datetime.count()
Finally, keep only the rows whose User_ID has a count greater than 1:
df[df.User_ID.isin(counts[counts > 1].index)]
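A runnable sketch of these three steps follows. The data is invented for illustration and the names deduped and result are hypothetical. It includes one exact duplicate row to show that, because of drop_duplicates, a user whose rows are all identical no longer counts as having more than one entry.

```python
import pandas as pd

# Invented sample data: user 1 has two identical rows, user 2 has two
# distinct rows, user 3 has a single row.
df = pd.DataFrame({
    "User_ID": [1, 1, 2, 2, 3],
    "Datetime": pd.to_datetime([
        "2014-02-24 20:15:30", "2014-02-24 20:15:30",  # exact duplicate
        "2014-02-24 20:16:30", "2014-02-24 20:17:30",
        "2014-02-24 20:18:30",
    ]),
})

# Step 1: drop rows that are exact duplicates of an earlier row.
deduped = df.drop_duplicates()

# Step 2: count the remaining rows per user.
counts = deduped.groupby("User_ID")["Datetime"].count()

# Step 3: keep rows whose User_ID has a count greater than 1.
result = deduped[deduped["User_ID"].isin(counts[counts > 1].index)]
print(result)
```

Here only user 2 survives: user 1 is reduced to a single row by the deduplication step, which may or may not be what you want depending on whether repeated identical rows should count.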