Rolling operations on DataFrameGroupBy object
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Switch On Looping
--
Chapters
00:00 Rolling Operations On Dataframegroupby Object
01:36 Accepted Answer Score 0
02:32 Thank you
--
Full question
https://stackoverflow.com/questions/5862...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #pandasgroupby
#avk47
ACCEPTED ANSWER
Score 0
I have found a workable solution, but it only works if each date is unique within each id. That holds for my data after some additional processing:
new_df = df.groupby(['id','date']).mean().reset_index()
which returns:
    id       date  target
0  1.0 2017-01-01       0
1  1.0 2017-01-21       1
2  1.0 2017-10-01       0
3  2.0 2017-01-01       1
4  2.0 2017-01-21       0
5  2.0 2017-10-01       0
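As a small illustration of that preprocessing step, here is a minimal sketch on made-up data (the `raw` frame and its values are hypothetical, not my real data). It shows that averaging over `['id', 'date']` leaves one row per id and date. Note that averaging also changes the target values for repeated dates, which may or may not be what you want in the later sum.

```python
import pandas as pd

# Hypothetical raw data: id 1.0 has two rows on 2017-01-01.
raw = pd.DataFrame({
    "id": [1.0, 1.0, 1.0, 2.0],
    "date": pd.to_datetime(["2017-01-01", "2017-01-01", "2017-01-21", "2017-01-01"]),
    "target": [0, 1, 1, 1],
})

# Collapse repeated (id, date) pairs into a single row by averaging the target.
new_df = raw.groupby(["id", "date"]).mean().reset_index()

# True when no (id, date) pair appears more than once.
is_unique = not new_df.duplicated(["id", "date"]).any()
```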
I can then use the rolling method on a groupby object to get the desired result:
df = new_df.set_index('date')
# reverse the rows so the 180-day window looks forward; x[:-1] drops the current row
df.iloc[::-1].groupby('id')['target'].rolling(window='180D',
center=False).apply(lambda x: x[:-1].sum(), raw=True)
There are two tricks here:
1. I reverse the order of the dates (.iloc[::-1]) to take a forward-looking window; this has been suggested in other SO questions.
2. I drop the last entry of the sum to remove the 'current' date from the sum, so it only looks forward.
The second 'hack' means it only works when there are no repeats of dates for a given id.
I would be interested in making a more robust solution (e.g., where dates are repeated for an id).
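One possible direction for a more robust version, as a rough sketch rather than a tested replacement for the rolling approach: sort each id by date, take a cumulative sum of the target, and use `np.searchsorted` to find the rows that fall strictly after the current date and at most 180 days later. The helper name `forward_sum`, the column name `forward_sum`, the sample data, and the exact boundary choice (current date excluded, date + 180 days included) are my own assumptions. Because every row with the current date is excluded, repeated dates for an id are handled without averaging them first.

```python
import numpy as np
import pandas as pd

def forward_sum(group, window="180D"):
    """Hypothetical helper: for each row, sum `target` over rows of the same
    group dated strictly after the row's date and at most `window` later."""
    g = group.sort_values("date")
    dates = g["date"].to_numpy()
    # csum[k] is the sum of the first k targets in date order.
    csum = np.concatenate(([0.0], g["target"].to_numpy(dtype=float).cumsum()))
    # First position strictly after the current date (skips repeated dates too).
    lo = np.searchsorted(dates, dates, side="right")
    # First position beyond the end of the forward window.
    hi = np.searchsorted(dates, dates + pd.Timedelta(window).to_timedelta64(), side="right")
    return pd.Series(csum[hi] - csum[lo], index=g.index)

# Made-up data: id 1 has two rows on 2017-01-21.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime([
        "2017-01-01", "2017-01-21", "2017-01-21", "2017-10-01",
        "2017-01-01", "2017-01-21", "2017-10-01",
    ]),
    "target": [0, 1, 1, 0, 1, 0, 0],
})

# Compute per id, then align back on the original (unique) row index.
parts = [forward_sum(g) for _, g in df.groupby("id")]
df["forward_sum"] = pd.concat(parts)
```

This assumes the frame has a unique row index (the default RangeIndex here) and that `date` is a regular column rather than the index.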