Rolling operations on DataFrameGroupby object
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Mysterious Puzzle
--
Chapters
00:00 Question
02:17 Accepted answer (Score 0)
03:27 Thank you
--
Full question
https://stackoverflow.com/questions/5862...
Accepted answer links:
[SO questions]: https://stackoverflow.com/questions/2282...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #pandasgroupby
#avk47
--
ACCEPTED ANSWER
Score 0
I have found a workable solution, but it only works if each date is unique within each id. My data meets this condition after some additional processing:
new_df = df.groupby(['id','date']).mean().reset_index()
which returns:
    id       date  target
0  1.0 2017-01-01       0
1  1.0 2017-01-21       1
2  1.0 2017-10-01       0
3  2.0 2017-01-01       1
4  2.0 2017-01-21       0
5  2.0 2017-10-01       0
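As a minimal sketch of this de-duplication step, assuming a small hypothetical frame (the rows below are made up for illustration and are not the original data) in which one (id, date) pair appears twice:

```python
import pandas as pd

# Hypothetical sample data: the pair (1, 2017-01-21) appears twice.
df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "date": pd.to_datetime(
        ["2017-01-01", "2017-01-21", "2017-01-21", "2017-01-01"]
    ),
    "target": [0, 1, 0, 1],
})

# Average the target over each (id, date) pair so every pair occurs exactly once.
new_df = df.groupby(["id", "date"]).mean().reset_index()

# Check that no (id, date) pair is repeated after the aggregation.
has_repeats = bool(new_df.duplicated(["id", "date"]).any())
print(new_df)
```

Note that averaging turns the repeated pair with targets 1 and 0 into 0.5, so the aggregated target is not necessarily 0 or 1 any more.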
I can then use the rolling method on a groupby object to get the desired result:
df = new_df.set_index('date')
df.iloc[::-1].groupby('id')['target'].rolling(window='180D',
    center=False).apply(lambda x : x[:-1].sum())
There are two tricks here:
I reverse the order of the dates (.iloc[::-1]) to take a forward-looking window; this has been suggested in other SO questions.
I drop the last entry of each window before summing, which removes the 'current' date, so the sum only looks forward.
The second 'hack' means it only works when there are no repeats of dates for a given id.
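The two tricks can be put together in a self-contained sketch. It rebuilds the six de-duplicated rows shown earlier (with ids written as integers) and passes raw=True so the lambda receives a NumPy array; that flag is a choice made for this sketch and is not part of the original call.

```python
import pandas as pd

# Rows copied from the de-duplicated output shown earlier (ids as ints).
new_df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime(["2017-01-01", "2017-01-21", "2017-10-01"] * 2),
    "target": [0, 1, 0, 1, 0, 0],
})

indexed = new_df.set_index("date")

# Trick 1: reverse the rows so each group's dates run backwards; the
# 180-day window then covers the current date and the dates after it.
reversed_rows = indexed.iloc[::-1]

# Trick 2: x[:-1] drops the current row, which is the last entry of the window.
forward_sum = (
    reversed_rows.groupby("id")["target"]
    .rolling(window="180D")
    .apply(lambda x: x[:-1].sum(), raw=True)
)
print(forward_sum)
```

With these rows, only id 1 on 2017-01-01 has a later date inside its 180-day window (2017-01-21, target 1), so that entry is 1.0 and the others are 0.0. The sketch relies on pandas accepting a decreasing datetime index within each group, which may depend on the pandas version.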
I would be interested in making a more robust solution (e.g., where dates are repeated for an id).
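One possible direction for the repeated-date case is sketched below. It assumes a particular reading of the goal: every row sharing the current row's date is excluded, and only strictly later dates within 180 days are summed. forward_sum is a hypothetical helper (not a pandas function), the data is made up, and the frame is assumed to have unique index labels.

```python
import numpy as np
import pandas as pd

def forward_sum(group, days=180):
    """Hypothetical helper: sum target over strictly later dates within `days`."""
    g = group.sort_values("date")
    dates = g["date"].to_numpy()
    # Prefix sums: csum[k] is the total of the first k targets in date order.
    csum = np.concatenate(([0.0], g["target"].to_numpy(dtype=float).cumsum()))
    # First position whose date is strictly after the current date.
    lo = np.searchsorted(dates, dates, side="right")
    # First position whose date is at or beyond the end of the window.
    hi = np.searchsorted(dates, dates + np.timedelta64(days, "D"), side="left")
    return pd.Series(csum[hi] - csum[lo], index=g.index)

# Hypothetical data: id 1 has two rows dated 2017-01-01.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2],
    "date": pd.to_datetime(
        ["2017-01-01", "2017-01-01", "2017-01-21", "2017-10-01", "2017-01-01"]
    ),
    "target": [0, 1, 1, 0, 1],
})

# Compute per id, then align back to the original rows by index label.
df["future_sum"] = pd.concat([forward_sum(g) for _, g in df.groupby("id")])
print(df)
```

When every date is unique within an id, this should agree with the reversed rolling approach, since both sum the targets dated after the current row and less than 180 days ahead.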