Use center in pandas rolling when using a time-series
--
Full question
https://stackoverflow.com/questions/4701...
Accepted answer links:
[the work is merged]: https://github.com/pandas-dev/pandas/pul...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #timeseries #rollingsum
ANSWER 1
Score 11
Try the following (tested with pandas==0.23.3):
series.rolling('7D', min_periods=1, closed='left').sum().shift(-84, freq='h')
This centers the rolling sum in the 7-day window by shifting the result back 3.5 days, while still letting you define the window size with a 'datetimelike' string. Because shift() only accepts an integer number of periods, the 3.5-day offset is expressed as 84 hours.
This will produce your desired output:
series.rolling('7D', min_periods=1, closed='left').sum().shift(-84, freq='h')['2014-01-01':].head(10)
2014-01-01 12:00:00 4.0
2014-01-02 12:00:00 5.0
2014-01-03 12:00:00 6.0
2014-01-04 12:00:00 7.0
2014-01-05 12:00:00 7.0
2014-01-06 12:00:00 7.0
2014-01-07 12:00:00 7.0
2014-01-08 12:00:00 7.0
2014-01-09 12:00:00 7.0
2014-01-10 12:00:00 7.0
Freq: D, dtype: float64
Note that each rolling sum is assigned to the center of its 7-day window. Because the windows run from midnight to midnight, the centered timestamps fall at '12:00:00'.
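The 84-hour shift is simply half of the 7-day window. As a minimal sketch (the helper name centered_rolling_sum is hypothetical, and the sample series is borrowed from the accepted answer rather than from the question), the offset can be derived from the window string so that the two cannot drift apart:

```python
import pandas as pd


def centered_rolling_sum(series, window="7D"):
    """Trailing rolling sum relabelled to the middle of each window.

    Hypothetical helper for illustration; not part of pandas.
    """
    half = pd.Timedelta(window) / 2  # 3.5 days for a 7-day window
    out = series.rolling(window, min_periods=1, closed="left").sum()
    # Subtracting a Timedelta from the index sidesteps shift()'s integer periods.
    out.index = out.index - half
    return out


# Sample data: one observation per day, all equal to 1.
series = pd.Series(1, index=pd.date_range("2014-01-01", "2014-04-01", freq="D"))

centered = centered_rolling_sum(series, "7D")
print(centered["2014-01-01":].head(4))
```

This should give the same labels and values as the shift(-84, freq='h') one-liner, and it keeps working if the window is changed to, say, '5D'.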
Another option (as you show at the end of your question) is to resample the data so that it has a regular datetime frequency, then use an integer window size (window=7) with center=True. However, you state that other parts of your code benefit from defining the window with a 'datetimelike' string, so this option may not be ideal.
ACCEPTED ANSWER
Score 4
From pandas version 1.3, this is directly possible: rolling() accepts center=True together with a 'datetimelike' window.
(As of this writing, the work is merged but 1.3 is not yet released; I tested the lines below against the pandas main branch.)
import pandas as pd
series = pd.Series(1, index = pd.date_range('2014-01-01', '2014-04-01', freq = 'D'))
series.rolling('7D', min_periods=1, center=True).sum().head(10)
Output is as expected:
2014-01-01 4.0
2014-01-02 5.0
2014-01-03 6.0
2014-01-04 7.0
2014-01-05 7.0
2014-01-06 7.0
2014-01-07 7.0
2014-01-08 7.0
2014-01-09 7.0
2014-01-10 7.0
Freq: D, dtype: float64
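A centered offset window matters most when the timestamps are irregular, where an integer window would count rows rather than days. A small sketch, assuming pandas 1.3 or later and made-up sample data (the dates and values below are not from the question):

```python
import pandas as pd

# Irregular timestamps with gaps on Jan 3-4 and Jan 7-8 (illustrative data).
index = pd.to_datetime(
    ["2014-01-01", "2014-01-02", "2014-01-05", "2014-01-06", "2014-01-09"]
)
irregular = pd.Series([1, 2, 3, 4, 5], index=index)

# Each 3-day window is centered on its timestamp, so it covers the day before,
# the day itself, and the day after, whether or not those rows exist.
centered = irregular.rolling("3D", min_periods=1, center=True).sum()
print(centered)
```

Here 2014-01-02 sums only itself and 2014-01-01, because 2014-01-03 is absent; no resampling step is needed.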
ANSWER 3
Score 1
You could resample your series or DataFrame to convert the offset window into a fixed-width window.
import pandas as pd
# Parameters
window_timedelta = '7D'
resample_timedelta = '1D'
# Convert the offset window to a number of resampled periods
window_size = pd.Timedelta(window_timedelta) // pd.Timedelta(resample_timedelta)
# Resample the series onto a regular grid (missing dates become NaN).
# For a DataFrame with a 'datetime' column, pass on='datetime' to resample().
series_res = series.resample(resample_timedelta).first()
# Perform the centered sum on the resampled series
window_sum = series_res.rolling(window_size, center=True, min_periods=1).sum()
Note: the first() hack in the resampling only works if you know that you have at most one point per day. If you have more, replace it with sum() or whatever aggregation is relevant to your data.
Note 2: the NaN values introduced for missing dates will not make the sum NaN; pandas ignores them while summing.
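To see both notes in action, here is a small self-contained sketch with made-up data (three observations, with 2014-01-03 missing; the dates, values, and 3-day window are illustrative only). It assumes a Series with a DatetimeIndex:

```python
import pandas as pd

# Illustrative data: 2014-01-03 is missing.
series = pd.Series(
    [1, 2, 4],
    index=pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-04"]),
)

# A 3-day offset on a 1-day grid corresponds to 3 periods.
window_size = pd.Timedelta("3D") // pd.Timedelta("1D")

# Resampling inserts a NaN row for the missing date.
series_res = series.resample("1D").first()

# NaN is skipped by the sum as long as min_periods is satisfied.
window_sum = series_res.rolling(window_size, center=True, min_periods=1).sum()
print(window_sum)
```

The row for 2014-01-03 exists only because of the resampling; its sum covers the neighbours on either side.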