pandas dataframe groupby datetime month
--
Full question
https://stackoverflow.com/questions/2408...
Answer 2 links:
[resample]: https://pandas.pydata.org/pandas-docs/st...
[here]: https://pandas.pydata.org/pandas-docs/st...
Answer 3 links:
[pd.Grouper]: https://pandas.pydata.org/pandas-docs/ve...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #datetime #pandasgroupby
ACCEPTED ANSWER
Score 268
Managed to do it:
import pandas as pd

b = pd.read_csv('b.dat')
b.index = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')
# group by year first, then month, so the groups sort chronologically
b.groupby(by=[b.index.year, b.index.month])
Or
b.groupby(pd.Grouper(freq='M'))  # update for v0.21+; pandas 2.2+ deprecates 'M' in favour of 'ME'
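Since b.dat is not available, the sketch below substitutes a few hypothetical rows in the same date format as the answer's format string (the date column comes from the answer; the value column and the rows are assumptions) and applies a sum so both groupings produce a visible result. The freq is passed as an offset object rather than 'M' or 'ME', because the preferred alias string depends on the pandas version.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'b.dat', using the answer's date format
raw = io.StringIO(
    "date,value\n"
    "09/28/17 10:15AM,1\n"
    "10/02/17 03:30PM,2\n"
    "10/15/17 11:45AM,3\n"
    "01/05/18 09:00AM,4\n"
)
b = pd.read_csv(raw)
b.index = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')

# Option 1: (year, month) MultiIndex; only months present in the data appear.
# The levels are renamed so the result is readable (both would otherwise be 'date').
by_year_month = b.groupby(
    [b.index.year.rename('year'), b.index.month.rename('month')]
)['value'].sum()
print(by_year_month)

# Option 2: time-based grouper labelled by month-end date; months with no rows
# in between still appear as bins (their sum is 0)
by_grouper = b.groupby(pd.Grouper(freq=pd.offsets.MonthEnd()))['value'].sum()
print(by_grouper)
```

The practical difference is that the MultiIndex version skips empty months, while the Grouper version produces a regular monthly series.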
ANSWER 2
Score 123
(update: 2018)
Note that pd.TimeGrouper is deprecated and will be removed. Use instead:
df.groupby(pd.Grouper(freq='M'))
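The one-liner above assumes df has a DatetimeIndex. When the dates live in an ordinary column, pd.Grouper accepts a key= argument naming that column, and DataFrame.resample with on= performs the same monthly binning. A minimal sketch with hypothetical Date and Values columns; the month-end offset object stands in for the 'M' alias to avoid the version-dependent 'M'/'ME' spelling.

```python
import pandas as pd

# Hypothetical data: dates are a regular column, not the index
df = pd.DataFrame({
    'Date': pd.to_datetime(['2017-10-05', '2017-10-20', '2017-10-01', '2017-09-01']),
    'Values': [5, 10, 15, 20],
})

# key= tells pd.Grouper which column to bin by month end
monthly = df.groupby(pd.Grouper(key='Date', freq=pd.offsets.MonthEnd()))['Values'].sum()

# resample(..., on=...) bins the same column the same way
monthly_resampled = df.resample(pd.offsets.MonthEnd(), on='Date')['Values'].sum()

print(monthly)
```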
ANSWER 3
Score 18
One solution that avoids a MultiIndex is to create a new datetime column with the day set to 1, then group by this column.
Normalise day of month
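import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20', '2017-10-01', '2017-09-01']),
                   'Values': [5, 10, 15, 20]})
# normalize day to beginning of month; 4 alternative methods below (each overwrites the previous)
# 1. roll back to the previous month end, then step forward one day
df['YearMonth'] = df['Date'] + pd.offsets.MonthEnd(-1) + pd.offsets.Day(1)
# 2. subtract (day - 1) days
df['YearMonth'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day - 1, unit='D')
# 3. replace the day element by element (a Python-level loop, so slower)
df['YearMonth'] = df['Date'].map(lambda dt: dt.replace(day=1))
# 4. drop the time of day, then roll back to the nearest month start
df['YearMonth'] = df['Date'].dt.normalize().map(pd.tseries.offsets.MonthBegin().rollback)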
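Because each line above overwrites YearMonth, one quick way to confirm the four methods are interchangeable is to compute them side by side. The sketch reuses the answer's sample frame; the candidates dictionary and its keys are illustrative names, not part of the answer. Values are compared with == rather than an exact dtype check, so the comparison does not depend on the datetime resolution each method happens to produce.

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20', '2017-10-01', '2017-09-01']),
                   'Values': [5, 10, 15, 20]})

# Keep each normalization in its own Series instead of overwriting one column
candidates = {
    'offsets': df['Date'] + pd.offsets.MonthEnd(-1) + pd.offsets.Day(1),
    'timedelta': df['Date'] - pd.to_timedelta(df['Date'].dt.day - 1, unit='D'),
    'replace': df['Date'].map(lambda dt: dt.replace(day=1)),
    'rollback': df['Date'].dt.normalize().map(pd.tseries.offsets.MonthBegin().rollback),
}

reference = candidates['offsets']
for name, result in candidates.items():
    # every method should map each date to the first day of its month
    assert (result == reference).all(), name

print(reference.tolist())
```

Note that methods 1 to 3 keep any time-of-day component, while method 4 normalizes it away; with midnight-only dates, as here, they agree.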
Then use groupby as normal:
g = df.groupby('YearMonth')
res = g['Values'].sum()
# YearMonth
# 2017-09-01 20
# 2017-10-01 30
# Name: Values, dtype: int64
Comparison with pd.Grouper
The subtle benefit of this solution is that, unlike with pd.Grouper, the group keys are normalized to the beginning of each month rather than the end, so you can easily extract groups via get_group:
some_group = g.get_group('2017-10-01')
With pd.Grouper, getting the October group instead requires calculating the last day of October, which is slightly more cumbersome. pd.Grouper, as of v0.23, does support a convention parameter, but it only applies to a PeriodIndex grouper.
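To make the comparison concrete, the sketch below contrasts a month-end pd.Grouper with one using freq='MS' (an option not mentioned in the answer, which labels bins by month start much like the YearMonth column) and shows one way to derive the month-end key via an offset's rollforward. It reuses the answer's sample data; selection is done with .loc on the aggregated result rather than get_group.

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20', '2017-10-01', '2017-09-01']),
                   'Values': [5, 10, 15, 20]})

# Month-end grouper: keys are the last day of each month
month_end = df.groupby(pd.Grouper(key='Date', freq=pd.offsets.MonthEnd()))['Values'].sum()

# Month-start grouper: keys are the first day of each month
month_start = df.groupby(pd.Grouper(key='Date', freq='MS'))['Values'].sum()

# The month-end key has to be derived, e.g. by rolling a date forward to its month end
october_end = pd.offsets.MonthEnd().rollforward(pd.Timestamp('2017-10-01'))

print(month_end.loc[october_end], month_start.loc[pd.Timestamp('2017-10-01')])
```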
Comparison with string conversion
An alternative to the above idea is to convert to a string, e.g. datetime 2017-10-XX to the string '2017-10'. However, this is not recommended, since an object series of strings (stored as an array of pointers) loses the efficiency benefits of a datetime series (stored internally as numerical data in a contiguous memory block).
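If a '2017-10'-style label is the goal, one alternative not covered in the answer is a monthly Period key via Series.dt.to_period('M'). Periods print like the string but are stored as integer ordinals, so they keep chronological ordering without falling back to Python string objects. A minimal sketch on the answer's sample data:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20', '2017-10-01', '2017-09-01']),
                   'Values': [5, 10, 15, 20]})

# Group by the monthly period each date falls in
res = df.groupby(df['Date'].dt.to_period('M'))['Values'].sum()
print(res)
```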
ANSWER 4
Score 11
A slightly different solution from @jpp's, outputting a YearMonth string instead:
# '%m' zero-pads the month ('2017-09'), so the string labels sort chronologically
df['YearMonth'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%m')
res = df.groupby('YearMonth')['Values'].sum()
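Building the label from x.month without zero-padding yields strings such as '2017-9', which sort after '2017-10' because groupby orders string keys lexically. The sketch below, on the sample data from the previous answer (re-created here), compares that unpadded label with a zero-padded '%Y-%m' label.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2017-10-05', '2017-10-20', '2017-10-01', '2017-09-01'],
                   'Values': [5, 10, 15, 20]})
dates = pd.to_datetime(df['Date'])

# Unpadded labels: '2017-9' compares greater than '2017-10' as a string
unpadded = dates.apply(lambda x: '{year}-{month}'.format(year=x.year, month=x.month))
# Zero-padded labels: lexical order matches chronological order
padded = dates.dt.strftime('%Y-%m')

print(df.groupby(unpadded)['Values'].sum())
print(df.groupby(padded)['Values'].sum())
```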