Python Pandas: Is Order Preserved When Using groupby() and agg()?
Full question
https://stackoverflow.com/questions/2645...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #aggregate
ANSWER 1
Score 53
To keep the groups in the order in which they first appear, pass sort=False, as in .groupby(..., sort=False). In your case the grouping column is already sorted, so it makes no difference, but in general you must use the sort=False flag:
df.groupby('A', sort=False).agg([np.mean, lambda x: x.iloc[1] ])
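As a minimal sketch of what the flag changes, the snippet below uses a small made-up frame (the column names and values are assumptions, not the asker's data) whose keys are not already sorted, and compares the order of the resulting group keys with and without sort=False:

```python
import pandas as pd

# Hypothetical data: group keys first appear in the order c, a, b.
df = pd.DataFrame({"A": ["c", "a", "c", "b", "a"], "B": [1, 2, 3, 4, 5]})

# Default (sort=True): group keys come back sorted.
sorted_keys = df.groupby("A")["B"].sum().index.tolist()

# sort=False: group keys come back in order of first appearance.
first_seen_keys = df.groupby("A", sort=False)["B"].sum().index.tolist()

print(sorted_keys)      # ['a', 'b', 'c']
print(first_seen_keys)  # ['c', 'a', 'b']
```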
ACCEPTED ANSWER
Score 42
See this enhancement issue
The short answer is yes, the groupby will preserve the orderings as passed in. You can prove this by using your example like this:
In [20]: df.sort_index(ascending=False).groupby('A').agg([np.mean, lambda x: x.iloc[1] ])
Out[20]:
           B             C
        mean <lambda> mean <lambda>
A
group1  11.0       10  101      100
group2  17.5       10  175      100
group3  11.0       10  101      100
This is NOT true for resample, however, because it requires a monotonic index (it WILL work with a non-monotonic index, but it will sort the index first).
There is a sort= flag to groupby, but this relates to the sorting of the groups themselves and not the observations within a group.
FYI: df.groupby('A').nth(1) is a safe way to get the 2nd value of a group (your method above will fail if a group has fewer than 2 elements).
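A rough illustration of the difference, using a made-up frame in which one group has only a single row (the data and variable names are assumptions). Note that the shape and index of what nth() returns differ between pandas versions, so the sketch only looks at the values:

```python
import pandas as pd

# Hypothetical data: group "y" has only one row.
df = pd.DataFrame({"A": ["x", "x", "y", "z", "z"], "B": [10, 20, 30, 40, 50]})

# nth(1) simply skips groups that have no second row.
second_values = df.groupby("A").nth(1)["B"].tolist()
print(second_values)  # [20, 50]

# The iloc-based lambda fails on the one-row group instead.
try:
    df.groupby("A")["B"].agg(lambda s: s.iloc[1])
    iloc_failed = False
except Exception as exc:  # typically an out-of-bounds indexing error
    iloc_failed = True
    print(type(exc).__name__)
```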
ANSWER 3
Score 33
The pandas groupby documentation says:
Groupby preserves the order of rows within each group
so this is "guaranteed" behavior.
ANSWER 4
Score 5
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
The API accepts a sort argument, which the documentation describes as follows:
sort : bool, default True Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
Thus, it is clear that groupby does preserve the order of rows within each group, whatever the value of sort.
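This can be checked with a small sketch. The frame below is made up for illustration (the values are assumptions) and is deliberately unsorted within each group. Collecting each group's rows into a list shows them in their original order, with sort only changing the order of the keys:

```python
import pandas as pd

# Hypothetical data, deliberately not sorted by B within each group.
df = pd.DataFrame({"A": ["b", "a", "b", "a", "b"], "B": [3, 9, 1, 7, 2]})

# Collect each group's B values into a list, with and without key sorting.
rows_sorted = df.groupby("A")["B"].apply(list)
rows_unsorted = df.groupby("A", sort=False)["B"].apply(list)

print(rows_sorted.to_dict())    # {'a': [9, 7], 'b': [3, 1, 2]}
print(rows_unsorted.to_dict())  # {'b': [3, 1, 2], 'a': [9, 7]}
```

In both cases the values inside each list follow the row order of the original frame.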