The Python Oracle

Python Pandas: Is Order Preserved When Using groupby() and agg()?

Full question
https://stackoverflow.com/questions/2645...

Accepted answer links:
[issue]: https://github.com/pydata/pandas/issues/...

Answer 3 links:
[http://pandas.pydata.org/pandas-docs/sta...]: http://pandas.pydata.org/pandas-docs/sta...

Answer 4 links:
[https://pandas.pydata.org/pandas-docs/st...]: https://pandas.pydata.org/pandas-docs/st...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #aggregate




ANSWER 1

Score 53


To preserve the order in which the groups first appear, you'll need to pass .groupby(..., sort=False). In your case the grouping column is already sorted, so it makes no difference, but in general you need the sort=False flag:

 df.groupby('A', sort=False).agg([np.mean, lambda x: x.iloc[1]])
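
As a rough illustration of what the flag changes, here is a minimal sketch on a small made-up frame (the column values and the variable names sorted_keys and appearance_keys are assumptions for the example, not the asker's data). With the default sort=True the group keys come back sorted; with sort=False they come back in order of first appearance:

```python
import pandas as pd

# Hypothetical data: the grouping column is deliberately NOT sorted.
df = pd.DataFrame({
    "A": ["group2", "group1", "group2", "group3", "group1"],
    "B": [10, 20, 30, 40, 50],
})

# Default behaviour (sort=True): group keys are sorted.
sorted_keys = df.groupby("A")["B"].sum()

# sort=False: group keys keep the order in which they first appear.
appearance_keys = df.groupby("A", sort=False)["B"].sum()

print(list(sorted_keys.index))      # ['group1', 'group2', 'group3']
print(list(appearance_keys.index))  # ['group2', 'group1', 'group3']
```

The aggregated values are the same either way; only the order of the result rows differs.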



ACCEPTED ANSWER

Score 42


See this enhancement issue

The short answer is yes: groupby preserves the order of the rows within each group as they were passed in. You can check this with your example by reversing the index before grouping:

In [20]: df.sort_index(ascending=False).groupby('A').agg([np.mean, lambda x: x.iloc[1] ])
Out[20]: 
           B             C         
        mean <lambda> mean <lambda>
A                                  
group1  11.0       10  101      100
group2  17.5       10  175      100
group3  11.0       10  101      100

This is NOT true for resample, however, because it requires a monotonic index (it WILL work with a non-monotonic index, but it will sort the index first).

There is a sort= flag to groupby, but it controls the sorting of the groups themselves, not the observations within a group.
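
To make the distinction concrete, here is a small sketch on invented data (the frame and the names with_sort and without_sort are assumptions for illustration). Collecting each group's values into a list shows that the within-group order follows the input rows whether or not the group keys are sorted:

```python
import pandas as pd

# Hypothetical data: keys are unsorted and values are in descending order.
df = pd.DataFrame({
    "A": ["b", "a", "b", "a", "b"],
    "B": [5, 4, 3, 2, 1],
})

# Collect the values of each group in the order groupby hands them over.
with_sort = df.groupby("A", sort=True)["B"].agg(list)
without_sort = df.groupby("A", sort=False)["B"].agg(list)

print(with_sort.to_dict())       # {'a': [4, 2], 'b': [5, 3, 1]}
print(list(without_sort.index))  # ['b', 'a']
print(without_sort.to_dict() == with_sort.to_dict())  # True
```

The values inside each list are never reordered; sort= only decides whether 'a' or 'b' comes first in the result.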

FYI: df.groupby('A').nth(1) is a safe way to get the 2nd value of each group (your method above will fail if a group has fewer than 2 elements).
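
A minimal sketch of that difference, using made-up data in which group2 has only one row (the frame and variable names are assumptions for the example). The shape of the nth result differs between pandas versions, so the sketch only looks at the selected values:

```python
import pandas as pd

# Hypothetical data: group2 has a single row, so it has no 2nd element.
df = pd.DataFrame({
    "A": ["group1", "group1", "group2", "group3", "group3"],
    "B": [10, 11, 20, 30, 31],
})

# nth(1) returns the 2nd row of each group and silently skips groups
# that are too short.
second = df.groupby("A").nth(1)
second_values = second["B"].tolist()
print(second_values)  # [11, 31]

# Positional indexing on the short group raises instead.
single = df.loc[df["A"] == "group2", "B"]
try:
    single.iloc[1]
    failed = False
except IndexError:
    failed = True
print(failed)  # True
```

Whether skipping short groups is the right behaviour depends on the use case; if a missing 2nd element should be treated as an error, the positional lookup makes that visible.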




ANSWER 3

Score 33


The pandas groupby documentation says:

Groupby preserves the order of rows within each group

so this is "guaranteed" behavior.




ANSWER 4

Score 5


Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html

The API accepts sort as an argument.

The documentation describes the sort argument as follows:

sort : bool, default True Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.

Thus, it is clear that groupby preserves the order of rows within each group.