The Python Oracle

Pandas dataframe get first row of each group

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Light Drops

--

Chapters
00:00 Question
01:04 Accepted answer (Score 394)
01:41 Answer 2 (Score 94)
02:01 Answer 3 (Score 88)
02:54 Answer 4 (Score 19)
03:16 Thank you

--

Full question
https://stackoverflow.com/questions/2006...

Answer 1 links:
http://pandas.pydata.org/pandas-docs/sta...

Answer 3 links:
[drop_duplicates]: https://pandas.pydata.org/pandas-docs/st...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe #groupby #row

#avk47



ACCEPTED ANSWER

Score 434


>>> df.groupby('id').first()
     value
id        
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth

If you need id as column:

>>> df.groupby('id').first().reset_index()
   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth

To get n first records, you can use head():

>>> df.groupby('id').head(2).reset_index(drop=True)
    id   value
0    1   first
1    1  second
2    2   first
3    2  second
4    3   first
5    3   third
6    4  second
7    4   fifth
8    5   first
9    6   first
10   6  second
11   7  fourth
12   7   fifth



ANSWER 2

Score 103


I'd suggest to use .nth(0) rather than .first() if you need to get the first row.

The difference between them is how they handle NaNs, so .nth(0) will return the first row of group no matter what are the values in this row, while .first() will eventually return the first not NaN value in each column.

E.g. if your dataset is :

df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4],
            'value'  : ["first","second","third", np.NaN,
                        "second","first","second","third",
                        "fourth","first","second"]})

>>> df.groupby('id').nth(0)
    value
id        
1    first
2    NaN
3    first
4    first

And

>>> df.groupby('id').first()
    value
id        
1    first
2    second
3    first
4    first



ANSWER 3

Score 99


This will give you the second row of each group (zero indexed, nth(0) is the same as first()):

df.groupby('id').nth(1) 

Documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group




ANSWER 4

Score 10


maybe this is what you want

import pandas as pd
idx = pd.MultiIndex.from_product([['state1','state2'],   ['county1','county2','county3','county4']])
df = pd.DataFrame({'pop': [12,15,65,42,78,67,55,31]}, index=idx)
                pop
state1 county1   12
       county2   15
       county3   65
       county4   42
state2 county1   78
       county2   67
       county3   55
       county4   31
df.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values('pop', ascending=False)).groupby(level=0).head(3)

> Out[29]: 
                pop
state1 county3   65
       county4   42
       county2   15
state2 county1   78
       county2   67
       county3   55