Pandas dataframe get first row of each group
--
Music by Eric Matyas
https://www.soundimage.org
Track title: Light Drops
--
Chapters
00:00 Question
01:04 Accepted answer (Score 394)
01:41 Answer 2 (Score 94)
02:01 Answer 3 (Score 88)
02:54 Answer 4 (Score 19)
03:16 Thank you
--
Full question
https://stackoverflow.com/questions/2006...
Answer 1 links:
http://pandas.pydata.org/pandas-docs/sta...
Answer 3 links:
[drop_duplicates]: https://pandas.pydata.org/pandas-docs/st...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe #groupby #row
#avk47
--
ACCEPTED ANSWER
Score 434
>>> df.groupby('id').first()
     value
id
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth
If you need id as a column:
>>> df.groupby('id').first().reset_index()
   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth
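As an alternative to calling reset_index() afterwards, the group key can be kept as a regular column by passing as_index=False to groupby. The snippet below is a minimal sketch of that idea; the sample frame is hypothetical and is not the data from the question.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not the frame from the question).
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "value": ["first", "second", "first", "second", "third", "first"],
})

# as_index=False keeps 'id' as an ordinary column in the result,
# so no reset_index() call is needed.
first_rows = df.groupby("id", as_index=False).first()
print(first_rows)
```

Both spellings should give one row per id with id and value as columns; which one to use is mostly a matter of taste.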
To get the first n records of each group, you can use head():
>>> df.groupby('id').head(2).reset_index(drop=True)
    id   value
0    1   first
1    1  second
2    2   first
3    2  second
4    3   first
5    3   third
6    4  second
7    4   fifth
8    5   first
9    6   first
10   6  second
11   7  fourth
12   7   fifth
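To make the behaviour of head() easier to see in isolation, here is a small self-contained sketch. The frame is hypothetical sample data, not the one from the question. It illustrates that head(n) selects rows rather than aggregating: the original index labels survive, and a group with fewer than n rows simply contributes all of its rows.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not the frame from the question).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3],
    "value": ["first", "second", "third", "first", "first", "second"],
})

# head(2) keeps at most the first two rows of every group.
# Rows keep their original index labels and their original order.
top2 = df.groupby("id").head(2)
print(top2)

# reset_index(drop=True) renumbers the rows 0..n-1, as in the answer above.
top2_renumbered = top2.reset_index(drop=True)
print(top2_renumbered)
```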
ANSWER 2
Score 103
I'd suggest using .nth(0) rather than .first() if you need the first row.
The difference between them is how they handle NaNs: .nth(0) returns the first row of each group regardless of the values in that row, while .first() returns the first non-NaN value in each column.
For example, if your dataset is:
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4],
                   'value': ["first","second","third", np.nan,
                             "second","first","second","third",
                             "fourth","first","second"]})
>>> df.groupby('id').nth(0)
    value
id
1   first
2     NaN
3   first
4   first
And
>>> df.groupby('id').first()
     value
id
1    first
2   second
3    first
4    first
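The NaN handling described above can be checked with a short sketch. The frame here is a reduced, hypothetical version of the answer's dataset. One caveat, stated as far as I know: since pandas 2.0, nth() behaves as a filter and returns the selected rows with their original index, so its printed output may look different from the output shown in this answer, although the selected values are the same. head(1) is included as a way to get the literal first row that does not depend on that change.

```python
import numpy as np
import pandas as pd

# Reduced, hypothetical version of the dataset used in the answer.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "value": ["first", "second", np.nan, "second"],
})

# first() skips missing values: per column, the first non-NaN entry.
first_non_null = df.groupby("id").first()

# nth(0) takes the first row of each group as-is, NaN included.
# The index of the result depends on the pandas version.
first_via_nth = df.groupby("id").nth(0)

# head(1) also takes the literal first row and keeps the original index.
first_row = df.groupby("id").head(1)

print(first_non_null)
print(first_via_nth)
print(first_row)
```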
ANSWER 3
Score 99
This will give you the second row of each group (nth is zero-indexed, so nth(0) returns the first row; unlike first(), it does not skip NaN values):
df.groupby('id').nth(1)
Documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group
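One detail worth knowing when using nth(1): a group that has only one row has no second row, so it does not appear in the result at all. A minimal sketch on hypothetical data (the frame below is an assumption for illustration):

```python
import pandas as pd

# Hypothetical sample data; the group with id 2 has a single row.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3],
    "value": ["a", "b", "c", "d", "e", "f"],
})

# Second row of each group; groups with fewer than two rows are left out.
second = df.groupby("id").nth(1)
print(second)
```

Here only the groups with id 1 and id 3 contribute a row. How the result is indexed depends on the pandas version, so the printed layout may vary.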
ANSWER 4
Score 10
Maybe this is what you want:
import pandas as pd
idx = pd.MultiIndex.from_product([['state1','state2'], ['county1','county2','county3','county4']])
df = pd.DataFrame({'pop': [12,15,65,42,78,67,55,31]}, index=idx)
                pop
state1 county1   12
       county2   15
       county3   65
       county4   42
state2 county1   78
       county2   67
       county3   55
       county4   31
df.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values('pop', ascending=False)).groupby(level=0).head(3)
Out[29]:
                pop
state1 county3   65
       county4   42
       county2   15
state2 county1   78
       county2   67
       county3   55
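A possibly simpler variant of the same idea is to sort the whole frame once and then take head(3) per group. This is a sketch, not a drop-in replacement: it selects the same rows as the answer's code, but the result is ordered by pop across the whole frame instead of being grouped by state. The data repeats the frame from the answer; the variable name top3 is my own.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["state1", "state2"], ["county1", "county2", "county3", "county4"]]
)
df = pd.DataFrame({"pop": [12, 15, 65, 42, 78, 67, 55, 31]}, index=idx)

# Sort all rows by population (largest first), then keep the first
# three rows encountered for each state (index level 0).
top3 = df.sort_values("pop", ascending=False).groupby(level=0).head(3)
print(top3)

# Rows for a single state, still in descending order of population.
print(top3.xs("state1", level=0))
```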