The Python Oracle

How to get the last N rows of a pandas DataFrame?

Full question
https://stackoverflow.com/questions/1466...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe



ACCEPTED ANSWER

Score 561


Don't forget DataFrame.tail! For example: df1.tail(10)
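
As a minimal sketch of the call above: the frame contents here are made up for illustration, and only the name df1 comes from the answer.

```python
import pandas as pd

# Hypothetical example data; only the name df1 comes from the answer.
df1 = pd.DataFrame({"value": range(100)})

last_ten = df1.tail(10)     # last 10 rows, original index labels are kept
default_tail = df1.tail()   # n defaults to 5 when omitted

print(last_ten.index.tolist())
print(len(default_tail))
```

Note that tail keeps the original index labels, so the result is not renumbered from 0.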




ANSWER 2

Score 112


This happens because the DataFrame has an integer index: ix treats -3 as a label rather than a position. This is by design (see integer indexing in the pandas "gotchas"*).

*ix was deprecated in pandas 0.20 and removed in 1.0. In newer versions of pandas, use loc (label-based) or iloc (position-based) to remove the ambiguity:

df.iloc[-3:]

See the docs.

As Wes points out, in this specific case you should just use tail!
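
To make the label-versus-position distinction concrete, here is a small sketch on a made-up frame with a default integer index. The variable names are illustrative, not from the original question.

```python
import pandas as pd

# Hypothetical frame with a default integer index (labels 0..4).
df = pd.DataFrame({"x": [10, 20, 30, 40, 50]})

# Position-based: always the last three rows.
by_position = df.iloc[-3:]

# Label-based: -3 is treated as a label. On this sorted index every
# label is >= -3, so the slice covers the whole frame.
by_label = df.loc[-3:]

# With a different integer index, iloc still counts positions.
shifted = df.set_index(pd.Index([100, 101, 102, 103, 104]))
last_three = shifted.iloc[-3:]

print(len(by_position), len(by_label))
print(last_three.index.tolist())
```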




ANSWER 3

Score 14


How to get the last N rows of a pandas DataFrame?

If you are slicing by position, __getitem__ (i.e., slicing with []) works well and is the most succinct solution I've found for this problem.

pd.__version__
# '0.24.2'

df = pd.DataFrame({'A': list('aaabbbbc'), 'B': np.arange(1, 9)})
df

   A  B
0  a  1
1  a  2
2  a  3
3  b  4
4  b  5
5  b  6
6  b  7
7  c  8

df[-3:]

   A  B
5  b  6
6  b  7
7  c  8

This is the same as calling df.iloc[-3:], for instance (iloc internally delegates to __getitem__).
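
A quick way to check that equivalence yourself, as a sketch on the same example frame. Keep in mind that only a slice inside [] selects rows; a bare integer such as df[-3] is looked up as a column label instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("aaabbbbc"), "B": np.arange(1, 9)})

via_getitem = df[-3:]       # slice inside [] selects rows by position
via_iloc = df.iloc[-3:]     # explicit positional slice

print(via_getitem.equals(via_iloc))

# A bare integer is treated as a column label, not a row position.
try:
    df[-3]
    scalar_lookup_failed = False
except KeyError:
    scalar_lookup_failed = True

print(scalar_lookup_failed)
```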


As an aside, if you want to find the last N rows for each group, use groupby and GroupBy.tail:

df.groupby('A').tail(2)

   A  B
1  a  2
2  a  3
5  b  6
6  b  7
7  c  8



ANSWER 4

Score 0


The top two answers suggest that there are two ways to get the same output, but if you look at the source code, .tail(n) is syntactic sugar for .iloc[-n:]. For the task of getting the last n rows as in the title (with any positive n), they are exactly the same.

However, they differ when we want the last n rows of each group, because there is no groupby.iloc but there is groupby.tail (and groupby.nth).
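
The sketch below checks the equivalence on a made-up frame. It also shows the one edge case worth knowing: tail(0) returns no rows, whereas iloc[-0:] is just iloc[0:] and returns everything, because -0 equals 0 in Python.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5]})

n = 2
same = df.tail(n).equals(df.iloc[-n:])   # identical for positive n

empty = df.tail(0)      # tail special-cases n == 0: no rows
whole = df.iloc[-0:]    # -0 == 0, so this is iloc[0:]: all rows

print(same, len(empty), len(whole))
```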


Another, slightly obscure way to get the last rows is take(), which is similar to numpy.take. Here we have to pass the positions explicitly: df.take(range(-n, 0)).

The main difference from tail/iloc is in error handling. If the dataframe has fewer than n rows and we ask for the last n rows, tail/iloc returns the entire dataframe, while take raises an error. This comes in handy when you're calling an API or web scraping, where you expect the dataframe to have a certain shape but something fails unexpectedly: tail may silently produce the wrong output, while take alerts you.

df = pd.DataFrame({'a': [1, 2]})
df.tail(5)             # <--- entire dataframe
df.iloc[-5:]           # <--- entire dataframe
df.take(range(-5,0))   # <--- IndexError: indices are out-of-bounds
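
One way to package that stricter behavior is a small wrapper. This is only a sketch: last_n_strict is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def last_n_strict(frame, n):
    """Return the last n rows; raise IndexError if frame has fewer than n rows.

    Hypothetical helper built on DataFrame.take, not part of pandas.
    """
    # Negative positions -n..-1 address the last n rows.
    return frame.take(list(range(-n, 0)))


df = pd.DataFrame({"a": [1, 2]})

ok = last_n_strict(df, 2)   # both rows exist, so this succeeds

try:
    last_n_strict(df, 5)    # only 2 rows available
    raised = False
except IndexError as exc:
    raised = True
    print("caught:", exc)
```

Whether failing loudly is preferable depends on the pipeline; when a short frame is a legitimate result, tail remains the simpler choice.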