The Python Oracle

How do I get the row count of a Pandas DataFrame?

Full question
https://stackoverflow.com/questions/1594...

Accepted answer links:
[number of non-NaN values]: https://pandas.pydata.org/docs/reference...
[image]: https://i.stack.imgur.com/wEzue.png

Answer 3 links:
[root's answer]: https://stackoverflow.com/questions/1594...

Answer 4 links:
[image]: https://i.stack.imgur.com/3FXuI.png
[DataFrame.count]: https://pandas.pydata.org/pandas-docs/st...
[Series.count]: https://pandas.pydata.org/pandas-docs/st...
[DataFrameGroupBy.size]: https://pandas.pydata.org/pandas-docs/st...
[SeriesGroupBy.size]: https://pandas.pydata.org/pandas-docs/st...
[GroupBy.count]: https://pandas.pydata.org/pandas-docs/st...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe




ACCEPTED ANSWER

Score 2956


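For a dataframe df, one can use any of the following:

len(df.index)
df.shape[0]
df[df.columns[0]].count() (the number of non-NaN values in the first column)

[Image: performance plot of the three approaches against the number of rows]
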

Code to reproduce the plot:

import numpy as np
import pandas as pd
import perfplot

perfplot.save(
    "out.png",
    setup=lambda n: pd.DataFrame(np.arange(n * 3).reshape(n, 3)),
    n_range=[2**k for k in range(25)],
    kernels=[
        lambda df: len(df.index),
        lambda df: df.shape[0],
        lambda df: df[df.columns[0]].count(),
    ],
    labels=["len(df.index)", "df.shape[0]", "df[df.columns[0]].count()"],
    xlabel="Number of rows",
)
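
The third kernel only matches the other two when the first column has no missing values. A minimal sketch of that difference, assuming a small made-up DataFrame (the column names and values are hypothetical, not from the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: three rows, one missing value in the first column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "z"]})

rows_by_index = len(df.index)              # counts every row
rows_by_shape = df.shape[0]                # counts every row
non_nan_first = df[df.columns[0]].count()  # counts only non-NaN entries

print(rows_by_index, rows_by_shape, non_nan_first)  # prints: 3 3 2
```

So count() is probably best kept for cases where the number of non-missing values is what you actually want.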



ANSWER 2

Score 494


Suppose df is your dataframe; then:

count_row = df.shape[0]  # Gives number of rows
count_col = df.shape[1]  # Gives number of columns

Or, more succinctly,

r, c = df.shape
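
One caveat with the tuple unpacking: it relies on the object being two-dimensional. The sketch below, using a made-up DataFrame, shows that a single column (a Series) has a one-element shape, so unpacking it into two names fails:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

r, c = df.shape      # a DataFrame has a two-element shape: (rows, columns)

s = df["a"]          # selecting one column gives a one-dimensional Series
(n,) = s.shape       # a Series has a one-element shape: (rows,)

try:
    r2, c2 = s.shape  # unpacking one value into two names
except ValueError as exc:
    unpack_error = str(exc)

print(r, c, n)        # prints: 3 2 3
print(unpack_error)
```

If the code may receive either a DataFrame or a Series, shape[0] or len() works for both.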



ANSWER 3

Score 283


Use len(df) :-).

__len__() is documented with "Returns length of index".

Timing info, set up the same way as in root's answer:

In [7]: timeit len(df.index)
1000000 loops, best of 3: 248 ns per loop

In [8]: timeit len(df)
1000000 loops, best of 3: 573 ns per loop

Because of the one additional function call, len(df) is a bit slower than calling len(df.index) directly, but this should not matter in most cases. I find len(df) to be quite readable.
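
To repeat the comparison outside IPython, the standard-library timeit module can be used. This is only a sketch: the DataFrame size and the number of calls are assumptions, since the original setup is not shown here, and absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: 1000 rows, 3 columns (the original timing setup is not shown).
df = pd.DataFrame(np.arange(3000).reshape(1000, 3))

n_calls = 100_000  # arbitrary repeat count chosen for this sketch

# Total time in seconds for n_calls invocations of each expression.
t_index = timeit.timeit(lambda: len(df.index), number=n_calls)
t_len = timeit.timeit(lambda: len(df), number=n_calls)

print(f"len(df.index): {t_index / n_calls * 1e9:.0f} ns per call")
print(f"len(df):       {t_len / n_calls * 1e9:.0f} ns per call")
```

The lambda wrapper adds its own call overhead to both measurements, so the figures are only useful relative to each other.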




ANSWER 4

Score 83


TL;DR: use len(df)

len() returns the number of items (the length) of a list object; it also works for dictionary, string, tuple, and range objects. So, to get the row count of a DataFrame, simply use len(df). For more about the len function, see the official page.


Alternatively, you can access all rows and all columns with df.index and df.columns, respectively. Since len(anyList) gives the number of elements, len(df.index) gives the number of rows and len(df.columns) gives the number of columns.

Or, you can use df.shape, which returns the number of rows and columns together as a tuple whose items you can access by index. If you only want the number of rows, use df.shape[0]; for only the number of columns, use df.shape[1].
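
The options described above should all agree with one another. A short sketch that checks this, assuming a made-up DataFrame (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical sample data: four rows, three columns.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d"],
        "value": [10, 20, 30, 40],
        "flag": [True, False, True, False],
    }
)

n_rows = len(df)          # same as len(df.index)
n_cols = len(df.columns)

# All of the approaches report the same dimensions.
assert n_rows == len(df.index) == df.shape[0]
assert n_cols == df.shape[1]
assert df.shape == (n_rows, n_cols)

# A filter that matches nothing leaves zero rows.
empty = df[df["value"] > 100]

print(n_rows, n_cols, len(empty))  # prints: 4 3 0
```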