How do I get the row count of a Pandas DataFrame?
--
Full question
https://stackoverflow.com/questions/1594...
Accepted answer links:
[number of non-NaN values]: https://pandas.pydata.org/docs/reference...
[image]: https://i.stack.imgur.com/wEzue.png
Answer 3 links:
[root's answer]: https://stackoverflow.com/questions/1594...
Answer 4 links:
[image]: https://i.stack.imgur.com/3FXuI.png
[DataFrame.count]: https://pandas.pydata.org/pandas-docs/st...
[Series.count]: https://pandas.pydata.org/pandas-docs/st...
[DataFrameGroupBy.size]: https://pandas.pydata.org/pandas-docs/st...
[SeriesGroupBy.size]: https://pandas.pydata.org/pandas-docs/st...
[GroupBy.count]: https://pandas.pydata.org/pandas-docs/st...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe
ACCEPTED ANSWER
Score 2956
For a dataframe df, one can use any of the following:
len(df.index)
df.shape[0]
df[df.columns[0]].count() (== number of non-NaN values in first column)
Code to reproduce the plot:
import numpy as np
import pandas as pd
import perfplot
perfplot.save(
    "out.png",
    setup=lambda n: pd.DataFrame(np.arange(n * 3).reshape(n, 3)),
    n_range=[2**k for k in range(25)],
    kernels=[
        lambda df: len(df.index),
        lambda df: df.shape[0],
        lambda df: df[df.columns[0]].count(),
    ],
    labels=["len(df.index)", "df.shape[0]", "df[df.columns[0]].count()"],
    xlabel="Number of rows",
)
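The three expressions in this answer agree only when the first column has no missing values, because count() skips NaN. A minimal sketch of that difference, assuming a small made-up frame (the column names a and b are illustrative and not from the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical example data: column "a" has one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4, 5, 6]})

rows_from_index = len(df.index)            # counts every row
rows_from_shape = df.shape[0]              # counts every row
non_nan_first = df[df.columns[0]].count()  # skips NaN in the first column

print(rows_from_index, rows_from_shape, non_nan_first)  # 3 3 2
```

If the goal is the row count regardless of missing data, the first two expressions are the safer choice.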
ANSWER 2
Score 494
Suppose df is your DataFrame. Then:
count_row = df.shape[0] # Gives number of rows
count_col = df.shape[1] # Gives number of columns
Or, more succinctly,
r, c = df.shape
ANSWER 3
Score 283
Use len(df) :-).
__len__() is documented with "Returns length of index".
Timing info, set up the same way as in root's answer:
In [7]: timeit len(df.index)
1000000 loops, best of 3: 248 ns per loop
In [8]: timeit len(df)
1000000 loops, best of 3: 573 ns per loop
Because of one additional function call, len(df) is a bit slower than calling len(df.index) directly, but this should not matter in most cases. I find len(df) to be quite readable.
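Outside IPython, a similar comparison can be sketched with the standard timeit module. The frame size and repeat count below are arbitrary assumptions, not the setup used for the timings above, and the absolute numbers will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical frame; the size is arbitrary.
df = pd.DataFrame(np.arange(3000).reshape(1000, 3))

n = 100_000  # number of calls to time for each expression
t_index = timeit.timeit(lambda: len(df.index), number=n)
t_frame = timeit.timeit(lambda: len(df), number=n)

# Report the average cost per call in nanoseconds.
print(f"len(df.index): {t_index / n * 1e9:.0f} ns per call")
print(f"len(df):       {t_frame / n * 1e9:.0f} ns per call")
```

Both calls return the same value, so the choice between them is mostly about readability.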
ANSWER 4
Score 83
TL;DR use len(df)
len() returns the number of items (the length) of a list object; it also works for dictionary, string, tuple, and range objects. So, to get the row count of a DataFrame, simply use len(df).
For more about the len function, see the official page.
Alternatively, you can access all rows and all columns with df.index and df.columns, respectively. Since len() returns the number of elements of any such sequence, len(df.index) will give the number of rows, and len(df.columns) will give the number of columns.
Or, you can use df.shape, which returns the number of rows and columns together (as a tuple) whose items you can access by index. For the number of rows, use df.shape[0]; for the number of columns, use df.shape[1].
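The options described in this answer can be put side by side in a short sketch. The frame and its column names are made up for illustration:

```python
import pandas as pd

# Hypothetical example data with 4 rows and 2 columns.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "value": [10, 20, 30, 40]})

n_rows = len(df)                   # row count via len() on the frame
n_rows_via_index = len(df.index)   # row count via the index
n_cols = len(df.columns)           # column count via the columns
shape_rows, shape_cols = df.shape  # both counts at once, as a tuple

print(n_rows, n_rows_via_index, n_cols)  # 4 4 2
print(df.shape)                          # (4, 2)
```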
