How to iterate over rows in a DataFrame in Pandas
--
Full question
https://stackoverflow.com/questions/1647...
Question links:
[similar question]: https://stackoverflow.com/questions/7837...
Accepted answer links:
[DataFrame.iterrows]: https://pandas.pydata.org/pandas-docs/st...
Answer 2 links:
[DataFrame.to_string()]: https://pandas.pydata.org/pandas-docs/st...
[here]: https://stackoverflow.com/questions/2487...
[Cython]: https://en.wikipedia.org/wiki/Cython
[DataFrame.apply()]: https://pandas.pydata.org/pandas-docs/st...
[DataFrame.itertuples()]: https://pandas.pydata.org/pandas-docs/st...
[iteritems()]: https://pandas.pydata.org/pandas-docs/st...
[DataFrame.iterrows()]: https://pandas.pydata.org/pandas-docs/st...
[The documentation page]: https://pandas.pydata.org/pandas-docs/st...
[Vectorization]: https://stackoverflow.com/questions/1422...
[Cython]: https://cython.org
[Essential Basic Functionality]: https://pandas.pydata.org/pandas-docs/st...
[Cython extensions]: https://pandas.pydata.org/pandas-docs/st...
[List Comprehensions]: https://docs.python.org/3/tutorial/datas...
[good amount of evidence]: https://stackoverflow.com/questions/5402...
[Benchmarking code, for your reference]: https://gist.github.com/Coldsp33d/948f96...
[this post of mine]: https://stackoverflow.com/questions/5443...
[10 Minutes to pandas]: https://pandas.pydata.org/pandas-docs/st...
[Essential Basic Functionality]: https://pandas.pydata.org/pandas-docs/st...
[Enhancing Performance]: https://pandas.pydata.org/pandas-docs/st...
[Are for-loops in pandas really bad? When should I care?]: https://stackoverflow.com/questions/5402...
[When should I (not) want to use pandas apply() in my code?]: https://stackoverflow.com/questions/5443...
Answer 3 links:
[this answer]: https://stackoverflow.com/a/55557758/384...
[DataFrame.iterrows()]: http://pandas.pydata.org/pandas-docs/sta...
[DataFrame.itertuples()]: http://pandas.pydata.org/pandas-docs/sta...
[DataFrame.apply()]: http://pandas.pydata.org/pandas-docs/sta...
[pandas docs on iteration]: https://pandas.pydata.org/docs/user_guid...
Answer 4 links:
[df.iterrows()]: http://pandas.pydata.org/pandas-docs/sta...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe
ACCEPTED ANSWER
Score 5449
DataFrame.iterrows is a generator which yields both the index and row (as a Series):
import pandas as pd
df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})
df = df.reset_index() # make sure indexes pair with number of rows
for index, row in df.iterrows():
    print(row['c1'], row['c2'])
10 100
11 110
12 120
Obligatory disclaimer from the documentation
Iterating through pandas objects is generally slow. In many cases, iterating manually over the rows is not needed and can be avoided with one of the following approaches:
- Look for a vectorized solution: many operations can be performed using built-in methods or NumPy functions, (boolean) indexing, …
- When you have a function that cannot work on the full DataFrame/Series at once, it is better to use apply() instead of iterating over the values. See the docs on function application.
- If you need to do iterative manipulations on the values but performance is important, consider writing the inner loop with cython or numba. See the enhancing performance section for some examples of this approach.
Other answers in this thread delve into greater depth on alternatives to iter* functions if you are interested to learn more.
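To make the first two suggestions from the documentation concrete, here is a minimal sketch that computes the same per-row sum three ways. The column sum is only an illustrative task chosen for this example, and the variable names are made up; it reuses the small frame from the snippet above.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

# Row-by-row version: one Series is built per row inside the loop.
loop_sums = [row['c1'] + row['c2'] for _, row in df.iterrows()]

# Vectorized version: the addition runs on whole columns at once.
vectorized_sums = df['c1'] + df['c2']

# apply() version: for a function that has to see one row at a time.
applied_sums = df.apply(lambda row: row['c1'] + row['c2'], axis=1)

print(vectorized_sums.tolist())  # [110, 121, 132]
```

All three produce the same values; the vectorized form is usually the one to try first, with apply() as a fallback when the logic cannot be written as column operations.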
ANSWER 2
Score 578
First consider if you really need to iterate over rows in a DataFrame. See cs95's answer for alternatives.
If you still need to iterate over rows, you can use the methods below. Note some important caveats that the other answers do not mention.
- for index, row in df.iterrows():
      print(row["c1"], row["c2"])
- for row in df.itertuples(index=True, name='Pandas'):
      print(row.c1, row.c2)
itertuples() is generally faster than iterrows().
But be aware of the following caveats, according to the docs (pandas 0.24.2 at the time of writing):
- iterrows: dtype might not match from row to row
Because iterrows returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames). To preserve dtypes while iterating over the rows, it is better to use itertuples(), which returns namedtuples of the values and which is generally much faster than iterrows().
- iterrows: Do not modify rows
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
Use DataFrame.apply() instead:
new_df = df.apply(lambda x: x * 2, axis=1)
- itertuples:
The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore. With a large number of columns (>255), regular tuples are returned.
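The dtype and renaming caveats in the list above can be checked with a small sketch. The frames and column names here (qty, price, "unit price", _hidden) are hypothetical and chosen only to trigger each behaviour.

```python
import pandas as pd

# One integer column and one float column.
df = pd.DataFrame({'qty': [1, 2], 'price': [0.5, 1.5]})

# iterrows: each row becomes a single Series with one dtype,
# so the integer value is upcast to float alongside 'price'.
_, first_row = next(df.iterrows())
row_dtype = str(first_row.dtype)

# itertuples: each value keeps the type of its own column.
first_tuple = next(df.itertuples())
qty_is_float = isinstance(first_tuple.qty, float)

# Column names that are not valid identifiers, or that start with an
# underscore, are replaced by positional names in the namedtuple.
odd = pd.DataFrame({'unit price': [1.0], '_hidden': [2]})
fields = next(odd.itertuples())._fields

print(row_dtype)     # float64
print(qty_is_float)  # False
print(fields)        # ('Index', '_1', '_2')
```

If the original column names matter, one option is to rename the columns to valid identifiers before calling itertuples().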
See pandas docs on iteration for more details.
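As a rough illustration of the "do not modify rows" caveat, the sketch below writes to the row inside an iterrows loop and then checks the frame. It assumes a frame with mixed dtypes, where the row is a copy; the frame and variable names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [1.5, 2.5, 3.5]})

# Writing to the row only changes the temporary Series, not df.
for _, row in df.iterrows():
    row['c1'] = 0

after_loop = df['c1'].tolist()
print(after_loop)  # [10, 11, 12]

# Building a new frame with apply(), as the answer suggests.
doubled = df.apply(lambda row: row * 2, axis=1)

# For simple arithmetic like this, a vectorized expression gives the same values.
doubled_vectorized = df * 2
```

Note that the apply() result comes back with float columns here, because each row passes through a single-dtype Series, while the vectorized version keeps c1 as integers.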
ANSWER 3
Score 255
You should use df.iterrows(), though iterating row by row is not especially efficient because a Series object has to be created for each row.
ANSWER 4
Score 189
While iterrows() is a good option, sometimes itertuples() can be much faster:
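import pandas as pd
from numpy.random import randint, randn
df = pd.DataFrame({'a': randn(1000), 'b': randn(1000), 'N': randint(100, 1000, (1000)), 'x': 'x'})
# %timeit is an IPython/Jupyter magic, so run these lines in a notebook or IPython shell
%timeit [row.a * 2 for idx, row in df.iterrows()]
# => 10 loops, best of 3: 50.3 ms per loop
# row[0] is the index, so row[1] is column 'a'
%timeit [row[1] * 2 for row in df.itertuples()]
# => 1000 loops, best of 3: 541 µs per loop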
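For anyone who wants to repeat the comparison outside IPython, here is a sketch using the standard timeit module. The function names, the seeded random generator, and the extra vectorized case are additions for illustration and not part of the original benchmark; timings will vary by machine and pandas version, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'a': rng.standard_normal(1000),
    'b': rng.standard_normal(1000),
    'N': rng.integers(100, 1000, 1000),
    'x': 'x',
})

def with_iterrows():
    return [row.a * 2 for _, row in df.iterrows()]

def with_itertuples():
    return [row.a * 2 for row in df.itertuples()]

def vectorized():
    return (df['a'] * 2).tolist()

# Time each approach over a few repetitions and report the mean per call.
repeats = 5
for fn in (with_iterrows, with_itertuples, vectorized):
    seconds = timeit.timeit(fn, number=repeats)
    print(f"{fn.__name__}: {seconds / repeats * 1000:.2f} ms per call")
```

The three functions return the same values, so the only thing being compared is the iteration overhead.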