Iterating over rows in pandas to check the condition

Music by Eric Matyas
https://www.soundimage.org
Track title: Magic Ocean Looping

--

Chapters
00:00 Iterating Over Rows In Pandas To Check The Condition
00:39 Answer 1 Score 5
01:04 Answer 2 Score 5
01:18 Answer 3 Score 5
01:35 Accepted Answer Score 4
02:03 Answer 5 Score 3
03:12 Thank you

--

Full question
https://stackoverflow.com/questions/5204...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe

#avk47

ANSWER 1

Score 5

Using `idxmax` with `loc` for assignment

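# Label of the first row where Col_A equals 0
idx = df.Col_A.eq(0).idxmax()
df['Col_B'] = False
# Label-based slice: mark that row and every row after it
df.loc[idx:, 'Col_B'] = True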

   Col_A  Col_B
0   1234  False
1   6267  False
2   6364  False
3    573  False
4      0   True
5    838   True
6     92   True
7   3221   True

Using `assign`:

This approach avoids modifying the original DataFrame.

df.assign(Col_B=(df.index >= idx))
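
The two variants above can be put together as a minimal, self-contained sketch. One caveat worth checking: on an all-False boolean Series, `idxmax` still returns the first label, so a column with no 0 would be flagged True everywhere. The `mask.any()` guard and the `no_zero` frame below are additions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Col_A': [1234, 6267, 6364, 573, 0, 838, 92, 3221]})

mask = df.Col_A.eq(0)
idx = mask.idxmax()  # label of the first True; the first label if none is True

# Non-mutating variant: assign returns a new DataFrame, df is left as it was
out = df.assign(Col_B=(df.index >= idx))

# Hypothetical input with no 0 at all, to show the caveat
no_zero = pd.DataFrame({'Col_A': [5, 6, 7]})
no_zero_mask = no_zero.Col_A.eq(0)
first = no_zero_mask.idxmax()  # first label, even though nothing matched

no_zero['Col_B'] = False
if no_zero_mask.any():  # guard: only flag rows when a 0 actually exists
    no_zero.loc[first:, 'Col_B'] = True
```

Both the `loc` slice and the `df.index >= idx` comparison work here because the frame has the default integer index in ascending order.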

ANSWER 2

Score 5

Using `eq` with `cummax`:

df.A.eq(0).cummax()
Out[5]: 
0    False
1    False
2    False
3    False
4     True
5     True
6     True
7     True
Name: A, dtype: bool
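
A small sketch of how that Series might be stored as a new column, assuming the same sample data as the other answers (the column name `B` is chosen here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1234, 6267, 6364, 573, 0, 838, 92, 3221]})

# eq(0) marks the rows equal to 0; cummax is a running maximum, so once a
# True has been seen, every later row stays True
df['B'] = df.A.eq(0).cummax()
```

Because `cummax` works row by row in order, this does not depend on the index labels.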

ANSWER 3

Score 5

You can use the `accumulate` method of NumPy's `logical_or` ufunc:

df.assign(Col_B=np.logical_or.accumulate(df.Col_A.values == 0))

   Col_A  Col_B
0   1234  False
1   6267  False
2   6364  False
3    573  False
4      0   True
5    838   True
6     92   True
7   3221   True
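
To see what `accumulate` does here, the one-liner can be split into steps. This is only an illustrative breakdown; the names `is_zero`, `running`, and `no_match` are not from the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col_A': [1234, 6267, 6364, 573, 0, 838, 92, 3221]})

# Step 1: element-wise test on the underlying NumPy array
is_zero = df.Col_A.values == 0

# Step 2: running logical OR, so each position is True if any earlier
# position (or the position itself) was True
running = np.logical_or.accumulate(is_zero)

# An input with no 0 simply stays False everywhere
no_match = np.logical_or.accumulate(np.array([5, 6, 7]) == 0)
```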

ACCEPTED ANSWER

Score 4

You can use `next` with a generator expression. This is more efficient for a large series where 0 appears near the beginning, because the iteration stops at the first match.

@user3483203's NumPy-based solution should be fine for general use.

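import pandas as pd

df = pd.DataFrame({'A': [1234, 6267, 6364, 573, 0, 838, 92, 3221]})

# Position of the first 0; falls back to len(df['A']) if there is none
idx = next((i for i, j in enumerate(df['A']) if j == 0), len(df['A']))

# Assumes the default RangeIndex, so positions and index labels coincide
df['B'] = ~(df.index < idx)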

# more verbose alternative:
# df['B'] = np.where(df.index < idx, False, True)

print(df)

      A      B
0  1234  False
1  6267  False
2  6364  False
3   573  False
4     0   True
5   838   True
6    92   True
7  3221   True
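
The `idx` found by `enumerate` is a position, while `df.index` holds labels, so the comparison above relies on the default integer index. One possible way to handle other indexes is to compare against positions instead. The helper name `flag_from_first_zero` and the string index below are hypothetical, used only to illustrate the idea.

```python
import numpy as np
import pandas as pd

def flag_from_first_zero(values):
    """Return a boolean array that is True from the first 0 onward."""
    n = len(values)
    # next() stops at the first match; n is the fallback when no 0 exists
    pos = next((i for i, v in enumerate(values) if v == 0), n)
    # Compare positions, not index labels, so any index works
    return np.arange(n) >= pos

df = pd.DataFrame({'A': [1234, 573, 0, 838]}, index=['w', 'x', 'y', 'z'])
df['B'] = flag_from_first_zero(df['A'])
```

With no 0 in the input, `pos` equals the length and the result is all False.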
