The Python Oracle

New variable in a pandas DataFrame that counts consecutive values

--


Full question
https://stackoverflow.com/questions/5574...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe



ANSWER 1

Score 4


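Use a vectorized solution: take a running total of the T rows, then subtract the total reached at the most recent F row, so the count restarts after every F:

a = df['outrange'] == 'T'
# if the column already holds boolean True/False values, use it directly:
# a = df['outrange']
b = a.cumsum()  # running total of T rows
# b.where(~a) keeps the total only on F rows; ffill carries it over the following T rows
df['count_new'] = b - b.where(~a).ffill().fillna(0).astype(int)
print(df)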
   f outrange  count  count_new
0  1        F      0          0
1  2        F      0          0
2  3        T      1          1
3  4        T      2          2
4  1        F      0          0
5  3        T      1          1
6  2        F      0          0
7  3        T      1          1
8  4        T      2          2
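
To make the subtraction easier to follow, here is a minimal sketch that prints out the intermediate values. The sample frame is an assumption that mirrors only the 'outrange' column shown above, and the name baseline is introduced purely for illustration.

```python
import pandas as pd

# Assumed sample data: only the 'outrange' column from the output above.
df = pd.DataFrame({"outrange": list("FFTTFTFTT")})

a = df["outrange"] == "T"   # True on the rows that should be counted
b = a.cumsum()              # running total of T rows: 0 0 1 2 2 3 3 4 5

# Keep the running total on F rows only (NaN elsewhere), then carry the
# last F-row total forward over the T rows that follow it.
baseline = b.where(~a).ffill().fillna(0).astype(int)   # 0 0 0 0 2 2 3 3 3

# Subtracting the baseline restarts the count after every F.
df["count_new"] = b - baseline                          # 0 0 1 2 0 1 0 1 2
print(df)
```

The fillna(0) only matters when the column starts with T, because there is no earlier F row to carry forward.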

Performance:

# 90k rows
df = pd.concat([df] * 10000, ignore_index=True)

In [183]: %%timeit
     ...: a = df['outrange'] == 'T'
     ...: b = a.cumsum()
     ...: df['count_new'] = b-b.where(~a).ffill().fillna(0).astype(int)
     ...: 
7.83 ms ± 28.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [184]: %%timeit
     ...: df['B'] = df['outrange'].apply(conditional_cumulative_sum)
     ...: 
35.9 ms ± 295 µs p



ACCEPTED ANSWER

Score 1


I have used my own data as the input. Column 'A' corresponds to your 'outrange' column and 'B' corresponds to 'count' column. Please test with your actual data.

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'A':['F','T','T','T','F','F','F','T','F','F','F','T','T','F','F','F']})
counter = 0  # length of the current run of 'T' values

def conditional_cumulative_sum(x):
    global counter

    if x == 'F':
        # an 'F' ends the run
        counter = 0
        return counter
    elif x == 'T':
        # a 'T' extends the run
        counter += 1
        return counter

# counter persists between calls, so reset it to 0 before applying again
df['B'] = df['A'].apply(conditional_cumulative_sum)
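
Because the function above keeps its state in a module-level global, a second call to apply continues from wherever the counter stopped. One possible variation, sketched here with a hypothetical make_counter helper that is not part of the original answer, keeps the state in a closure so every call starts from zero. Like the original, it assumes apply visits the values in order.

```python
import pandas as pd

def make_counter():
    """Return a function that counts consecutive 'T' values (hypothetical helper)."""
    counter = 0

    def step(x):
        nonlocal counter
        # A 'T' extends the current run; anything else resets it.
        counter = counter + 1 if x == "T" else 0
        return counter

    return step

df = pd.DataFrame({"A": list("FTTTFFFTFFFTTFFF")})
df["B"] = df["A"].apply(make_counter())   # fresh state for this call
print(df)
```

Calling make_counter() again gives an independent counter, so repeated runs on the same or another column do not interfere with each other.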



ANSWER 3

Score 0


Perhaps easier with a dictionary.

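import numpy as np
import pandas as pd

arr = np.array([0,0,1,1,0,1,0,1,1])  # your array of booleans

# start every position at 0 (False) or 1 (True)
dct = {k: 0 if arr[k] < 1 else 1 for k in range(len(arr))}

c = 0
for i in arr:
    # extend the run when the previous position was also counted;
    # c > 0 avoids looking up dct[-1] when the array starts with a 1
    if c > 0 and i > 0 and dct[c-1] > 0:
        dct[c] = 1 + dct[c-1]
    c = c + 1

pd.DataFrame([arr, list(dct.values())], index=["outofrange", "count"]).T

# outofrange  count
#          0      0
#          0      0
#          1      1
#          1      2
#          0      0
#          1      1
#          0      0
#          1      1
#          1      2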
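
The same running count can also be written without the dictionary. The sketch below is an alternative, not the answer's own method: it uses itertools.accumulate from the standard library (the initial argument needs Python 3.8 or later), and the names flags and counts are chosen here for illustration.

```python
from itertools import accumulate

flags = [0, 0, 1, 1, 0, 1, 0, 1, 1]   # same values as arr above, as a plain list

# Each step either extends the current run by one or resets it to zero.
# initial=0 seeds the run length; [1:] drops that seed from the result.
counts = list(accumulate(flags, lambda run, flag: run + 1 if flag else 0, initial=0))[1:]
print(counts)   # [0, 0, 1, 2, 0, 1, 0, 1, 2]
```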