New variable in a pandas DataFrame which counts consecutive values
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: The World Wide Mind
--
Chapters
00:00 New Variable In A Panda Dataframe Which Counts Consecutive Values
00:45 Accepted Answer Score 1
01:07 Answer 2 Score 4
01:39 Answer 3 Score 0
01:54 Thank you
--
Full question
https://stackoverflow.com/questions/5574...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe
#avk47
ANSWER 1
Score 4
Use a vectorized solution: compare the column with 'T', then count the consecutive True values:
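a = df['outrange'] == 'T'
# if the values are already boolean True/False, use instead:
# a = df['outrange']
b = a.cumsum()
# subtract the running total at the last 'F' so the count restarts after each 'F'
df['count_new'] = b-b.where(~a).ffill().fillna(0).astype(int)
print(df)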
   f outrange  count  count_new
0  1        F      0          0
1  2        F      0          0
2  3        T      1          1
3  4        T      2          2
4  1        F      0          0
5  3        T      1          1
6  2        F      0          0
7  3        T      1          1
8  4        T      2          2
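To see why the subtraction works, here is a self-contained sketch that rebuilds the sample frame from the output above and prints the intermediate Series. The column values are copied from that output; the helper names `last_reset` and the printed column labels are illustrative only.

```python
import pandas as pd

# Sample data copied from the printed output above
df = pd.DataFrame({
    'f': [1, 2, 3, 4, 1, 3, 2, 3, 4],
    'outrange': ['F', 'F', 'T', 'T', 'F', 'T', 'F', 'T', 'T'],
})

a = df['outrange'] == 'T'   # True where the value is out of range
b = a.cumsum()              # running count of all 'T' values so far
# Value of b at the most recent 'F' row, carried forward over each run of 'T';
# fillna(0) covers a run of 'T' at the very start of the frame
last_reset = b.where(~a).ffill().fillna(0).astype(int)
df['count_new'] = b - last_reset  # length of the current run of 'T'

print(pd.DataFrame({'a': a, 'b': b, 'last_reset': last_reset,
                    'count_new': df['count_new']}))
```

Each row's count is the number of 'T' values seen since the last 'F', which is exactly the total so far (`b`) minus the total at that last 'F' (`last_reset`).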
Performance on a larger frame:
# 9 rows x 10000 = 90k rows
df = pd.concat([df] * 10000, ignore_index=True)
In [183]: %%timeit
...: a = df['outrange'] == 'T'
...: b = a.cumsum()
...: df['count_new'] = b-b.where(~a).ffill().fillna(0).astype(int)
...:
7.83 ms ± 28.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [184]: %%timeit
...: df['B'] = df['outrange'].apply(conditional_cumulative_sum)
...:
35.9 ms ± 295 µs per loop
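The timings above compare speed only. A minimal sketch that also checks both approaches return the same column on the 90k-row frame, assuming the sample 'outrange' values above and a copy of the accepted answer's `conditional_cumulative_sum`; the wrapper names `vectorized` and `with_apply` are hypothetical, and the `timeit` numbers will vary by machine.

```python
import timeit
import pandas as pd

base = pd.DataFrame({'outrange': ['F', 'F', 'T', 'T', 'F', 'T', 'F', 'T', 'T']})
df = pd.concat([base] * 10000, ignore_index=True)  # 90k rows

counter = 0

def conditional_cumulative_sum(x):
    # Same logic as the accepted answer's function, with a single return
    global counter
    if x == 'F':
        counter = 0
    elif x == 'T':
        counter += 1
    return counter

def vectorized(s):
    a = s == 'T'
    b = a.cumsum()
    return b - b.where(~a).ffill().fillna(0).astype(int)

def with_apply(s):
    global counter
    counter = 0  # reset the shared state before each run
    return s.apply(conditional_cumulative_sum)

# Both approaches should produce identical counts
assert vectorized(df['outrange']).equals(with_apply(df['outrange']))

print('vectorized:', timeit.timeit(lambda: vectorized(df['outrange']), number=10))
print('apply:     ', timeit.timeit(lambda: with_apply(df['outrange']), number=10))
```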
ACCEPTED ANSWER
Score 1
I used my own data as the input. Column 'A' corresponds to your 'outrange' column and 'B' to your 'count' column. Please test with your actual data.
import numpy as np
import pandas as pd
df = pd.DataFrame(data={'A': ['F','T','T','T','F','F','F','T','F','F','F','T','T','F','F','F']})
counter = 0
def conditional_cumulative_sum(x):
    # 'counter' is shared across calls: reset on 'F', incremented on 'T'
    global counter
    if x == 'F':
        counter = 0
        return counter
    elif x == 'T':
        counter += 1
        return counter
df['B'] = df['A'].apply(conditional_cumulative_sum)
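Because `counter` is a module-level global, it keeps its value between calls, so if one `apply` ends on a 'T' and the next starts with one, the count carries over. A minimal sketch of the same running counter kept local to a single call; the helper name `run_length_count` and its `on`/`off` parameters are hypothetical.

```python
import pandas as pd

def run_length_count(values, on='T', off='F'):
    """Return, for each element, the length of the current run of `on` values."""
    counts = []
    counter = 0  # local state, reset on every call
    for x in values:
        if x == on:
            counter += 1
        elif x == off:
            counter = 0
        counts.append(counter)
    return counts

df = pd.DataFrame(data={'A': ['F', 'T', 'T', 'T', 'F', 'F', 'F', 'T',
                              'F', 'F', 'F', 'T', 'T', 'F', 'F', 'F']})
df['B'] = run_length_count(df['A'])
print(df['B'].tolist())
# [0, 1, 2, 3, 0, 0, 0, 1, 0, 0, 0, 1, 2, 0, 0, 0]
```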
ANSWER 3
Score 0
Perhaps easier with a dictionary.
import numpy as np
import pandas as pd
arr = np.array([0,0,1,1,0,1,0,1,1]) # your array of 0/1 values
dct = {k:0 if arr[k]<1 else 1 for k in range(len(arr))}
c = 0
for i in arr:
    # c > 0 avoids looking up dct[-1] when the array starts with 1
    if i > 0 and c > 0 and dct[c-1] > 0:
        dct[c] = 1 + dct[c-1]
    c = c+1
pd.DataFrame([arr, list(dct.values())], index=["outofrange","count"]).T
   outofrange  count
0           0      0
1           0      0
2           1      1
3           1      2
4           0      0
5           1      1
6           0      0
7           1      1
8           1      2
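The dictionary loop only ever needs the previous count, so the same result can also be computed with the common pandas `groupby` + `cumsum` idiom, which is not one of the answers above. Each 0 starts a new group and the 1s are summed within it. A sketch, assuming the values are exactly 0 and 1.

```python
import numpy as np
import pandas as pd

arr = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1])  # 1 = out of range
s = pd.Series(arr)

# Every 0 increments the group id, so each run of 1s shares the id of the 0 before it
group_id = (s == 0).cumsum()
# Summing 1s within each group gives the length of the current run
count = s.groupby(group_id).cumsum()

print(pd.DataFrame({'outofrange': s, 'count': count}))
```

A leading run of 1s falls into group 0 and is still counted correctly, since no dictionary lookup of a previous position is needed.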