Pandas: Setting True to False in a column, if it appears less than n times in a row
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Game 5
--
Chapters
00:00 Pandas: Setting True To False In A Column, If It Appears Less Than N Times In A Row
01:30 Accepted Answer Score 2
02:45 Answer 2 Score 1
03:26 Thank you
--
Full question
https://stackoverflow.com/questions/6328...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
ACCEPTED ANSWER
Score 2
The idea is to create a group for each run of consecutive True values by applying Series.cumsum to the inverted boolean mask, then replace the non-matching (False) rows with NaN using Series.where, and finally count the rows in each group with Series.map and Series.value_counts, comparing that count against the threshold with Series.gt:
s = (~df['input']).cumsum().where(df['input'])
df['out'] = s.map(s.value_counts()).gt(4)
print(df)
input output out
0 False False False
1 False False False
2 False False False
3 False False False
4 True False False
5 True False False
6 False False False
7 False False False
8 True False False
9 False False False
10 False False False
11 False False False
12 True False False
13 True False False
14 True False False
15 False False False
16 False False False
17 False False False
18 True True True
19 True True True
20 True True True
21 True True True
22 True True True
23 False False False
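The snippet above assumes an existing df. A self-contained sketch follows: it rebuilds the input column from the values in the printed table (the list construction is ours, not part of the answer) and applies the same two lines.

```python
import pandas as pd

# 'input' column read off the table above (24 rows, indices 0..23)
values = ([False] * 4 + [True] * 2 + [False] * 2 + [True] + [False] * 3
          + [True] * 3 + [False] * 3 + [True] * 5 + [False])
df = pd.DataFrame({"input": values})

# Label each run of Trues with the running count of Falses before it;
# False rows become NaN, which value_counts() ignores
s = (~df["input"]).cumsum().where(df["input"])

# Map every True row to the length of its run and keep runs longer than 4
df["out"] = s.map(s.value_counts()).gt(4)

print(df[df["out"]].index.tolist())  # → [18, 19, 20, 21, 22]
```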
Details:
s = (~df['input']).cumsum().where(df['input'])
print(df.assign(inv=~df['input'],
                cumsum=(~df['input']).cumsum(),
                s=(~df['input']).cumsum().where(df['input']),
                count=s.map(s.value_counts()),
                out=s.map(s.value_counts()).gt(4)))
input output inv cumsum s count out
0 False False True 1 NaN NaN False
1 False False True 2 NaN NaN False
2 False False True 3 NaN NaN False
3 False False True 4 NaN NaN False
4 True False False 4 4.0 2.0 False
5 True False False 4 4.0 2.0 False
6 False False True 5 NaN NaN False
7 False False True 6 NaN NaN False
8 True False False 6 6.0 1.0 False
9 False False True 7 NaN NaN False
10 False False True 8 NaN NaN False
11 False False True 9 NaN NaN False
12 True False False 9 9.0 3.0 False
13 True False False 9 9.0 3.0 False
14 True False False 9 9.0 3.0 False
15 False False True 10 NaN NaN False
16 False False True 11 NaN NaN False
17 False False True 12 NaN NaN False
18 True True False 12 12.0 5.0 True
19 True True False 12 12.0 5.0 True
20 True True False 12 12.0 5.0 True
21 True True False 12 12.0 5.0 True
22 True True False 12 12.0 5.0 True
23 False False True 13 NaN NaN False
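The question title asks to set True to False wherever it appears fewer than n times in a row. A minimal sketch wrapping the same run-labelling idea in a reusable function, assuming a boolean Series as input. The helper name keep_long_runs is hypothetical, and it swaps the value_counts/map step for groupby(...).transform("sum"), which counts the Trues under each label directly. Because only Trues are counted, a run at the very start of the Series (which has no leading False) is measured correctly as well.

```python
import pandas as pd


def keep_long_runs(mask: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: return a copy of a boolean Series in which every
    run of consecutive True values shorter than n is set to False."""
    mask = mask.astype(bool)
    # A new run label starts at every False, as in the accepted answer
    run_id = (~mask).cumsum()
    # Number of Trues sharing a label = length of that row's run of Trues
    run_len = mask.astype(int).groupby(run_id).transform("sum")
    # Keep a True only if its run has at least n members
    return mask & (run_len >= n)


s = pd.Series([True, True, False, True, True, True, False, True])
print(keep_long_runs(s, 3).tolist())
# → [False, False, False, True, True, True, False, False]
```

With n = 5 this keeps the same rows as the gt(4) threshold used above.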
ANSWER 2
Score 1
Here's a way to do that:
N = 4
# Each False opens a new group; the Trues that follow it join that group
group = (~df.input).cumsum()
df["group_size"] = df.groupby(group)["input"].transform("count")
df["output"] = (df.group_size > N) & df.input
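The output is below. Note that the group size includes the False that opens each group, so it is one more than the length of the run of Trues (except for a run at the very start of the column, which has no leading False); the final result is still correct for this data: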
input group_size output
0 False 1 False
1 False 1 False
2 False 1 False
3 False 3 False
4 True 3 False
5 True 3 False
6 False 1 False
7 False 2 False
8 True 2 False
9 False 1 False
10 False 1 False
11 False 4 False
12 True 4 False
13 True 4 False
14 True 4 False
15 False 1 False
16 False 1 False
17 False 6 False
18 True 6 True
19 True 6 True
20 True 6 True
21 True 6 True
22 True 6 True
23 False 1 False
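Because group_size is the run length plus one, the two answers use slightly different cut-offs: group_size > 4 keeps runs of 4 or more Trues, while gt(4) on the true counts keeps runs of 5 or more. The sample data has no run of exactly 4, so both print the same table. A minimal sketch comparing them, with both approaches wrapped in hypothetical helper functions (via_group_size, via_value_counts) that are not part of either answer:

```python
import pandas as pd


def via_group_size(df: pd.DataFrame, n: int) -> pd.Series:
    """Answer 2's approach (hypothetical wrapper)."""
    group = (~df["input"]).cumsum()
    # Group size includes the opening False, hence run length + 1
    group_size = df.groupby(group)["input"].transform("count")
    return (group_size > n) & df["input"]


def via_value_counts(df: pd.DataFrame, n: int) -> pd.Series:
    """Accepted answer's approach (hypothetical wrapper): counts Trues only."""
    s = (~df["input"]).cumsum().where(df["input"])
    return s.map(s.value_counts()).gt(n)


# A single run of exactly four Trues
df = pd.DataFrame({"input": [False, True, True, True, True, False]})
print(via_group_size(df, 4).tolist())    # → [False, True, True, True, True, False]
print(via_value_counts(df, 4).tolist())  # → [False, False, False, False, False, False]
```

To make Answer 2 match the accepted answer's threshold, compare group_size against N + 1 instead of N.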