The Python Oracle

Pandas: Setting True to False in a column, if it appears less than n times in a row

Full question
https://stackoverflow.com/questions/6328...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas




ACCEPTED ANSWER

Score 2


The idea is to give each run of consecutive True values its own group id by applying Series.cumsum to the inverted boolean mask, then use Series.where to replace the rows where input is False with NaN. Finally, count the rows in each group with Series.map and Series.value_counts, and compare the counts against the threshold with Series.gt:

s = (~df['input']).cumsum().where(df['input'])

df['out'] = s.map(s.value_counts()).gt(4)
print (df)
    input  output    out
0   False   False  False
1   False   False  False
2   False   False  False
3   False   False  False
4    True   False  False
5    True   False  False
6   False   False  False
7   False   False  False
8    True   False  False
9   False   False  False
10  False   False  False
11  False   False  False
12   True   False  False
13   True   False  False
14   True   False  False
15  False   False  False
16  False   False  False
17  False   False  False
18   True    True   True
19   True    True   True
20   True    True   True
21   True    True   True
22   True    True   True
23  False   False  False

Details:

s = (~df['input']).cumsum().where(df['input'])
print (df.assign(inv = (~df['input']),
                 cumsum = (~df['input']).cumsum(),
                 s = (~df['input']).cumsum().where(df['input']),
                 count = s.map(s.value_counts()),
                 out = s.map(s.value_counts()).gt(4)))
       
    input  output    inv  cumsum     s  count    out
0   False   False   True       1   NaN    NaN  False
1   False   False   True       2   NaN    NaN  False
2   False   False   True       3   NaN    NaN  False
3   False   False   True       4   NaN    NaN  False
4    True   False  False       4   4.0    2.0  False
5    True   False  False       4   4.0    2.0  False
6   False   False   True       5   NaN    NaN  False
7   False   False   True       6   NaN    NaN  False
8    True   False  False       6   6.0    1.0  False
9   False   False   True       7   NaN    NaN  False
10  False   False   True       8   NaN    NaN  False
11  False   False   True       9   NaN    NaN  False
12   True   False  False       9   9.0    3.0  False
13   True   False  False       9   9.0    3.0  False
14   True   False  False       9   9.0    3.0  False
15  False   False   True      10   NaN    NaN  False
16  False   False   True      11   NaN    NaN  False
17  False   False   True      12   NaN    NaN  False
18   True    True  False      12  12.0    5.0   True
19   True    True  False      12  12.0    5.0   True
20   True    True  False      12  12.0    5.0   True
21   True    True  False      12  12.0    5.0   True
22   True    True  False      12  12.0    5.0   True
23  False   False   True      13   NaN    NaN  False
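The steps above can be wrapped in a small reusable function. This is only a sketch: the function name keep_long_true_runs and the parameter min_len are hypothetical, and the sample column is reconstructed from the table printed above. Note that .ge(5) is the same test as the .gt(4) used in the answer.

```python
import pandas as pd


def keep_long_true_runs(flags: pd.Series, min_len: int) -> pd.Series:
    """Return True only for rows inside a run of at least min_len Trues.

    Hypothetical helper; it follows the cumsum / where / value_counts
    steps described in the answer.
    """
    flags = flags.astype(bool)
    # Every False row increments the counter, so all Trues in one run
    # share the same id; False rows themselves are masked to NaN.
    run_id = (~flags).cumsum().where(flags)
    # Replace each run id by the number of rows carrying it (NaN stays NaN).
    run_len = run_id.map(run_id.value_counts())
    # Comparisons with NaN are False, so False rows stay False.
    return run_len.ge(min_len)


# Sample data reconstructed from the "input" column shown above.
true_rows = {4, 5, 8, 12, 13, 14, 18, 19, 20, 21, 22}
df = pd.DataFrame({"input": [i in true_rows for i in range(24)]})

df["out"] = keep_long_true_runs(df["input"], min_len=5)
print(df[df["out"]].index.tolist())
```

Only the rows of the five-long run (18 to 22) should be kept; the shorter runs are set to False.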



ANSWER 2

Score 1


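Here's a way to do that:

N = 4

# Each False row opens a new group, so a group is one False row plus the Trues that follow it
group = (~df["input"]).cumsum()
df["group_size"] = df.groupby(group)["input"].transform("count")
# group_size is the run length + 1, hence the strict comparison against N
df["output"] = (df["group_size"] > N) & df["input"]

The output is shown below. Note that group_size is the length of the True run plus 1, because each group also contains the False row that opens it (a run at the very start of the column has no such row). Comparing with > N therefore keeps runs of at least N Trues, whereas .gt(4) in the accepted answer keeps runs of at least 5. This data has no run of exactly 4, so the final result is the same: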

    input  group_size  output
0   False           1   False
1   False           1   False
2   False           1   False
3   False           3   False
4    True           3   False
5    True           3   False
6   False           1   False
7   False           2   False
8    True           2   False
9   False           1   False
10  False           1   False
11  False           4   False
12   True           4   False
13   True           4   False
14   True           4   False
15  False           1   False
16  False           1   False
17  False           6   False
18   True           6    True
19   True           6    True
20   True           6    True
21   True           6    True
22   True           6    True
23  False           1   False
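To illustrate the "+1" in group_size, here is a minimal sketch on a made-up column (the Series flags below is an assumption, not data from the question). Summing the Trues per group instead of counting rows gives the run length directly, including for a run at the very start of the column.

```python
import pandas as pd

# Made-up example: a leading run of 2 Trues and a later run of 4 Trues.
flags = pd.Series([True, True, False, True, True, True, True, False])

# Same grouping key as in the answer: each False row opens a new group.
group = (~flags).cumsum()

# "count" includes the False row that opens the group (if there is one).
with_false = flags.groupby(group).transform("count")
# Summing the flags as integers counts only the Trues in the group.
true_only = flags.astype(int).groupby(group).transform("sum")

# Keep only Trues that belong to a run of at least 4.
out = flags & true_only.ge(4)

print(pd.DataFrame({"input": flags, "count": with_false,
                    "sum": true_only, "out": out}))
```

With the sum, the threshold can be compared against the run length itself, so no offset has to be remembered when choosing N.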