The Python Oracle

Pandas: Setting True to False in a column, if it appears less than n times in a row

Full question
https://stackoverflow.com/questions/6328...

Accepted answer links:
[Series.cumsum]: http://pandas.pydata.org/pandas-docs/sta...
[Series.where]: http://pandas.pydata.org/pandas-docs/sta...
[Series.map]: http://pandas.pydata.org/pandas-docs/sta...
[Series.value_counts]: http://pandas.pydata.org/pandas-docs/sta...
[Series.gt]: http://pandas.pydata.org/pandas-docs/sta...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas

#avk47



ACCEPTED ANSWER

Score 2


The idea is to create one group for each run of consecutive True values by applying Series.cumsum to the inverted boolean mask, then replace the non-matching (False) rows with NaN using Series.where, and finally count the rows in each group with Series.map and Series.value_counts and compare the counts against the threshold with Series.gt:

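# Label each run of consecutive Trues; False rows become NaN
s = (~df['input']).cumsum().where(df['input'])

# Map each label to its run length and keep only runs longer than 4
df['out'] = s.map(s.value_counts()).gt(4)
print(df)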
    input  output    out
0   False   False  False
1   False   False  False
2   False   False  False
3   False   False  False
4    True   False  False
5    True   False  False
6   False   False  False
7   False   False  False
8    True   False  False
9   False   False  False
10  False   False  False
11  False   False  False
12   True   False  False
13   True   False  False
14   True   False  False
15  False   False  False
16  False   False  False
17  False   False  False
18   True    True   True
19   True    True   True
20   True    True   True
21   True    True   True
22   True    True   True
23  False   False  False
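
For a self-contained run, the input column can be rebuilt from the table above. The sketch below wraps the same two steps in a helper; the name keep_long_runs and its min_len parameter are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Input column rebuilt from the printed table (rows 0-23).
values = ([False] * 4 + [True] * 2 + [False] * 2 + [True] + [False] * 3
          + [True] * 3 + [False] * 3 + [True] * 5 + [False])
df = pd.DataFrame({"input": values})


def keep_long_runs(mask: pd.Series, min_len: int) -> pd.Series:
    """Keep True only inside runs of at least `min_len` consecutive Trues.

    Hypothetical helper that follows the same steps as the answer.
    """
    # The counter only advances on False rows, so every True in one run
    # shares the same value; the False rows themselves are set to NaN.
    run_id = (~mask).cumsum().where(mask)
    # value_counts() ignores NaN, so each run id maps to its run length;
    # NaN rows stay NaN and compare as False.
    return run_id.map(run_id.value_counts()).ge(min_len)


df["out"] = keep_long_runs(df["input"], min_len=5)  # equivalent to .gt(4)
print(df)
```

Only rows 18 to 22, the single run of five Trues, should come out as True.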

Details:

s = (~df['input']).cumsum().where(df['input'])
print(df.assign(inv=~df['input'],
                cumsum=(~df['input']).cumsum(),
                s=s,
                count=s.map(s.value_counts()),
                out=s.map(s.value_counts()).gt(4)))
       
    input  output    inv  cumsum     s  count    out
0   False   False   True       1   NaN    NaN  False
1   False   False   True       2   NaN    NaN  False
2   False   False   True       3   NaN    NaN  False
3   False   False   True       4   NaN    NaN  False
4    True   False  False       4   4.0    2.0  False
5    True   False  False       4   4.0    2.0  False
6   False   False   True       5   NaN    NaN  False
7   False   False   True       6   NaN    NaN  False
8    True   False  False       6   6.0    1.0  False
9   False   False   True       7   NaN    NaN  False
10  False   False   True       8   NaN    NaN  False
11  False   False   True       9   NaN    NaN  False
12   True   False  False       9   9.0    3.0  False
13   True   False  False       9   9.0    3.0  False
14   True   False  False       9   9.0    3.0  False
15  False   False   True      10   NaN    NaN  False
16  False   False   True      11   NaN    NaN  False
17  False   False   True      12   NaN    NaN  False
18   True    True  False      12  12.0    5.0   True
19   True    True  False      12  12.0    5.0   True
20   True    True  False      12  12.0    5.0   True
21   True    True  False      12  12.0    5.0   True
22   True    True  False      12  12.0    5.0   True
23  False   False   True      13   NaN    NaN  False



ANSWER 2

Score 1


Here's a way to do that:

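N = 4

# Every False starts a new group, so a group is one False plus the Trues after it
group = (~df["input"]).cumsum()
df["group_size"] = df.groupby(group)["input"].transform("count")
# True only for the True rows of groups larger than N; all other rows are False
df["output"] = (df["group_size"] > N) & df["input"]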

The output is shown below. Note that group_size also counts the False row that opens each group, so it is one more than the length of the run of Trues; the final result is still correct for this data:

    input  group_size  output
0   False           1   False
1   False           1   False
2   False           1   False
3   False           3   False
4    True           3   False
5    True           3   False
6   False           1   False
7   False           2   False
8    True           2   False
9   False           1   False
10  False           1   False
11  False           4   False
12   True           4   False
13   True           4   False
14   True           4   False
15  False           1   False
16  False           1   False
17  False           6   False
18   True           6    True
19   True           6    True
20   True           6    True
21   True           6    True
22   True           6    True
23  False           1   False
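
The extra False in each group matters at the boundary. The sketch below uses a made-up Series (not data from the question) to compare the group size with the actual run length: with N = 4, a run of exactly four Trues passes group_size > N, while a comparison on the run length itself, like the accepted answer's gt(4), rejects it.

```python
import pandas as pd

# Made-up data for illustration: one run of exactly four Trues.
s = pd.Series([False, True, True, True, True, False, True])

N = 4

# Same grouping as above: every False opens a new group.
group = (~s).cumsum()
# Rows per group: the opening False plus the Trues that follow it.
group_size = s.groupby(group).transform("count")
# Trues per group only, i.e. the actual run length.
run_length = s.astype(int).groupby(group).transform("sum")

by_group_size = (group_size > N) & s
by_run_length = (run_length > N) & s

print(pd.DataFrame({"input": s,
                    "group_size": group_size,
                    "run_length": run_length,
                    "by_group_size": by_group_size,
                    "by_run_length": by_run_length}))
```

If the two approaches need to agree, comparing group_size against N + 1 is one option, assuming the column starts with False so that every group has an opening False row.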