How to create incrementing group column counter
--
Track title: CC H Dvoks String Quartet No 12 Ame
--
Chapters
00:00 Question
01:07 Accepted answer (Score 2)
01:48 Answer 2 (Score 1)
02:22 Answer 3 (Score 1)
02:53 Thank you
--
Full question
https://stackoverflow.com/questions/7419...
Accepted answer links:
[diff]: https://pandas.pydata.org/docs/reference...
Answer 2 links:
[mozway]: https://stackoverflow.com/users/16343464...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe
#avk47
--
ACCEPTED ANSWER
Score 2
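You can use diff to flag only the first row of each run of True values, then cumsum to number the runs:
df['ExpectedGroup'] = (
    (df['case_id'].diff() & df['case_id'])  # True only at the start of each run
    .cumsum()                               # running count of run starts
    .where(df['case_id'])                   # NaN outside the runs
)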
If you don't want the intermediate column:
s = (df.FromState == 'O') & (df.ToState == 'O')
# or
# s = df[['FromState', 'ToState']].eq('O').all(axis=1)
df['ExpectedGroup'] = (s.diff()&s).cumsum().where(s)
# or
# df.loc[s, 'ExpectedGroup'] = (s.diff()&s).cumsum()
Output:
   ID  FromState  ToState  Hours  ExpectedGroup  case_id
0   A          P        O      2            NaN    False
1   A          O        O      5            1.0     True
2   A          O        O     10            1.0     True
3   A          O        P      4            NaN    False
4   A          P        P    300            NaN    False
5   B          P        O      2            NaN    False
6   B          O        O      5            2.0     True
7   B          O        O     10            2.0     True
8   B          O        P      4            NaN    False
9   B          P        P    300            NaN    False
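For a copy-paste check, here is a minimal sketch that rebuilds the ten rows from the output table above and numbers the runs. It flags run starts by comparing each row with the previous one via shift(fill_value=False), which is a substitute for the diff() call in the answer, not the answer's own code. The name `starts` is only illustrative.

```python
import pandas as pd

# Data reconstructed from the output table above.
df = pd.DataFrame({
    "ID": ["A"] * 5 + ["B"] * 5,
    "FromState": ["P", "O", "O", "O", "P"] * 2,
    "ToState": ["O", "O", "O", "P", "P"] * 2,
    "Hours": [2, 5, 10, 4, 300] * 2,
})

# Rows where both states are 'O' (the same mask as `s` in the answer).
s = (df["FromState"] == "O") & (df["ToState"] == "O")

# Assumption: a run starts where the current row is True and the
# previous row is False; the first row has no predecessor, so fill with False.
starts = s & ~s.shift(fill_value=False)

# Count the run starts, then blank out rows that are not part of any run.
df["ExpectedGroup"] = starts.cumsum().where(s)
print(df)
```

Because shift(fill_value=False) keeps the boolean dtype, the `~` and `&` operators behave as plain element-wise logic here.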
ANSWER 2
Score 1
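Use cumsum to build a counter that is constant within each run of True, then re-encode that counter as 1, 2, 3, ... with factorize:
m = df['case_id']
# (~m).cumsum() increases on every False row, so all rows of one True run share a value
df.loc[m, 'ExpectedGroup'] = (~m).cumsum()[m].factorize()[0] + 1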
Output:
    ID  FromState  ToState  Hours  ExpectedGroup  case_id
 0   A          P        O      2            NaN    False
 1   A          O        O      5            1.0     True
 2   A          O        O     10            1.0     True
 3   A          O        P      4            NaN    False
 4   A          P        P    300            NaN    False
 5   A          P        O      2            NaN    False
 6   A          O        O      5            2.0     True
 7   A          O        O     10            2.0     True
 8   A          O        P      4            NaN    False
 9   A          P        P    300            NaN    False
10   B          P        O      2            NaN    False
11   B          O        O      5            3.0     True
12   B          O        O     10            3.0     True
13   B          O        P      4            NaN    False
14   B          P        P    300            NaN    False
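To see why the cumsum/factorize combination works, it can help to print the intermediate values. The following is a small sketch on a made-up mask (not the question's data); the variable names `counter`, `run_labels` and `groups` are only illustrative.

```python
import pandas as pd

# Hypothetical mask with two runs of True.
m = pd.Series([False, True, True, False, False, False, True, True, False])

# Every False row bumps the counter, so the rows of one True run share a value.
counter = (~m).cumsum()

# Keep the counter only on the True rows; the labels are distinct per run
# but not consecutive.
run_labels = counter[m]

# factorize re-encodes the distinct labels as 0, 1, 2, ... in order of appearance.
codes, uniques = run_labels.factorize()

# Write the 1-based codes back; rows outside the runs stay NaN.
groups = pd.Series(float("nan"), index=m.index)
groups.loc[m] = codes + 1

print(pd.DataFrame({"m": m, "counter": counter, "group": groups}))
```

The gaps in `run_labels` come from consecutive False rows, which is why the re-encoding step is needed at all.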
ANSWER 3
Score 1
Similar to mozway's brilliant approach:
s = df['case_id']
# a run starts where the current row is True and the previous row is not
df['ExpectedGroup'] = (s & ~s.shift(fill_value=False)).cumsum().mask(~s)
df
    ID  FromState  ToState  Hours  ExpectedGroup  case_id
 0   A          P        O      2            NaN    False
 1   A          O        O      5            1.0     True
 2   A          O        O     10            1.0     True
 3   A          O        P      4            NaN    False
 4   A          P        P    300            NaN    False
 5   A          P        O      2            NaN    False
 6   A          O        O      5            2.0     True
 7   A          O        O     10            2.0     True
 8   A          O        P      4            NaN    False
 9   A          P        P    300            NaN    False
10   B          P        O      2            NaN    False
11   B          O        O      5            3.0     True
12   B          O        O     10            3.0     True
13   B          O        P      4            NaN    False
14   B          P        P    300            NaN    False
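When run starts are flagged with shift, the direction of the shift matters once a run is longer than two rows. Comparing with the previous row marks only the first row of a run, while comparing with the next row marks every True row that has a True successor. A small sketch on a made-up mask with a run of three (the names `look_back` and `look_ahead` are only illustrative):

```python
import pandas as pd

# Hypothetical mask: a run of three True rows, then a run of two.
s = pd.Series([False, True, True, True, False, True, True])

# Look-back: current row is True and the previous row is not.
look_back = s & ~s.shift(fill_value=False)

# Look-ahead: current row and the next row are both True.
look_ahead = s & s.shift(-1, fill_value=False)

groups_back = look_back.cumsum().where(s)
groups_ahead = look_ahead.cumsum().where(s)

print(pd.DataFrame({"s": s, "look_back": groups_back, "look_ahead": groups_ahead}))
```

On this mask the look-back version numbers the runs 1, 1, 1 and 2, 2, whereas the look-ahead version splits the first run into 1, 2, 2. Both agree on the tables shown in the answers only because every run there is exactly two rows long.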