How to create incrementing group column counter
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Forest of Spells Looping
--
Chapters
00:00 How To Create Incrementing Group Column Counter
00:46 Accepted Answer Score 2
01:23 Answer 2 Score 1
01:54 Answer 3 Score 1
02:22 Thank you
--
Full question
https://stackoverflow.com/questions/7419...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #dataframe
#avk47
ACCEPTED ANSWER
Score 2
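You can use diff to flag only the first row of each stretch of True values (on a boolean Series, diff is True wherever the value changes), then cumsum to number the stretches and where to blank out the other rows:
df['ExpectedGroup'] = (df['case_id'].diff() & df['case_id']).cumsum().where(df['case_id'])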
If you don't want the intermediate case_id column, build the mask directly from the two state columns:
s = (df.FromState == 'O') & (df.ToState == 'O')
# or
# s = df[['FromState', 'ToState']].eq('O').all(axis=1)
df['ExpectedGroup'] = (s.diff() & s).cumsum().where(s)
# or
# df.loc[s, 'ExpectedGroup'] = (s.diff() & s).cumsum()
Output:
   ID  FromState  ToState  Hours  ExpectedGroup  case_id
0   A          P        O      2            NaN    False
1   A          O        O      5            1.0     True
2   A          O        O     10            1.0     True
3   A          O        P      4            NaN    False
4   A          P        P    300            NaN    False
5   B          P        O      2            NaN    False
6   B          O        O      5            2.0     True
7   B          O        O     10            2.0     True
8   B          O        P      4            NaN    False
9   B          P        P    300            NaN    False
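A self-contained sketch of the accepted answer's idea, run on data reconstructed from the output table above (the DataFrame construction is an assumption, since the question's own data is not reproduced here). It swaps diff() for an explicit comparison with the shifted mask. As far as I can tell, diff() on a boolean Series puts NaN in the first position, so a stretch of True starting at row 0 would not be counted as a new group; shift(fill_value=False) keeps the mask boolean and covers that case.

```python
import pandas as pd

# Hypothetical example data, reconstructed from the accepted answer's output table
df = pd.DataFrame({
    "ID": list("AAAAABBBBB"),
    "FromState": list("POOOPPOOOP"),
    "ToState": list("OOOPPOOOPP"),
    "Hours": [2, 5, 10, 4, 300] * 2,
})

# True where both states are 'O' (same condition as the answer's `s`)
s = df[["FromState", "ToState"]].eq("O").all(axis=1)

# A row starts a stretch when it is True and the previous row is not.
# fill_value=False keeps the shifted mask boolean, so no NaN is introduced.
run_start = s & ~s.shift(fill_value=False)

# Each stretch start bumps the counter by one; rows outside a stretch become NaN.
df["ExpectedGroup"] = run_start.cumsum().where(s)
print(df)
```

The `df.loc[s, 'ExpectedGroup'] = run_start.cumsum()` form from the answer works the same way here, since only the True rows are written.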
ANSWER 2
Score 1
Let's use cumsum to create a counter, then re-encode the counter using factorize:
m = df['case_id']
df.loc[m, 'ExpectedGroup'] = (~m).cumsum()[m].factorize()[0] + 1
Output:
    ID  FromState  ToState  Hours  ExpectedGroup  case_id
 0   A          P        O      2            NaN    False
 1   A          O        O      5            1.0     True
 2   A          O        O     10            1.0     True
 3   A          O        P      4            NaN    False
 4   A          P        P    300            NaN    False
 5   A          P        O      2            NaN    False
 6   A          O        O      5            2.0     True
 7   A          O        O     10            2.0     True
 8   A          O        P      4            NaN    False
 9   A          P        P    300            NaN    False
10   B          P        O      2            NaN    False
11   B          O        O      5            3.0     True
12   B          O        O     10            3.0     True
13   B          O        P      4            NaN    False
14   B          P        P    300            NaN    False
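A runnable sketch of the factorize approach on data rebuilt from the table above (the DataFrame construction is an assumption; the column values are copied from that table). Printing the intermediate counter shows why it works: (~m).cumsum() only increases on False rows, so every row in one stretch of True shares a value, and factorize relabels those shared values as 0, 1, 2, ... in order of appearance.

```python
import pandas as pd

# Hypothetical example data, reconstructed from Answer 2's output table
df = pd.DataFrame({
    "ID": ["A"] * 10 + ["B"] * 5,
    "FromState": list("POOOP") * 3,
    "ToState": list("OOOPP") * 3,
    "Hours": [2, 5, 10, 4, 300] * 3,
})
df["case_id"] = (df["FromState"] == "O") & (df["ToState"] == "O")

m = df["case_id"]

# Counts False rows seen so far; constant inside each stretch of True rows
counter = (~m).cumsum()
print(counter.tolist())  # [1, 1, 1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 8, 9]

# factorize maps the per-stretch values (1, 4, 7) to codes 0, 1, 2; +1 starts at 1.
# Rows not selected by m are left as NaN in the new column.
codes, uniques = counter[m].factorize()
df.loc[m, "ExpectedGroup"] = codes + 1
print(df)
```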
ANSWER 3
Score 1
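Similar to mozway's brilliant approach (note that this numbers the groups correctly only when every stretch of True rows is exactly two rows long, as in the sample data):
df['ExpectedGroup'] = (df['case_id'].shift(-1, fill_value=False) & df['case_id']).cumsum().mask(~df['case_id'])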
df
    ID  FromState  ToState  Hours  ExpectedGroup  case_id
 0   A          P        O      2            NaN    False
 1   A          O        O      5            1.0     True
 2   A          O        O     10            1.0     True
 3   A          O        P      4            NaN    False
 4   A          P        P    300            NaN    False
 5   A          P        O      2            NaN    False
 6   A          O        O      5            2.0     True
 7   A          O        O     10            2.0     True
 8   A          O        P      4            NaN    False
 9   A          P        P    300            NaN    False
10   B          P        O      2            NaN    False
11   B          O        O      5            3.0     True
12   B          O        O     10            3.0     True
13   B          O        P      4            NaN    False
14   B          P        P    300            NaN    False
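A quick check of how the shift(-1) variant behaves, assuming a plain boolean mask. The helper names shift_next_groups and run_start_groups are hypothetical, introduced only for this comparison. shift(-1) flags every row whose next row is also True, so a stretch of k rows adds k - 1 to the counter: exactly one for pairs, two for a stretch of three, and nothing for a lone True row.

```python
import pandas as pd

def shift_next_groups(m):
    # Answer 3's idea (hypothetical helper): flag rows whose next row is also True.
    # fill_value=False keeps the boolean dtype at the last row.
    return (m.shift(-1, fill_value=False) & m).cumsum().mask(~m)

def run_start_groups(m):
    # Hypothetical helper for comparison: flag the first row of each stretch instead.
    return (m & ~m.shift(fill_value=False)).cumsum().where(m)

# Stretches of exactly two rows: both versions agree
pairs = pd.Series([False, True, True, False, False, True, True, False])
print(shift_next_groups(pairs).tolist())
print(run_start_groups(pairs).tolist())

# A stretch of three rows, then a single row: the results diverge
mixed = pd.Series([True, True, True, False, True])
print(shift_next_groups(mixed).tolist())  # [1.0, 2.0, 2.0, nan, 2.0]
print(run_start_groups(mixed).tolist())   # [1.0, 1.0, 1.0, nan, 2.0]
```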