The Python Oracle

How to create incrementing group column counter

Chapters
00:00 How To Create Incrementing Group Column Counter
00:46 Accepted Answer Score 2
01:23 Answer 2 Score 1
01:54 Answer 3 Score 1
02:22 Thank you

--

Full question
https://stackoverflow.com/questions/7419...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe




ACCEPTED ANSWER

Score 2


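You can use diff to select only the first row of each run of True values. On a boolean Series, diff compares each row with the previous one, so combining it with & keeps only the rows where a run starts. cumsum then numbers the runs, and where blanks out the rows that are not part of a run:

df['ExpectedGroup'] = (df['case_id'].diff() & df['case_id']).cumsum().where(df['case_id'])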
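To see what each step produces, here is a minimal sketch on a small hand-made boolean Series (the values are illustrative, not taken from the question). For boolean data pandas computes diff with xor rather than subtraction, so a row is True after diff when it differs from the previous row.

```python
import pandas as pd

# Illustrative mask (not the question's data): runs of True at rows 1-2 and row 4
s = pd.Series([False, True, True, False, True])

# diff on booleans is an xor with the previous row; row 0 has no
# previous row, so it is NaN, which & treats as False
starts = s.diff() & s        # True only on the first row of each run
counter = starts.cumsum()    # 0, 1, 1, 1, 2: increments at each run start
labels = counter.where(s)    # NaN, 1.0, 1.0, NaN, 2.0: rows outside runs blanked
```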

If you don't want the intermediate case_id column, build the mask directly:

s = (df.FromState == 'O') & (df.ToState == 'O')
# or
# s = df[['FromState', 'ToState']].eq('O').all(axis=1)

df['ExpectedGroup'] = (s.diff() & s).cumsum().where(s)
# or
# df.loc[s, 'ExpectedGroup'] = (s.diff() & s).cumsum()

Output:

  ID FromState ToState  Hours  ExpectedGroup  case_id
0  A         P       O      2            NaN    False
1  A         O       O      5            1.0     True
2  A         O       O     10            1.0     True
3  A         O       P      4            NaN    False
4  A         P       P    300            NaN    False
5  B         P       O      2            NaN    False
6  B         O       O      5            2.0     True
7  B         O       O     10            2.0     True
8  B         O       P      4            NaN    False
9  B         P       P    300            NaN    False
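One caveat, checked here on a small illustrative mask rather than the question's data: because diff leaves the first row as NaN (treated as False by &), a run that begins on the very first row is never flagged as a start and ends up labelled 0 instead of 1. A minimal sketch of one possible workaround keeps the diff idea but works on 0/1 integers, so the missing first value can be filled from the mask itself.

```python
import pandas as pd

# Illustrative mask (not the question's data) whose first row is True
s = pd.Series([True, True, False, True])

# Expression from the answer: the first run is labelled 0, not 1
original = (s.diff() & s).cumsum().where(s)

# Workaround sketch: diff on integers, fill row 0 with the mask value,
# then a run starts wherever the difference equals 1
ints = s.astype(int)
starts = ints.diff().fillna(ints).eq(1)
fixed = starts.cumsum().where(s)
```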



ANSWER 2

Score 1


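Use cumsum on the inverted mask to create a counter that stays constant within each run of True values and changes between runs, then re-encode that counter with factorize so the runs are numbered 1, 2, 3, ... in order of appearance: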

m = df['case_id']
df.loc[m, 'ExpectedGroup'] = (~m).cumsum()[m].factorize()[0] + 1

   ID FromState ToState  Hours  ExpectedGroup  case_id
0   A         P       O      2            NaN    False
1   A         O       O      5            1.0     True
2   A         O       O     10            1.0     True
3   A         O       P      4            NaN    False
4   A         P       P    300            NaN    False
5   A         P       O      2            NaN    False
6   A         O       O      5            2.0     True
7   A         O       O     10            2.0     True
8   A         O       P      4            NaN    False
9   A         P       P    300            NaN    False
10  B         P       O      2            NaN    False
11  B         O       O      5            3.0     True
12  B         O       O     10            3.0     True
13  B         O       P      4            NaN    False
14  B         P       P    300            NaN    False
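A minimal sketch of the intermediate values, on an illustrative mask (not the question's data) that also starts with True. (~m).cumsum() counts the False rows seen so far, so it is constant inside each run of True and jumps between runs; factorize then renumbers the values kept under the mask as 0, 1, 2, ... in order of first appearance, and + 1 makes the labels start at 1.

```python
import pandas as pd

# Illustrative mask (not the question's data): runs at rows 0, 2-3 and 6
m = pd.Series([True, False, True, True, False, False, True])
df = pd.DataFrame({'case_id': m})

counter = (~m).cumsum()                  # 0, 1, 1, 1, 2, 3, 3
codes, uniques = counter[m].factorize()  # values 0, 1, 1, 3 -> codes 0, 1, 1, 2
df.loc[m, 'ExpectedGroup'] = codes + 1   # rows outside the mask stay NaN
```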



ANSWER 3

Score 1


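Similar to mozway's approach, but using shift instead of diff to find the first row of each run:

# a run starts where case_id is True and the previous row is False
start = df['case_id'] & ~df['case_id'].shift(fill_value=False)
df['ExpectedGroup'] = start.cumsum().mask(~df['case_id'])
df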

   ID FromState ToState  Hours  ExpectedGroup  case_id
0   A         P       O      2            NaN    False
1   A         O       O      5            1.0     True
2   A         O       O     10            1.0     True
3   A         O       P      4            NaN    False
4   A         P       P    300            NaN    False
5   A         P       O      2            NaN    False
6   A         O       O      5            2.0     True
7   A         O       O     10            2.0     True
8   A         O       P      4            NaN    False
9   A         P       P    300            NaN    False
10  B         P       O      2            NaN    False
11  B         O       O      5            3.0     True
12  B         O       O     10            3.0     True
13  B         O       P      4            NaN    False
14  B         P       P    300            NaN    False
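The shift-based test for the start of a run can be checked on runs of different lengths. A minimal sketch on an illustrative mask (not from the question) with runs of one, three and two rows: a row starts a group when it is True and the previous row is False, with fill_value=False standing in for the row before the first one.

```python
import pandas as pd

# Illustrative mask (not the question's data): runs of length 1, 3 and 2
case_id = pd.Series([True, False, True, True, True, False, True, True])

prev = case_id.shift(fill_value=False)   # previous row; False before row 0
start = case_id & ~prev                  # True only on the first row of each run
groups = start.cumsum().mask(~case_id)   # 1, NaN, 2, 2, 2, NaN, 3, 3
```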