Assessing values to a pandas column with conditions depending on other columns

Full question
https://stackoverflow.com/questions/7114...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #dataframe

ACCEPTED ANSWER

Score 7

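Create two boolean masks, combine them, and then find the row with the highest id_res value per col. m1 flags rows whose col value occurs more than once; m2 flags rows whose id_res value occurs only once in the whole column: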

m1 = df['col'].duplicated(keep=False)
m2 = ~df['id_res'].duplicated(keep=False)
df['check'] = df.index.isin(df[m1 & m2].groupby('col')['id_res'].idxmax())
print(df)

# Output
      col  id_res  check
0   paris      12  False
1   paris      12  False
2  nantes      14  False
3  berlin      28   True
4  berlin       8  False
5  berlin       4  False
6   tokyo      89  False
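
For anyone who wants to run the snippet above end to end, here is a minimal sketch. The DataFrame is reconstructed from the output table, since the original construction code is not shown here, and the intermediate name `winners` is only an illustrative addition.

```python
import pandas as pd

# Sample data reconstructed from the output table above (an assumption,
# the original construction code is not part of this answer).
df = pd.DataFrame({
    "col": ["paris", "paris", "nantes", "berlin", "berlin", "berlin", "tokyo"],
    "id_res": [12, 12, 14, 28, 8, 4, 89],
})

# m1: the col value appears more than once
m1 = df["col"].duplicated(keep=False)
# m2: the id_res value appears exactly once in the whole column
m2 = ~df["id_res"].duplicated(keep=False)

# Index label of the highest id_res per col, among rows passing both masks
winners = df[m1 & m2].groupby("col")["id_res"].idxmax()

# Flag only those rows
df["check"] = df.index.isin(winners)
print(df)
```

Only the berlin rows pass both masks, so `winners` holds the single index label 3.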

Details:

>>> pd.concat([df, m1.rename('m1'), m2.rename('m2')], axis=1)
      col  id_res  check     m1     m2
0   paris      12  False   True  False
1   paris      12  False   True  False
2  nantes      14  False  False   True
3  berlin      28   True   True   True  # <-  group to check
4  berlin       8  False   True   True  # <-     because 
5  berlin       4  False   True   True  # <- m1 and m2 are True
6   tokyo      89  False  False   True

ANSWER 2

Score 5

You basically have 3 conditions, so use masks and take the logical intersection (AND/&):

g = df_test.groupby('col')['id_res']

# is col duplicated?
m1 = df_test['col'].duplicated(keep=False)
# [ True  True False  True  True  True False]

# is id_res max of its group?
m2 = df_test['id_res'].eq(g.transform('max'))
# [ True  True  True  True False False  True]

# is group diverse? (more than 1 id_res)
m3 = g.transform('nunique').gt(1)
# [False False False  True  True  True False]

# True only where all three conditions hold
df_test['check'] = m1 & m2 & m3

Output:

      col  id_res  check
0   paris      12  False
1   paris      12  False
2  nantes      14  False
3  berlin      28   True
4  berlin       8  False
5  berlin       4  False
6   tokyo      89  False
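
The two answers agree on the sample data, but they are not equivalent in general. The sketch below wraps each approach in a function and compares them. The function names `check_idxmax` and `check_masks` and the `tied` frame are hypothetical, added only for illustration. In the `tied` frame the group maximum is shared by two rows, a case the question's data does not contain.

```python
import pandas as pd

def check_idxmax(df):
    """Accepted answer's approach: two masks, then groupby idxmax."""
    m1 = df["col"].duplicated(keep=False)
    m2 = ~df["id_res"].duplicated(keep=False)
    winners = df[m1 & m2].groupby("col")["id_res"].idxmax()
    return pd.Series(df.index.isin(winners), index=df.index)

def check_masks(df):
    """Answer 2's approach: intersection of three masks."""
    g = df.groupby("col")["id_res"]
    m1 = df["col"].duplicated(keep=False)      # col is duplicated
    m2 = df["id_res"].eq(g.transform("max"))   # id_res is the group max
    m3 = g.transform("nunique").gt(1)          # group has more than 1 id_res
    return m1 & m2 & m3

# Sample data reconstructed from the output tables above
sample = pd.DataFrame({
    "col": ["paris", "paris", "nantes", "berlin", "berlin", "berlin", "tokyo"],
    "id_res": [12, 12, 14, 28, 8, 4, 89],
})

# Hypothetical variation: the group maximum (28) occurs twice
tied = pd.DataFrame({"col": ["berlin"] * 3, "id_res": [28, 28, 4]})

same_on_sample = check_idxmax(sample).tolist() == check_masks(sample).tolist()
idxmax_on_tied = check_idxmax(tied).tolist()
masks_on_tied = check_masks(tied).tolist()

print(same_on_sample)   # True
print(idxmax_on_tied)   # [False, False, True]
print(masks_on_tied)    # [True, True, False]
```

With a tied maximum, the accepted answer drops both 28 rows (their id_res is not unique) and flags the remaining row, while Answer 2 flags both rows holding the maximum. Which behavior is wanted depends on the question's intent, so it is worth checking against your own data.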