Python Dataframe Boolean Values With If Statement
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Track title: CC B Schuberts Piano Sonata No 16 D
--
Chapters
00:00 Python Dataframe Boolean Values With If Statement
00:23 Answer 1 Score 3
00:40 Answer 2 Score 2
01:22 Accepted Answer Score 1
01:48 Thank you
--
Full question
https://stackoverflow.com/questions/4388...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #ifstatement #dataframe
#avk47
ANSWER 1
Score 3
In [28]: df_picru['new'] = (
    ...:     df_picru['REF_INT']
    ...:     .duplicated(keep=False)
    ...:     .map({True: 'duplicates', False: 'unique'})
    ...: )
In [29]: df_picru
Out[29]:
   REF_INT         new
0        1      unique
1        2  duplicates
2        3      unique
3        8  duplicates
4        8  duplicates
5        2  duplicates
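The keep=False argument is what makes every member of a repeated group count as a duplicate. A minimal sketch of the difference from the default keep='first', assuming the six-row frame with REF_INT values 1, 2, 3, 8, 8, 2 shown in the output above (the names first and every are illustrative only):

```python
import pandas as pd

# Assumed sample data, matching the REF_INT column in the output above
df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

# Default keep='first': the first occurrence of each value is NOT flagged
first = df_picru['REF_INT'].duplicated()

# keep=False: every row whose value occurs more than once is flagged
every = df_picru['REF_INT'].duplicated(keep=False)

print(first.tolist())
print(every.tolist())
```

With the default, the first 2 and the first 8 would be labelled 'unique', which is probably not what is wanted here.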
ANSWER 2
Score 2
I think you need duplicated to build a boolean mask, and numpy.where to create the new column from it:
mask = df_picru['REF_INT'].duplicated(keep=False)
Sample:
import numpy as np
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})
mask = df_picru['REF_INT'].duplicated(keep=False)
print(mask)
0    False
1     True
2    False
3     True
4     True
5     True
Name: REF_INT, dtype: bool
df_picru['new'] = np.where(mask, 'duplicates', 'unique')
print(df_picru)
   REF_INT         new
0        1      unique
1        2  duplicates
2        3      unique
3        8  duplicates
4        8  duplicates
5        2  duplicates
If you need to check whether at least one value is duplicated, use any to reduce the boolean mask (an array) to a scalar True or False:
if mask.any():
    print('at least one duplicate')
at least one duplicate
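The reduction matters because a boolean Series cannot be used directly in an if statement. A minimal sketch, assuming the same sample frame and mask as above; the flag names (raised, has_duplicates, has_unique, all_duplicates) are hypothetical and exist only for this illustration:

```python
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})
mask = df_picru['REF_INT'].duplicated(keep=False)  # True where the value repeats

# A bare `if mask:` is ambiguous for a multi-element Series and raises ValueError
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# Reduce the mask to a single Python bool before branching
has_duplicates = bool(mask.any())     # at least one repeated value
has_unique = bool((~mask).any())      # at least one value that occurs once
all_duplicates = bool(mask.all())     # every value is repeated

if has_duplicates:
    print('at least one duplicate')
if has_unique:
    print('at least one unique')
```

Note that mask is True for repeated values, so mask.any() answers "is anything duplicated?", while (~mask).any() answers "is anything unique?".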
ACCEPTED ANSWER
Score 1
Another solution using groupby.
# Group by REF_INT, count the rows in each group, and label a group 'Duplicated' if it has more than one row
df_picru.groupby('REF_INT').apply(lambda x: 'Duplicated' if len(x) > 1 else 'Unique')
Out[21]:
REF_INT
1        Unique
2    Duplicated
3        Unique
8    Duplicated
dtype: object
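The groupby result above has one entry per distinct REF_INT value rather than one per row. If a per-row label is wanted instead, one possible variation (not part of the original answer) is to use transform('size'), which broadcasts each group's size back onto the original index. A sketch under the same sample-data assumption:

```python
import numpy as np
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

# transform('size') returns the group size aligned with the original rows
sizes = df_picru.groupby('REF_INT')['REF_INT'].transform('size')

# Label each row according to how often its value occurs
df_picru['new'] = np.where(sizes > 1, 'Duplicated', 'Unique')
print(df_picru)
```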
value_counts can actually work if you make a minor change:
df_picru.REF_INT.value_counts()[lambda x: x > 1]
Out[31]:
2    2
8    2
Name: REF_INT, dtype: int64
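The filtered value_counts result lists the repeated values, but it is indexed by value, not by row. One way to turn it back into a row-level mask, offered here as a sketch rather than as the answer's own method, is isin (the names counts and repeated are illustrative):

```python
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

# Count occurrences of each value, then keep only values seen more than once
counts = df_picru['REF_INT'].value_counts()
repeated = counts[counts > 1].index

# Flag every row whose value is among the repeated ones
mask = df_picru['REF_INT'].isin(repeated)
print(mask.tolist())
```

For this data the mask should match the one produced by duplicated(keep=False).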