The Python Oracle

python dataframe boolean values with if statement

--

Full question
https://stackoverflow.com/questions/4388...

Answer 2 links:
[duplicated]: http://pandas.pydata.org/pandas-docs/sta...
[numpy.where]: http://docs.scipy.org/doc/numpy-1.10.1/r...
[any]: http://pandas.pydata.org/pandas-docs/sta...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #ifstatement #dataframe



ANSWER 1

Score 3


In [28]: df_picru['new'] = \
             df_picru['REF_INT'].duplicated(keep=False) \
                     .map({True:'duplicates',False:'unique'})

In [29]: df_picru
Out[29]:
   REF_INT         new
0        1      unique
1        2  duplicates
2        3      unique
3        8  duplicates
4        8  duplicates
5        2  duplicates



ANSWER 2

Score 2


I think you need duplicated to build a boolean mask, and numpy.where to create the new column from it:

mask = df_picru['REF_INT'].duplicated(keep=False)

Sample:

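import numpy as np
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

# keep=False marks every occurrence of a repeated value as True
mask = df_picru['REF_INT'].duplicated(keep=False)
print(mask)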
0    False
1     True
2    False
3     True
4     True
5     True
Name: REF_INT, dtype: bool

df_picru['new'] = np.where(mask, 'duplicates', 'unique')
print(df_picru)
   REF_INT         new
0        1      unique
1        2  duplicates
2        3      unique
3        8  duplicates
4        8  duplicates
5        2  duplicates

If you need to check whether there is at least one duplicated value, use any to reduce the boolean mask (an array) to a scalar True or False:

if mask.any():
    print('at least one duplicate')
at least one duplicate
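
The reduction with any matters because a boolean Series has no single truth value: using it directly in an if statement raises ValueError. The following is a minimal sketch on the sample data from this answer. The names raised, has_duplicates and all_duplicates are illustrative and not from the original. Here mask.any() is True because at least one value in REF_INT repeats.

```python
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})
mask = df_picru['REF_INT'].duplicated(keep=False)

# Using the Series itself as a condition is ambiguous and raises ValueError
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True

# Reduce the mask to one boolean before branching on it
has_duplicates = bool(mask.any())   # at least one value repeats
all_duplicates = bool(mask.all())   # every value repeats

if has_duplicates:
    print('at least one duplicate')
```

Use any when a single repeated value is enough to trigger the branch, and all when every row has to satisfy the condition.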



ACCEPTED ANSWER

Score 1


Another solution using groupby.

# Group by REF_INT, count the rows in each group, and label the group 'Duplicated' if the count is greater than 1
df_picru.groupby('REF_INT').apply(lambda x: 'Duplicated' if len(x) > 1 else 'Unique')
Out[21]: 
REF_INT
1        Unique
2    Duplicated
3        Unique
8    Duplicated
dtype: object
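
Note that this result is indexed by the distinct REF_INT values, not by the original rows. If a per-row column is wanted instead, one possible approach is transform('size'), which broadcasts each group's size back to its rows. This is only a sketch on the sample data used in the other answers, not part of the original answer; the name group_size is illustrative and the column name new is borrowed from those answers.

```python
import numpy as np
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

# Size of the group each row belongs to, aligned with the original index
group_size = df_picru.groupby('REF_INT')['REF_INT'].transform('size')

# Label each row according to how often its value occurs
df_picru['new'] = np.where(group_size > 1, 'Duplicated', 'Unique')
print(df_picru)
```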

value_counts can also work if you filter its result with a callable, keeping only the values that occur more than once:

df_picru.REF_INT.value_counts()[lambda x: x>1]
Out[31]: 
2    2
8    2
Name: REF_INT, dtype: int64
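
To turn those counts into a row-level label, the repeated values can be passed to isin. This is a sketch on the same sample data, not part of the original answer; the names counts and repeated are illustrative.

```python
import numpy as np
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

counts = df_picru['REF_INT'].value_counts()
repeated = counts[counts > 1].index   # values that occur more than once

# Rows whose value is in the repeated set are labelled as duplicates
df_picru['new'] = np.where(df_picru['REF_INT'].isin(repeated), 'duplicates', 'unique')
print(df_picru)
```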