Conditional Replace Pandas
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Luau
--
Chapters
00:00 Conditional Replace Pandas
00:29 Answer 1 Score 102
00:44 Accepted Answer Score 279
01:39 Answer 3 Score 79
01:54 Answer 4 Score 36
02:51 Thank you
--
Full question
https://stackoverflow.com/questions/2160...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #replace #conditionalstatements #series
#avk47
ACCEPTED ANSWER
Score 279
.ix indexer works okay for pandas version prior to 0.20.0, but since pandas 0.20.0, the .ix indexer is deprecated, so you should avoid using it. Instead, you can use .loc or iloc indexers. You can solve this problem by:
mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0
Or, in one line,
df.loc[df.my_channel > 20000, 'my_channel'] = 0
mask helps you to select the rows in which df.my_channel > 20000 is True, while df.loc[mask, column_name] = 0 sets the value 0 to the selected rows where maskholds in the column which name is column_name.
Update:
In this case, you should use loc because if you use iloc, you will get a NotImplementedError telling you that iLocation based boolean indexing on an integer type is not available.
ANSWER 2
Score 102
Try
df.loc[df.my_channel > 20000, 'my_channel'] = 0
Note: Since v0.20.0, ix has been deprecated in favour of loc / iloc.
ANSWER 3
Score 79
np.where function works as follows:
df['X'] = np.where(df['Y']>=50, 'yes', 'no')
In your case you would want:
import numpy as np
df['my_channel'] = np.where(df.my_channel > 20000, 0, df.my_channel)
ANSWER 4
Score 36
The reason your original dataframe does not update is because chained indexing may cause you to modify a copy rather than a view of your dataframe. The docs give this advice:
When setting values in a pandas object, care must be taken to avoid what is called chained indexing.
You have a few alternatives:-
loc + Boolean indexing
loc may be used for setting values and supports Boolean masks:
df.loc[df['my_channel'] > 20000, 'my_channel'] = 0
mask + Boolean indexing
You can assign to your series:
df['my_channel'] = df['my_channel'].mask(df['my_channel'] > 20000, 0)
Or you can update your series in place:
df['my_channel'].mask(df['my_channel'] > 20000, 0, inplace=True)
np.where + Boolean indexing
You can use NumPy by assigning your original series when your condition is not satisfied; however, the first two solutions are cleaner since they explicitly change only specified values.
df['my_channel'] = np.where(df['my_channel'] > 20000, 0, df['my_channel'])