Modifying a subset of rows in a pandas dataframe
Full question
https://stackoverflow.com/questions/1230...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
ACCEPTED ANSWER
Score 326
Use .loc for label-based indexing:
df.loc[df.A==0, 'B'] = np.nan
The df.A==0 expression creates a boolean Series that selects the rows, and 'B' selects the column. You can also use this to transform a subset of a column, e.g.:
df.loc[df.A==0, 'B'] = df.loc[df.A==0, 'B'] / 2
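A self-contained sketch of both snippets above, assuming a small made-up DataFrame. Column B is stored as float here (an assumption, not part of the original answer) so that assigning NaN does not force a dtype change:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; B is float so it can hold NaN without upcasting
df = pd.DataFrame({"A": [0, 1, 0, 2], "B": [4.0, 6.0, 8.0, 10.0]})

# Halve B only in the rows where A == 0 (one .loc call: row mask + column label)
df.loc[df.A == 0, "B"] = df.loc[df.A == 0, "B"] / 2

# Set B to NaN only in the rows where A == 2
df.loc[df.A == 2, "B"] = np.nan

print(df)
```

Rows not matched by the mask (here the row with A == 1) keep their original values.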
I don't know enough about pandas internals to know exactly why that works, but the basic issue is that sometimes indexing into a DataFrame returns a copy of the result, and sometimes it returns a view on the original object. According to the pandas documentation, this behavior depends on the underlying NumPy behavior. I've found that accessing everything in one operation (df.loc[rows, column] rather than chained indexing like df[one][two]) is more likely to work for setting.
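To illustrate the copy-versus-view point, here is a small sketch contrasting chained indexing with a single .loc call, using a made-up DataFrame. Boolean indexing such as df[mask] always builds a new object, so assigning into it never reaches the original. pandas may emit a chained-assignment warning for that line (the warning class varies by version), which is silenced here only to keep the demo output clean:

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 0], "B": [2.0, 0.0, 5.0]})

# Chained indexing: df[df.A == 0] is a new DataFrame, so this sets B on that
# temporary object and leaves df untouched
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[df.A == 0]["B"] = np.nan
after_chained = df["B"].tolist()
print(after_chained)

# A single .loc call with row mask and column label modifies df itself
df.loc[df.A == 0, "B"] = np.nan
print(df["B"].tolist())
```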
ANSWER 2
Score 112
Here is the relevant section of the pandas docs on advanced indexing.
That section explains exactly what you need! It turns out df.loc (since .ix has been deprecated, as many have pointed out below) can be used for slicing and dicing a DataFrame, and it can also be used to set values:
df.loc[selection criteria, columns I want] = value
So Bren's answer is saying: "find all the places where df.A == 0, select column B, and set it to np.nan".
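As a sketch of the df.loc[selection criteria, columns I want] = value pattern with richer pieces, the example below (made-up data) combines two conditions for the rows and uses a list of labels for the columns. Each condition is wrapped in parentheses because & binds more tightly than == and > in Python:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0, 1, 0, 3],
    "B": [2.0, 0.0, 5.0, 7.0],
    "C": [1.0, 1.0, 9.0, 9.0],
})

# selection criteria: combine boolean Series with & (and) or | (or)
rows = (df.A == 0) & (df.C > 5)

# columns I want: a single label or a list of labels; value: here a scalar
df.loc[rows, ["B", "C"]] = 0.0

print(df)
```

Only the row where both conditions hold (A == 0 and C > 5) has its B and C set to 0.0.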
ANSWER 3
Score 33
Starting with pandas 0.20, .ix is deprecated (it was removed entirely in pandas 1.0). The right way is to use df.loc.
Here is a working example:
>>> import pandas as pd 
>>> import numpy as np 
>>> df = pd.DataFrame({"A":[0,1,0], "B":[2,0,5]}, columns=list('AB'))
>>> df.loc[df.A == 0, 'B'] = np.nan
>>> df
   A    B
0  0  NaN
1  1  0.0
2  0  NaN
>>> 
Explanation:
As explained in the pandas documentation, .loc is primarily label-based, but it may also be used with a boolean array.
So, what we are doing above is applying df.loc[row_index, column_index] by:
- Exploiting the fact that loc can take a boolean array as a mask that tells pandas which subset of rows we want to change in row_index
- Exploiting the fact that loc is also label-based to select the column using the label 'B' in column_index
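The two ingredients in the list above can be pulled apart into separate variables. This sketch reuses the names row_index and column_index from the explanation (they are illustrative variables, not pandas names) and shows that the same pair both reads and writes the selected cells:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 0], "B": [2.0, 0.0, 5.0]})

row_index = df.A == 0   # boolean Series aligned on df's index (the mask)
column_index = "B"      # a column label

# Reading with the pair returns only the selected cells (a new Series)
selected = df.loc[row_index, column_index]
print(selected.index.tolist())  # row labels where the mask is True

# Assigning through the same pair writes those cells back into df
df.loc[row_index, column_index] = np.nan
print(df)
```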
We can use a logical expression, a condition, or any other operation that returns a Series of booleans to construct the boolean array. In the above example, we want the rows where column A contains 0; for that we can use df.A == 0, which, as you can see in the example below, returns a Series of booleans.
>>> df = pd.DataFrame({"A":[0,1,0], "B":[2,0,5]}, columns=list('AB'))
>>> df 
   A  B
0  0  2
1  1  0
2  0  5
>>> df.A == 0 
0     True
1    False
2     True
Name: A, dtype: bool
>>> 
Then, we use the above array of booleans to select and modify the necessary rows:
>>> df.loc[df.A == 0, 'B'] = np.nan
>>> df
   A    B
0  0  NaN
1  1  0.0
2  0  NaN
For more information, see the pandas advanced indexing documentation.
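Since any operation that returns a boolean Series can serve as the row mask, here is a sketch with a few other common ones (isin, between, negation with ~, and | for "or") on made-up data. The dictionary of masks is just a device for printing them side by side:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2, 3, 4], "B": [10.0, 20.0, 30.0, 40.0, 50.0]})

# Each expression yields a boolean Series aligned with df's index
masks = {
    "isin": df.A.isin([1, 3]),
    "between": df.A.between(2, 3),   # inclusive on both ends by default
    "negation": ~(df.A == 0),
    "or": (df.A == 0) | (df.B > 40),
}

# Record which row labels each mask selects
selected_rows = {name: df[mask].index.tolist() for name, mask in masks.items()}
print(selected_rows)

# Any of them can drive an assignment, exactly like df.A == 0
df.loc[masks["isin"], "B"] = np.nan
```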
ANSWER 4
Score 6
To replace multiple columns, convert the right-hand side to a NumPy array using .values (or .to_numpy() in pandas 0.24 and later):
df.loc[df.A==0, ['B', 'C']] = df.loc[df.A==0, ['B', 'C']].values / 2
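A runnable sketch of the multi-column version on made-up data, using .to_numpy() in place of .values. Converting to an array drops the row and column labels, so the right-hand side is written into the selected block purely by position rather than by label alignment:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0, 1, 0],
    "B": [2.0, 4.0, 6.0],
    "C": [8.0, 10.0, 12.0],
})

mask = df.A == 0
cols = ["B", "C"]

# Halve B and C in the masked rows; the array has shape (rows selected, len(cols))
df.loc[mask, cols] = df.loc[mask, cols].to_numpy() / 2

print(df)
```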