Masking multiple columns on a pandas dataframe in python
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Lost Meadow
--
Chapters
00:00 Masking Multiple Columns On A Pandas Dataframe In Python
00:53 Accepted Answer Score 18
01:48 Answer 2 Score 0
02:42 Thank you
--
Full question
https://stackoverflow.com/questions/2432...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #multiplecolumns #dataframe #mask
#avk47
ACCEPTED ANSWER
Score 18
You can use the pandas all method and Boolean logic. As EdChum commented, I am a bit unclear still on your exact example, but a similar example is:
In [1]: df = DataFrame([[1,2],[-3,5]], index=[0,1], columns=['a','b'])
In [2]: df
Out [2]:
a b
0 1 2
1 -3 5
In [3]: msk = (df>1) & (df<5)
In [4]: msk
Out [4]:
a b
0 False True
1 False False
In [5]: msk.all(axis=1)
Out [5]:
0 False
1 False
dtype: bool
If you wanted to index the original dataframe by the mask you could do:
In [6]: df[msk]
Out [6]:
a b
0 NaN 2
1 NaN NaN
Or as you originally indicated, rows where all the rows are true:
In [7]: idx = msk.all(axis=1)
In [8]: df[idx]
Out [8]:
Empty DataFrame
Columns: [a,b]
Index: []
Or if one row was true
In [9]: idx[0] = True
In [10]: df[idx]
Out [10]:
a b
0 1 2
For the original question after clarification from the comments, where we want different filtering criteria for different columns:
In [10]: msk1 = df[['a']] < 0
In [11]: msk2 = df[['b']] > 3
In [12]: msk = concat((msk1, msk2), axis=1)
In [12]: slct = msk.all(axis=1)
In [13]: df.ix[slct]
Out [13]:
a b
1 -3 5
ANSWER 2
Score 0
df[df[['DE', 'GA', 'ID']].all(axis=1) * (1 - df[['FL', 'IA']]).all(axis=1)]
The hard part here is understanding why you're using even/odd column positions to determine the treatment. Based on your code, it looks like you want columns 0, 2, and 4 to actually be 1 minus their current values. However, based on what you claim is the expected output, it actually seems like you want colums 1 and 3 to have 1 minus their current values.
My code above reflects the latter assumption. The general idea still works; just tune it to reflect whatever columns you actually need to have 1 minus the value of, assuming you make your desired output more rigorously defined.
Probably that needs to be cleaned up and turned into a proper helper function first that explicitly shows which columns need to have 1 minus their value, versus which columns can be left alone.