Logical OR on a subset of columns in a DataFrame
Full question
https://stackoverflow.com/questions/3160...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
ACCEPTED ANSWER
Score 4
Building on LondonRob's answer, you could use
df.loc[df[mylist].any(axis=1)]
Calling the DataFrame's any method will perform better than using apply to call Python's built-in any function once per row.
Or you could use np.logical_or.reduce:
df.loc[np.logical_or.reduce(df[mylist], axis=1)]
For large DataFrames, using np.logical_or may be quicker:
In [30]: df = pd.DataFrame(np.random.binomial(1, 0.1, size=(100,300)).astype(bool))
In [31]: %timeit df.loc[np.logical_or.reduce(df, axis=1)]
1000 loops, best of 3: 261 µs per loop
In [32]: %timeit df.loc[df.any(axis=1)]
1000 loops, best of 3: 636 µs per loop
In [33]: %timeit df[df.apply(any, axis=1)]
100 loops, best of 3: 2.13 ms per loop
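Outside IPython, the same comparison can be approximated with the standard timeit module. This is only a sketch: the seed, the repeat count, and the helper name time_call are assumptions made for illustration, and the frame is converted with to_numpy() before the reduction. Absolute timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
df = pd.DataFrame(rng.binomial(1, 0.1, size=(100, 300)).astype(bool))


def time_call(func, number=20):
    """Return the average seconds per call over `number` runs (hypothetical helper)."""
    return timeit.timeit(func, number=number) / number


timings = {
    # NumPy ufunc reduction over the underlying boolean array
    "np.logical_or.reduce": time_call(
        lambda: df.loc[np.logical_or.reduce(df.to_numpy(), axis=1)]
    ),
    # pandas' vectorized any along the rows
    "df.any(axis=1)": time_call(lambda: df.loc[df.any(axis=1)]),
    # Python's built-in any, called once per row
    "df.apply(any, axis=1)": time_call(lambda: df[df.apply(any, axis=1)]),
}

for name, seconds in timings.items():
    print(f"{name:>22}: {seconds * 1e6:9.1f} µs per call")
```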
Note that df.any has extra features, such as the ability to skip NaNs.
In this case, if the columns have bool dtype, there cannot be any NaNs
(since NaNs are float values), so that feature is not needed and np.logical_or.reduce is the quicker option.
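To illustrate the NaN-skipping difference, here is a small sketch on a float-valued frame (the frame nan_df and its values are made up for illustration). With the default skipna=True, DataFrame.any ignores NaN, whereas np.logical_or.reduce treats NaN as truthy because it is nonzero.

```python
import numpy as np
import pandas as pd

# Hypothetical float-valued frame; row 0 contains only NaN and 0.0
nan_df = pd.DataFrame({"A": [np.nan, 0.0, 1.0], "B": [0.0, 0.0, 0.0]})

# DataFrame.any skips NaN by default (skipna=True)
pandas_mask = nan_df.any(axis=1)

# The NumPy reduction tests each value for truthiness, and NaN is nonzero
numpy_mask = np.logical_or.reduce(nan_df.to_numpy(), axis=1)

print(pandas_mask.tolist())  # [False, False, True]
print(numpy_mask.tolist())   # [True, False, True]
```

So the two approaches are only interchangeable when the selected columns cannot contain NaN.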
import numpy as np
import pandas as pd
np.random.seed(2014)
df = pd.DataFrame(np.random.binomial(1, 0.1, size=(10,3)).astype(bool), 
                  columns=list('ABC'))
print(df)
#        A      B      C
# 0  False  False  False
# 1   True  False  False
# 2  False  False  False
# 3   True  False  False
# 4  False  False  False
# 5  False  False  False
# 6  False   True  False
# 7  False  False  False
# 8  False  False  False
# 9  False  False  False
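mylist = list('ABC')
# Explicit chain of | over each listed column
print(df[ df[mylist[0]] | df[mylist[1]] | df[mylist[2]] ])
# DataFrame.any along the rows
print(df.loc[df[mylist].any(axis=1)])
# NumPy ufunc reduction along the rows
print(df.loc[np.logical_or.reduce(df[mylist], axis=1)])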
Each print call yields the rows where at least one of the listed columns is True:
       A      B      C
1   True  False  False
3   True  False  False
6  False   True  False
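The hand-written chain of | in the first print call only works for exactly three columns. As a sketch of how it could be generalized to a list of any length (this variant is not part of the answer), functools.reduce with operator.or_ can fold the columns together:

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

np.random.seed(2014)
df = pd.DataFrame(np.random.binomial(1, 0.1, size=(10, 3)).astype(bool),
                  columns=list('ABC'))
mylist = list('ABC')

# Fold the boolean columns together with |, i.e. df['A'] | df['B'] | df['C']
mask = reduce(operator.or_, (df[col] for col in mylist))

# The folded mask should agree with the vectorized DataFrame.any mask
expected = df[mylist].any(axis=1)
assert (mask == expected).all()

print(df[mask])
```

This keeps the explicit | semantics but creates one intermediate Series per column, so for many columns the any or np.logical_or.reduce forms are likely the better fit.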
ANSWER 2
Score 4
There's a much simpler way to do this using Python's built-in any function:
In []: mylist
Out[]: ['A', 'B']
In []: df
Out[]: 
       A      B      C
0  False  False  False
1   True  False  False
2  False  False  False
3   True  False  False
4  False  False  False
5  False  False  False
6  False   True  False
7  False  False  False
8  False  False  False
9  False  False  False
You can apply the function any along the rows of df by using axis=1. In this case I'll only apply any to a subset of the columns:
In []: df[mylist].apply(any, axis=1)
Out[]: 
0    False
1     True
2    False
3     True
4    False
5    False
6     True
7    False
8    False
9    False
dtype: bool
This gives us the perfect way to select our rows:
In []: df[df[mylist].apply(any, axis=1)]
Out[]: 
       A      B      C
1   True  False  False
3   True  False  False
6  False   True  False
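As a quick sanity check (a sketch, with the frame rebuilt by hand from the output shown above), the apply-based mask can be compared against the vectorized DataFrame.any mask for the same subset of columns:

```python
import pandas as pd

# Rebuild the example frame: rows 1 and 3 are True in A, row 6 is True in B
df = pd.DataFrame(False, index=range(10), columns=list('ABC'))
df.loc[[1, 3], 'A'] = True
df.loc[6, 'B'] = True
mylist = ['A', 'B']

apply_mask = df[mylist].apply(any, axis=1)  # Python's any, once per row
vector_mask = df[mylist].any(axis=1)        # pandas' vectorized any

# Both masks should select the same rows
print((apply_mask == vector_mask).all())
print(df[apply_mask].index.tolist())
```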