Logical OR on a subset of columns in a DataFrame

--------------------------------------------------
Hire the world's top talent on demand or became one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------

Music by Eric Matyas
https://www.soundimage.org
Track title: Book End

--

Chapters
00:00 Logical Or On A Subset Of Columns In A Dataframe
00:37 Accepted Answer Score 4
01:50 Answer 2 Score 4
02:28 Thank you

--

Full question
https://stackoverflow.com/questions/3160...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas

#avk47

ACCEPTED ANSWER

Score 4

Building on LondonRob's answer, you could use

df.loc[df[mylist].any(axis=1)]

Calling the DataFrame's any method will perform better than using apply to call Python's builtin any function once per row.

Or you could use np.logical_or.reduce:

df.loc[np.logical_or.reduce(df[mylist], axis=1)]

For large DataFrames, using np.logical_or may be quicker:

In [30]: df = pd.DataFrame(np.random.binomial(1, 0.1, size=(100,300)).astype(bool))

In [31]: %timeit df.loc[np.logical_or.reduce(df, axis=1)]
1000 loops, best of 3: 261 µs per loop

In [32]: %timeit df.loc[df.any(axis=1)]
1000 loops, best of 3: 636 µs per loop

In [33]: %timeit df[df.apply(any, axis=1)]
100 loops, best of 3: 2.13 ms per loop

Note that df.any has extra features, such as the ability to skip NaNs. In this case, if the columns are boolean-valued, then there can not be any NaNs (since NaNs are float values). So np.logical_or.reduce is quicker.

import numpy as np
import pandas as pd
np.random.seed(2014)
df = pd.DataFrame(np.random.binomial(1, 0.1, size=(10,3)).astype(bool), 
                  columns=list('ABC'))
print(df)
#        A      B      C
# 0  False  False  False
# 1   True  False  False
# 2  False  False  False
# 3   True  False  False
# 4  False  False  False
# 5  False  False  False
# 6  False   True  False
# 7  False  False  False
# 8  False  False  False
# 9  False  False  False

mylist = list('ABC')
print(df[ df[mylist[0]] | df[mylist[1]] | df[mylist[2]] ])
print(df.loc[df[mylist].any(axis=1)])
print(df.loc[np.logical_or.reduce(df[mylist], axis=1)])

yields the rows where at least one of the columns is True:

       A      B      C
1   True  False  False
3   True  False  False
6  False   True  False

ANSWER 2

Score 4

There's a much simpler way to do this using python's built in any function:

In []: mylist
Out[]: ['A', 'B']

In []: df
Out[]: 
       A      B      C
0  False  False  False
1   True  False  False
2  False  False  False
3   True  False  False
4  False  False  False
5  False  False  False
6  False   True  False
7  False  False  False
8  False  False  False
9  False  False  False

You can apply the function any along the rows of df by using axis=1. In this case I'll only apply any to a subset of the columns:

In []: df[mylist].apply(any, axis=1)
Out[]: 
0    False
1     True
2    False
3     True
4    False
5    False
6     True
7    False
8    False
9    False
dtype: bool

This gives us the perfect way to select our rows:

In []: df[df[mylist].apply(any, axis=1)]
Out[]: 
       A      B      C
1   True  False  False
3   True  False  False
6  False   True  False