Filtering pandas dataframe with multiple Boolean columns
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Over a Mysterious Island
--
Chapters
00:00 Filtering Pandas Dataframe With Multiple Boolean Columns
00:43 Accepted Answer Score 78
01:34 Answer 2 Score 5
01:45 Answer 3 Score 15
02:22 Answer 4 Score 1
03:13 Thank you
--
Full question
https://stackoverflow.com/questions/4620...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #numpy #dataframe #boolean
#avk47
ACCEPTED ANSWER
Score 78
In [82]: d
Out[82]:
             A   B      C      D
0     John Doe  45   True  False
1   Jane Smith  32  False  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
Solution 1:
In [83]: d.loc[d.C | d.D]
Out[83]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
Solution 2:
In [94]: d[d[['C','D']].any(axis=1)]
Out[94]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
Solution 3:
In [95]: d.query("C or D")
Out[95]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
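The three solutions above can be checked against each other with a short script. This is a minimal sketch: the frame d is rebuilt from the printed output (an assumption, since the setup code is not shown), and axis=1 is passed by keyword because recent pandas releases expect it that way.

```python
import pandas as pd

# Example frame rebuilt from the printed output above (assumed construction).
d = pd.DataFrame({
    "A": ["John Doe", "Jane Smith", "Alan Holmes", "Eric Lamar"],
    "B": [45, 32, 55, 29],
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

by_or = d.loc[d.C | d.D]                 # element-wise OR of two boolean Series
by_any = d[d[["C", "D"]].any(axis=1)]    # True where any listed column is True
by_query = d.query("C or D")             # same condition as a query string

# All three approaches select the same rows.
print(by_or.equals(by_any) and by_or.equals(by_query))
print(by_or.index.tolist())
```

Which form to use is mostly a matter of taste: the operator form is the most explicit, any(axis=1) scales to a list of columns, and query keeps the condition readable as a string.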
PS: If you change your solution to:
df[(df['C']==True) | (df['D']==True)]
it will work too.
Pandas docs - boolean indexing
Why should we NOT use the "PEP-compliant" form
df["col_name"] is True instead of df["col_name"] == True?
In [11]: df = pd.DataFrame({"col":[True, True, True]})
In [12]: df
Out[12]:
    col
0  True
1  True
2  True
In [13]: df["col"] is True
Out[13]: False               # <----- oops, that's not exactly what we wanted
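The reason is that `is` tests object identity, so the whole Series object is compared with the singleton True and the result is a single Python bool rather than a per-row mask. A minimal sketch contrasting the two forms (the column values here are chosen for illustration and differ from the session above):

```python
import pandas as pd

df = pd.DataFrame({"col": [True, False, True]})

# `is` checks object identity: a Series is never the singleton True,
# so this yields one Python bool, not a mask with one value per row.
identity_check = df["col"] is True

# `==` is overloaded by pandas and compares element by element.
elementwise = df["col"] == True  # noqa: E712 (linters flag this comparison)

# For a column that is already boolean, the column itself works as the mask.
filtered = df[df["col"]]

print(identity_check)
print(elementwise.tolist())
print(filtered.index.tolist())
```

When the column already has boolean dtype, indexing with the column directly avoids both the identity pitfall and the linter warning.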
ANSWER 2
Score 15
Hooray! More options!
np.where
df[np.where(df.C | df.D, True, False)]
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True  
pd.Index.where on df.index
df.loc[df.index.where(df.C | df.D).dropna()]
               A   B      C      D
0.0     John Doe  45   True  False
2.0  Alan Holmes  55  False   True
3.0   Eric Lamar  29   True   True
df.select_dtypes
df[df.select_dtypes([bool]).any(axis=1)]
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
Abusing np.select
df.iloc[np.select([df.C | df.D], [df.index])].drop_duplicates()
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
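Of the options above, the select_dtypes variant generalizes most easily, since it needs no column names. The sketch below assumes the same example frame (rebuilt here as df). It also swaps in np.flatnonzero as a positional alternative to the np.select trick, which depends on np.select filling non-matching rows with its default value 0 and would therefore pick up row 0 even when that row does not match.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the printed output above (assumed construction).
df = pd.DataFrame({
    "A": ["John Doe", "Jane Smith", "Alan Holmes", "Eric Lamar"],
    "B": [45, 32, 55, 29],
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

# Keep rows where at least one boolean-dtype column is True.
mask = df.select_dtypes(include="bool").any(axis=1)
by_dtype = df[mask]

# Positional alternative: integer positions at which the mask is True.
positions = np.flatnonzero(mask.to_numpy())
by_position = df.iloc[positions]

print(positions.tolist())
print(by_dtype.equals(by_position))
```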
ANSWER 3
Score 5
Or
d[d.eval('C or D')]
Out[1065]:
             A   B      C      D
0     John Doe  45   True  False
2  Alan Holmes  55  False   True
3   Eric Lamar  29   True   True
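Unlike query, eval returns the boolean mask itself, so the mask can be inspected or inverted before it is used for indexing. A minimal sketch, assuming the example frame d rebuilt from the output above:

```python
import pandas as pd

# Example frame rebuilt from the printed output above (assumed construction).
d = pd.DataFrame({
    "A": ["John Doe", "Jane Smith", "Alan Holmes", "Eric Lamar"],
    "B": [45, 32, 55, 29],
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

mask = d.eval("C or D")   # boolean Series, one value per row
matches = d[mask]         # rows where C or D is True
others = d[~mask]         # rows where neither C nor D is True

print(int(mask.sum()))
print(others["A"].tolist())
```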
ANSWER 4
Score 1
The easiest way to do this:
import numpy as np
import pandas as pd

students = [('jack1', 'Apples1', 341),
            ('Riti1', 'Mangos1', 311),
            ('Aadi1', 'Grapes1', 301),
            ('Sonia1', 'Apples1', 321),
            ('Lucy1', 'Mangos1', 331),
            ('Mike1', 'Apples1', 351),
            ('Mik', 'Apples1', np.nan)]
# Create a DataFrame object
df = pd.DataFrame(students, columns=['Name1', 'Product1', 'Sale1'])
print(df)
    Name1 Product1  Sale1
0   jack1  Apples1  341.0
1   Riti1  Mangos1  311.0
2   Aadi1  Grapes1  301.0
3  Sonia1  Apples1  321.0
4   Lucy1  Mangos1  331.0
5   Mike1  Apples1  351.0
6     Mik  Apples1    NaN
# Select rows where the 'Product1' column equals 'Apples1'
subset = df[df['Product1'] == 'Apples1']
print(subset)
    Name1 Product1  Sale1
0   jack1  Apples1  341.0
3  Sonia1  Apples1  321.0
5   Mike1  Apples1  351.0
6     Mik  Apples1    NaN
# Select rows where 'Product1' equals 'Apples1' AND 'Sale1' is not null
subsetx = df[(df['Product1'] == "Apples1") & (df['Sale1'].notnull())]
print(subsetx)
    Name1 Product1  Sale1
0   jack1  Apples1  341.0
3  Sonia1  Apples1  321.0
5   Mike1  Apples1  351.0
# Select rows where 'Product1' equals 'Apples1' AND 'Sale1' equals 351
subsetx = df[(df['Product1'] == "Apples1") & (df['Sale1'] == 351)]
print(subsetx)
   Name1 Product1  Sale1
5  Mike1  Apples1  351.0
# Another example: select rows where 'Product1' is 'Mangos1' or 'Grapes1'
subsetData = df[df['Product1'].isin(['Mangos1', 'Grapes1'])]
print(subsetData)
   Name1 Product1  Sale1
1  Riti1  Mangos1  311.0
2  Aadi1  Grapes1  301.0
4  Lucy1  Mangos1  331.0
Here is the source of this code: https://thispointer.com/python-pandas-select-rows-in-dataframe-by-conditions-on-multiple-columns/
I added minor changes to it.
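Each comparison in the filters above is wrapped in parentheses because & and | bind more tightly than == in Python. A minimal sketch of the same kind of filter using named masks, which sidesteps the precedence issue. The frame is a shortened version of the one above, and the names is_apple and has_sale are illustrative.

```python
import numpy as np
import pandas as pd

# Shortened version of the students frame above (assumed for illustration).
df = pd.DataFrame({
    "Name1": ["jack1", "Riti1", "Mike1", "Mik"],
    "Product1": ["Apples1", "Mangos1", "Apples1", "Apples1"],
    "Sale1": [341, 311, 351, np.nan],
})

# Build each condition as its own boolean Series, then combine them.
is_apple = df["Product1"] == "Apples1"
has_sale = df["Sale1"].notna()

apples_with_sale = df[is_apple & has_sale]     # both conditions hold
apples_or_missing = df[is_apple | ~has_sale]   # either condition holds

print(apples_with_sale["Name1"].tolist())
print(apples_or_missing["Name1"].tolist())
```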