The Python Oracle

How to aggregate a boolean field with null values with pandas?

Full question
https://stackoverflow.com/questions/4300...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #aggregate



ACCEPTED ANSWER

Score 4


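Convert them to numeric columns. A column that contains None is stored with object dtype; converting it produces a float column in which None becomes NaN, True becomes 1.0, and False becomes 0.0. Columns without nulls already have bool dtype, which pandas treats as numeric, so they are left unchanged. A convenient way to convert the whole dataframe is to apply pd.to_numeric with the errors parameter set to 'ignore'. This leaves the grouping column alone: converting its strings raises an error, so pd.to_numeric returns that column unchanged and moves on.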
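A minimal sketch of the conversion on a single made-up column shows the mapping: pd.to_numeric turns an object column holding True, False and None into float64, and mean() skips the resulting NaN, so the mean is the share of True among the non-null values.

```python
import pandas as pd

# A column mixing booleans and None is stored with object dtype
s = pd.Series([True, False, True, None])
print(s.dtype)  # object

# pd.to_numeric maps True -> 1.0, False -> 0.0 and None -> NaN
num = pd.to_numeric(s)
print(num.dtype)  # float64

# mean() skips NaN by default: 2 True values out of 3 non-null values
print(num.mean())
```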

Consider the dataframe df

import pandas as pd

# clc1 and clc4 contain None, so pandas stores them with object dtype
df = pd.DataFrame(dict(
        gcol=list('aaaabbbb'),
        clc1=[True, False, True, None] * 2,
        clc2=[True, False, True, False] * 2,
        clc3=[True, True, True, True] * 2,
        clc4=[False, None, None, True] * 2
    ))

This is what the converted dataframe looks like:

df.apply(pd.to_numeric, errors='ignore')

   clc1   clc2  clc3  clc4 gcol
0   1.0   True  True   0.0    a
1   0.0  False  True   NaN    a
2   1.0   True  True   NaN    a
3   NaN  False  True   1.0    a
4   1.0   True  True   0.0    b
5   0.0  False  True   NaN    b
6   1.0   True  True   NaN    b
7   NaN  False  True   1.0    b
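Only clc1 and clc4 come out as floats because they are the only columns containing None. The sketch below rebuilds the same df and checks the dtypes before and after conversion: clc2 and clc3 already have bool dtype, which pd.to_numeric counts as numeric and returns unchanged. As an assumption for illustration, the value columns are listed by hand here instead of relying on errors='ignore'.

```python
import pandas as pd

df = pd.DataFrame(dict(
    gcol=list('aaaabbbb'),
    clc1=[True, False, True, None] * 2,
    clc2=[True, False, True, False] * 2,
    clc3=[True, True, True, True] * 2,
    clc4=[False, None, None, True] * 2,
))

# Columns containing None are object dtype; the others are already bool
print(df.dtypes)

# Convert each value column separately; gcol is skipped by listing columns explicitly
converted = {col: pd.to_numeric(df[col]) for col in ['clc1', 'clc2', 'clc3', 'clc4']}

# object columns become float64, bool columns come back as bool
print({col: str(s.dtype) for col, s in converted.items()})
```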

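Chaining a groupby onto the converted frame then gives what you want: each mean is the fraction of True values in the group, and because mean() skips NaN, the nulls are left out of both the count and the total.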

df.apply(pd.to_numeric, errors='ignore').groupby('gcol').mean()

          clc1  clc2  clc3  clc4
gcol                            
a     0.666667   0.5   1.0   0.5
b     0.666667   0.5   1.0   0.5
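In pandas 2.2 and later, passing errors='ignore' to pd.to_numeric is deprecated and emits a FutureWarning. A sketch of one alternative, assuming the grouping column is known in advance, converts only the value columns and passes gcol to groupby as a separate key:

```python
import pandas as pd

df = pd.DataFrame(dict(
    gcol=list('aaaabbbb'),
    clc1=[True, False, True, None] * 2,
    clc2=[True, False, True, False] * 2,
    clc3=[True, True, True, True] * 2,
    clc4=[False, None, None, True] * 2,
))

# Assumption: the value columns are listed explicitly instead of using errors='ignore'
value_cols = ['clc1', 'clc2', 'clc3', 'clc4']
numeric = df[value_cols].apply(pd.to_numeric)

# Group by the original gcol column; mean() skips NaN (the original None values)
result = numeric.groupby(df['gcol']).mean()
print(result)
```

The numbers should match the table above; the result index is named gcol because it comes from the grouping Series.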