How to aggregate a boolean field with null values with pandas?
--------------------------------------------------
Hire the world's top talent on demand or become one of them at Toptal: https://topt.al/25cXVn
and get $2,000 discount on your first invoice
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Puddle Jumping Looping
--
Chapters
00:00 How To Aggregate A Boolean Field With Null Values With Pandas?
01:20 Accepted Answer Score 4
02:08 Thank you
--
Full question
https://stackoverflow.com/questions/4300...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas #aggregate
#avk47
ACCEPTED ANSWER
Score 4
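Convert the columns to numeric. In a column that contains None, None becomes NaN, True becomes 1, and False becomes 0. Columns that hold only True and False stay boolean, which is fine because mean treats True as 1 and False as 0. A convenient way to convert the whole dataframe is to apply pd.to_numeric with the errors parameter set to 'ignore'. This leaves the grouping column alone: the conversion fails on it, and with errors='ignore' the column is returned unchanged.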
Consider the dataframe df:
import pandas as pd

df = pd.DataFrame(dict(
    gcol=list('aaaabbbb'),
    clc1=[True, False, True, None] * 2,
    clc2=[True, False, True, False] * 2,
    clc3=[True, True, True, True] * 2,
    clc4=[False, None, None, True] * 2
))
This is what converting to numeric looks like:
df.apply(pd.to_numeric, errors='ignore')
   clc1   clc2  clc3  clc4 gcol
0   1.0   True  True   0.0    a
1   0.0  False  True   NaN    a
2   1.0   True  True   NaN    a
3   NaN  False  True   1.0    a
4   1.0   True  True   0.0    b
5   0.0  False  True   NaN    b
6   1.0   True  True   NaN    b
7   NaN  False  True   1.0    b
Chaining this with groupby and mean gives the share of True values per group, with the NaN entries skipped:
df.apply(pd.to_numeric, errors='ignore').groupby('gcol').mean()
          clc1  clc2  clc3  clc4
gcol
a     0.666667   0.5   1.0   0.5
b     0.666667   0.5   1.0   0.5
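Recent pandas releases deprecate errors='ignore' in pd.to_numeric, so the conversion above may emit a warning or stop working, depending on the installed version. A minimal sketch of one way around that, assuming the value columns are known up front: convert only those columns and pass the grouping column to groupby separately. The names value_cols, numeric, and result are made up for this illustration and are not part of the original answer.

```python
import pandas as pd

# Same example frame as in the answer.
df = pd.DataFrame(dict(
    gcol=list('aaaabbbb'),
    clc1=[True, False, True, None] * 2,
    clc2=[True, False, True, False] * 2,
    clc3=[True, True, True, True] * 2,
    clc4=[False, None, None, True] * 2,
))

# Hypothetical helper name: every column except the grouping column.
value_cols = df.columns.drop('gcol')

# Convert only the value columns, so no errors='ignore' is needed.
# to_numeric turns the None-containing object columns into floats with NaN;
# astype(float) then turns the remaining pure-bool columns into 1.0 / 0.0.
numeric = df[value_cols].apply(pd.to_numeric).astype(float)

# Group by the untouched grouping column; mean skips NaN by default.
result = numeric.groupby(df['gcol']).mean()
print(result)
```

The per-group means should match the table above, since only the way the grouping column is kept out of the conversion has changed.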