pandas: read_csv how to force bool data to dtype bool instead of object
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Game 2 Looping
--
Chapters
00:00 Pandas: Read_csv How To Force Bool Data To Dtype Bool Instead Of Object
00:55 Accepted Answer Score 10
01:24 Answer 2 Score 9
01:54 Answer 3 Score 3
02:23 Thank you
--
Full question
https://stackoverflow.com/questions/2973...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
#avk47
ACCEPTED ANSWER
Score 10
Because your CSV has a missing value, column D holds mixed types and its dtype is reported as object: the first three row values are booleans, while the missing last value is read as NaN, which is a float.
To replace the NaN values use fillna. It accepts a dict mapping columns to the desired fill values, which lets each column end up with a homogeneous dtype:
>>> import io
>>> import pandas as pd
>>> t = """
A B C D
a 1 NaN true
b 5 7 false
c 3 2 true
d 9 4 """
>>> df = pd.read_csv(io.StringIO(t), sep=r'\s+')
>>> df
A B C D
0 a 1 NaN True
1 b 5 7 False
2 c 3 2 True
3 d 9 4 NaN
>>> df.fillna({'C':0, 'D':False})
A B C D
0 a 1 0 True
1 b 5 7 False
2 c 3 2 True
3 d 9 4 False
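To check that the fill really produced a bool column, the resulting dtype can be inspected directly. The following is a minimal sketch under the same sample data; the explicit astype(bool) step is an addition that is not part of the answer above, included because depending on the pandas version the filled column may still be reported as object. The name filled is just an illustrative variable.

```python
import io

import pandas as pd

# Same sample as above; the last row has no value for column D.
t = """A B C D
a 1 NaN true
b 5 7 false
c 3 2 true
d 9 4
"""

df = pd.read_csv(io.StringIO(t), sep=r"\s+")

# Fill each column with a value of the type it should end up holding.
filled = df.fillna({"C": 0, "D": False})

# Assumption: cast explicitly so the result does not depend on whether
# this pandas version downcasts an object column after fillna.
filled["D"] = filled["D"].astype(bool)

print(filled.dtypes)
```

After the cast, column D holds only True and False, so it can be used directly as a boolean mask.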
ANSWER 2
Score 9
You can use the dtype parameter of read_csv, which accepts a dictionary mapping columns to types. From the documentation:
dtype : Type name or dict of column -> type
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
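import io
import pandas as pd
# using your sample (io.StringIO, because the sample is text, not bytes)
csv_file = io.StringIO('''
A B C D
a 1 2 true
b 5 7 false
c 3 2 true
d 9 4''')
# 'boolean' is the nullable boolean dtype: plain bool cannot hold the
# missing value in the last row, and np.bool no longer exists in recent NumPy
df = pd.read_csv(csv_file, sep=r'\s+', dtype={'D': 'boolean'})
# then fillna to convert the missing value to False, and cast to plain bool
df['D'] = df['D'].fillna(False).astype(bool)
df
A B C D
0 a 1 2 True
1 b 5 7 False
2 c 3 2 True
3 d 9 4 False
df.D.dtypes
dtype('bool')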
ANSWER 3
Score 3
Based on a very similar question, I would suggest using the converters keyword argument:
import pandas as pd
# every value in column D becomes True only if it is exactly 'true'
pd.read_csv('data.csv',
            converters={'D': lambda x: x == 'true'})
This follows your comment stating that NaN values should be replaced by False: any value other than 'true', including a missing one, becomes False.
The converters keyword argument takes a dictionary whose keys are column names and whose values are functions to apply to each value in that column.
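As a self-contained illustration of the converters approach, here is a minimal sketch that reads from an in-memory string instead of data.csv. The comma-separated sample and the helper name to_bool are assumptions made for this example, and the helper is slightly more lenient than the lambda above because it ignores case and surrounding whitespace.

```python
import io

import pandas as pd

# Hypothetical comma-separated version of the sample; the last row
# leaves column D empty.
csv_text = """A,B,C,D
a,1,2,true
b,5,7,false
c,3,2,true
d,9,4,
"""


def to_bool(value: str) -> bool:
    """Return True only for the text 'true'; anything else, including
    an empty field, becomes False."""
    return value.strip().lower() == "true"


# The converter receives the raw text of each field in column D.
df = pd.read_csv(io.StringIO(csv_text), converters={"D": to_bool})

print(df)
print(df["D"].dtype)
```

Because the converter sees the raw text of each field, the empty field in the last row arrives as an empty string and is mapped to False rather than NaN.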