pandas: read_csv how to force bool data to dtype bool instead of object

Full question
https://stackoverflow.com/questions/2973...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas

ACCEPTED ANSWER

Score 10

Because your CSV has a missing value, the affected column is read with dtype object: it holds mixed types. The first three row values are booleans, while the missing last value becomes NaN, which is a float.
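A minimal sketch of that behaviour, assuming a whitespace-separated sample shaped like the one in the question (the variable names `text`, `d_dtype` and `value_types` are only illustrative). It reads the data and inspects the Python type of each value in column D:

```python
import io
import pandas as pd

# Sample shaped like the question's data: the last row has no value for D.
text = """A B C D
a 1 2 true
b 5 7 false
c 3 2 true
d 9 4
"""

df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# One missing entry is enough to stop the column from being plain bool.
d_dtype = df["D"].dtype
value_types = [type(v).__name__ for v in df["D"]]

print(d_dtype)
print(value_types)
```

The missing entry is stored as NaN, so the column cannot use a plain bool dtype and falls back to object.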

To replace the NaN values, use fillna. It accepts a dict mapping each column to its desired fill value, which leaves every column with homogeneous values:

>>> import io
>>> import pandas as pd
>>> t = """
A   B   C    D
a   1  NaN  true
b   5   7   false
c   3   2   true
d   9   4 """
>>> df = pd.read_csv(io.StringIO(t), sep=r'\s+')
>>> df
   A  B   C    D
0  a  1  NaN  True
1  b  5   7   False
2  c  3   2   True
3  d  9   4   NaN
>>> df.fillna({'C':0, 'D':False})
   A  B  C   D
0  a  1  0  True
1  b  5  7  False
2  c  3  2  True
3  d  9  4  False
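Whether fillna alone leaves column D with dtype bool can depend on the pandas version, so the sketch below adds an explicit astype(bool) after filling. This is an assumption layered on top of the answer, not part of it, and the name `filled` is illustrative:

```python
import io
import pandas as pd

text = """A B C D
a 1 NaN true
b 5 7 false
c 3 2 true
d 9 4
"""

df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Fill each column with its own default value.
filled = df.fillna({"C": 0, "D": False})

# Cast explicitly so D is bool even if fillna leaves it as object.
filled["D"] = filled["D"].astype(bool)

print(filled.dtypes)
```

The cast is safe here because no missing values remain in D once fillna has run.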

ANSWER 2

Score 9

You can use the dtype parameter; it accepts a dictionary mapping column names to types:

dtype : Type name or dict of column -> type
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}

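import pandas as pd
import io

# using your sample; io.StringIO because the literal is text, not bytes
csv_file = io.StringIO('''
A    B    C    D
a    1    2    true
b    5    7    false
c    3    2    true
d    9    4''')

# np.bool was removed from NumPy, and recent pandas refuses a plain bool
# dtype for a column with missing values, so read D as nullable 'boolean'
df = pd.read_csv(csv_file, sep=r'\s+', dtype={'D': 'boolean'})
# then fillna to convert the missing value to False and cast to plain bool
df['D'] = df['D'].fillna(False).astype(bool)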

df 
   A  B  C      D
0  a  1  2   True
1  b  5  7  False
2  c  3  2   True
3  d  9  4  False

df.D.dtypes
dtype('bool')

ANSWER 3

Score 3

Following a very similar question, I would suggest using the converters keyword argument:

import pandas as pd
pd.read_csv('data.csv',
            converters={'D': lambda x: x == 'true'})

This maps the missing value to False, as per your comment stating that NaN should be replaced by False.

The converters keyword argument takes a dictionary whose keys are column names and whose values are functions applied to the raw string values of that column.
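A self-contained sketch of the converters approach, assuming comma-separated data held in memory rather than a real data.csv file. The helper name `parse_bool` is hypothetical, and the case-insensitive match is an addition to the answer's lambda:

```python
import io
import pandas as pd

# Assumed comma-separated stand-in for data.csv; the last row has an empty D.
text = """A,B,C,D
a,1,2,true
b,5,7,false
c,3,2,true
d,9,4,
"""

def parse_bool(raw):
    # The converter receives the raw cell text; an empty cell arrives as "".
    return raw.strip().lower() == "true"

df = pd.read_csv(io.StringIO(text), converters={"D": parse_bool})

print(df["D"].dtype)
print(df["D"].tolist())
```

Because the converter returns a bool for every cell, including the empty one, the column never contains NaN and no fillna step is needed.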