The Python Oracle

Get pandas.read_csv to read empty values as empty string instead of nan

Full question
https://stackoverflow.com/questions/1086...

Accepted answer links:
https://github.com/pydata/pandas/issues/...

Answer 1 links:
[More consistent na_values handling in read_csv · Issue #1657 · pandas-dev/pandas]: https://github.com/pandas-dev/pandas/iss...
[BUG: more consistent na_values #1657 · pandas-dev/pandas@d9abf68]: https://github.com/pandas-dev/pandas/com...

Answer 3 links:
[read_csv()]: https://pandas.pydata.org/pandas-docs/st...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #csv #pandas




ANSWER 1

Score 233


I was still confused after reading the other answers and comments. But the answer now seems simpler, so here you go.

Since Pandas version 0.9 (from 2012), you can read your csv with empty cells interpreted as empty strings by simply setting keep_default_na=False:

pd.read_csv('test.csv', keep_default_na=False)

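This issue is explained more clearly in "More consistent na_values handling in read_csv" (pandas-dev/pandas issue #1657).

It was fixed on Aug 19, 2012, for pandas version 0.9, in commit d9abf68 ("BUG: more consistent na_values #1657").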
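A minimal self-contained check of this behaviour, assuming an in-memory CSV built with io.StringIO in place of test.csv (the column names and values are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical sample data: Bob's "city" cell is empty
csv_text = "name,city\nAlice,Paris\nBob,\n"

default_df = pd.read_csv(io.StringIO(csv_text))                     # default NA handling
kept_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)  # defaults switched off

print(repr(default_df.loc[1, "city"]))  # nan
print(repr(kept_df.loc[1, "city"]))     # ''
```

Because no na_values are passed here, keep_default_na=False leaves pandas with no missing-value markers at all, so the empty cell comes back as an empty string.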




ACCEPTED ANSWER

Score 72


I added a ticket to add an option of some sort here:

https://github.com/pydata/pandas/issues/1450

In the meantime, result.fillna('') (where result is the DataFrame returned by read_csv) should do what you want.

EDIT: in the development version (to be 0.8.0 final), if you specify an empty list of na_values, empty strings will stay empty strings in the result.
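A short sketch of the fillna('') workaround, assuming the same kind of made-up in-memory CSV instead of a real file:

```python
import io

import pandas as pd

# Hypothetical sample data with one empty cell
csv_text = "name,city\nAlice,Paris\nBob,\n"

result = pd.read_csv(io.StringIO(csv_text))  # the empty cell is read as NaN
result = result.fillna("")                   # replace every NaN with ''

print(repr(result.loc[1, "city"]))  # ''
```

Note that fillna('') runs after parsing, so it also replaces values that were turned into NaN for other reasons (for example a literal "NA" or "null" in the file), and a numeric column containing NaN becomes an object column once '' is filled in.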




ANSWER 3

Score 18


read_csv() has a simple argument for this, na_filter. Setting it to False turns off missing-value detection altogether:

df = pd.read_csv('test.csv', na_filter=False)
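Since na_filter=False disables missing-value detection entirely, it keeps not only empty cells but also default markers such as "NA" as plain strings. A small sketch with hypothetical data (an in-memory CSV where "NA" is a real value, Namibia's country code):

```python
import io

import pandas as pd

# Hypothetical sample data: a literal "NA" value and an empty cell
csv_text = "code,city\nNA,Windhoek\nFR,\n"

df = pd.read_csv(io.StringIO(csv_text), na_filter=False)

print(df.loc[0, "code"])        # NA  (kept as the string "NA")
print(repr(df.loc[1, "city"]))  # ''
```

This is usually what you want when the file is known to contain no real missing values; if some markers should still become NaN, the keep_default_na/na_values route gives finer control.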



ANSWER 4

Score 12


The values pandas treats as missing by default in read_csv() can be listed like this (note that pandas._libs is a private module, so this name may change between versions):

import pandas
default_missing = pandas._libs.parsers.STR_NA_VALUES
print(default_missing)

The output:

{'', '<NA>', 'nan', '1.#QNAN', 'NA', 'null', 'n/a', '-nan', '1.#IND', '#N/A N/A', 'N/A', 'NULL', 'NaN', '-1.#IND', '-1.#QNAN', '#NA', '#N/A', '-NaN'}

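With that you can opt out of individual defaults. Two details matter: set.remove() modifies the set in place and returns None, so its result must not be assigned back, and na_values only replaces the defaults when keep_default_na=False (otherwise it is added to them).

import pandas

# Copy the defaults so pandas' own set is not modified in place
default_missing = set(pandas._libs.parsers.STR_NA_VALUES)
default_missing.discard('')    # keep empty cells as ''
default_missing.discard('NA')  # also keep the literal string "NA"

# keep_default_na=False makes na_values replace the defaults instead of extending them
with open('test.csv', 'r') as csv_file:
    df = pandas.read_csv(csv_file, keep_default_na=False, na_values=default_missing)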
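To see the opt-out in action without a file on disk, here is a sketch using io.StringIO with made-up data: the empty cell stays '', while another default marker ("NULL") is still converted to NaN. It assumes a pandas version in which the private pandas._libs.parsers.STR_NA_VALUES set exists, as in the answer above.

```python
import io

import pandas as pd
from pandas._libs.parsers import STR_NA_VALUES  # private; may move between versions

# Build a new set from the defaults, minus the empty string
na_values = set(STR_NA_VALUES) - {""}

# Hypothetical sample data: a "NULL" marker and an empty cell
csv_text = "name,city\nAlice,NULL\nBob,\n"

df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=na_values)

print(repr(df.loc[0, "city"]))  # nan
print(repr(df.loc[1, "city"]))  # ''
```

Building a new set with the set difference operator leaves STR_NA_VALUES itself untouched, so later read_csv() calls in the same process keep the normal defaults.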