The Python Oracle

Comparing pd.Series and getting, what appears to be, unusual results when the series contains None

Full question
https://stackoverflow.com/questions/5354...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #python3x #pandas



ACCEPTED ANSWER

Score 3


This is by design.

See the warning box in the pandas missing-data documentation: http://pandas.pydata.org/pandas-docs/stable/missing_data.html

This change was made quite a while ago to make the behavior of nulls consistent: null values never compare equal, not even to each other. It puts None and np.nan on the same footing, which is inconsistent with plain Python (where None == None is True) but consistent with NumPy (where np.nan == np.nan is False).

So this is not a bug, but rather a consequence of straddling two conventions.
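
To make the two conventions concrete, here is a minimal sketch of the comparison in question. The variable names are illustrative only, and dtype=object is passed explicitly as an assumption so that None is stored unchanged regardless of the pandas version's string handling.

```python
import numpy as np
import pandas as pd

# Object dtype is requested explicitly so that None is stored as-is.
a = pd.Series(['x', 'y', None], dtype=object)
b = pd.Series(['x', 'y', None], dtype=object)

python_result = (None == None)      # plain Python convention
numpy_result = (np.nan == np.nan)   # NaN convention: NaN is never equal to NaN
elementwise = (a == b).tolist()     # pandas follows the NaN convention for None

print(python_result)  # True
print(numpy_result)   # False
print(elementwise)    # [True, True, False]
```

The last position is False even though both series hold None there, which is the result that looks unusual at first glance.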

I suppose the documentation could be slightly enhanced.

For equality of series containing null values, use pd.Series.equals:

import pandas as pd

pd.Series(['x', 'y', None]).equals(pd.Series(['x', 'y', None]))  # True
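
If an element-by-element result is needed rather than a single True/False, one possible approach (not part of the original answer, just a sketch) is to combine the ordinary comparison with a mask of positions where both sides are null. The names both_null and null_aware are hypothetical, and the two series are assumed to share the same index.

```python
import pandas as pd

a = pd.Series(['x', 'y', None], dtype=object)
b = pd.Series(['x', 'y', None], dtype=object)

# Positions where both series hold a missing value.
both_null = a.isna() & b.isna()

# Equal if the values compare equal, or if both are missing.
null_aware = (a == b) | both_null

print(null_aware.tolist())  # [True, True, True]
print(a.equals(b))          # True
```

Series.equals remains the simpler choice when only an overall yes/no answer is required.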