The Python Oracle

Comparing pd.Series and getting, what appears to be, unusual results when the series contains None

--



Full question
https://stackoverflow.com/questions/5354...

Accepted answer links:
[by design]: https://github.com/pandas-dev/pandas/iss...
http://pandas.pydata.org/pandas-docs/sta...
[pd.Series.equals]: https://pandas.pydata.org/pandas-docs/st...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #python3x #pandas



ACCEPTED ANSWER

Score 3


This is by design:

See the warning box in the pandas missing-data documentation: http://pandas.pydata.org/pandas-docs/stable/missing_data.html

This was done quite a while ago to make the behavior of nulls consistent, in that they don't compare equal. It puts None and np.nan on an equal footing, which is inconsistent with Python but consistent with NumPy.
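As a rough illustration of the behavior described above, the sketch below compares two identical series element by element. The variable names are made up for the example, and dtype=object is passed explicitly on the assumption that it keeps None stored as None regardless of how a given pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# Two identical object-dtype series, each ending in a null (None).
a = pd.Series(['x', 'y', None], dtype=object)
b = pd.Series(['x', 'y', None], dtype=object)

# Element-wise ==: the null position does not compare equal.
elementwise = a == b
print(elementwise.tolist())  # [True, True, False]

# The two conventions being straddled:
print(None == None)      # True  (plain Python)
print(np.nan == np.nan)  # False (IEEE NaN, as in NumPy)
```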

So this is not a bug but a consequence of straddling two conventions.

I suppose the documentation could be slightly enhanced.

For equality of series containing null values, use pd.Series.equals:

pd.Series(['x', 'y', None]).equals(pd.Series(['x', 'y', None]))  # True
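To make the contrast concrete, here is a small sketch that sets pd.Series.equals next to the element-wise comparison reduced with .all(). The names left, right and null_aware are hypothetical, and the null_aware expression is just one possible way to get a per-element result that treats matching nulls as equal, not something taken from the answer.

```python
import pandas as pd

left = pd.Series(['x', 'y', None], dtype=object)
right = pd.Series(['x', 'y', None], dtype=object)

# Element-wise comparison reduced to a single flag: the null pair makes it False.
all_equal_elementwise = bool((left == right).all())

# Series.equals treats nulls in the same position as equal.
equal_with_nulls = left.equals(right)

# Hypothetical per-element variant: equal values, or both entries null.
null_aware = (left == right) | (left.isna() & right.isna())

print(all_equal_elementwise)  # False
print(equal_with_nulls)       # True
print(null_aware.tolist())    # [True, True, True]
```

Series.equals returns a single bool, so the element-wise variant is only worth the extra step when the positions that differ matter.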