NaN is not recognized in pandas after np.where clause. Why? Or is it a bug?

Full question
https://stackoverflow.com/questions/3475...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #numpy #pandas

ACCEPTED ANSWER

Score 14

You can see why if you look at the result of the np.where call:

>>> np.where(a.isnull(), np.nan, "Hello")
array([u'Hello', u'nan'], 
      dtype='<U32')

Because your other value is a string, np.where converts your NaN to a string as well and gives you a string-dtyped result. (The exact dtype you get may differ depending on your platform and/or Python version.) So you don't actually have a NaN in your result at all; you just have the string "nan".
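
To check this yourself, here is a minimal sketch. The Series a is a hypothetical stand-in for the question's data (one present value, one missing), not the original input. It shows that both branches end up in one string array, so a null check finds nothing:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: one present value, one missing.
a = pd.Series(["x", np.nan])

result = np.where(a.isnull(), np.nan, "Hello")

# NumPy picks a single dtype that can hold both branches: a unicode string dtype.
print(result.dtype.kind)    # 'U' means unicode string
print(result.tolist())      # the second element is the text 'nan', not a float

# A string is never null, so the null check reports no missing values.
missing = pd.isnull(result)
print(missing.any())
```

The exact string width in the dtype may vary, which is why the sketch inspects only the dtype kind.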

If you want to do this type of mapping (in particular, mapping that changes dtypes) in pandas, it's usually better to use pandas constructs like .map and avoid dropping into numpy, because as you saw, numpy tends to do unhelpful things when it has to resolve conflicting types. Here's an example of how to do it all in pandas:

>>> b["X"] = a.isnull().map({True: np.nan, False: "Hello"})
>>> b
   0      X
0  a  Hello
1  b    NaN
>>> b.X.isnull()
0    False
1     True
Name: X, dtype: bool
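
The session above relies on a and b already existing. A self-contained sketch of the same approach follows; the contents of a and b are assumptions chosen to reproduce the output shown, not the question's actual data. The Series.mask variant at the end is an alternative that is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed inputs, picked so that a.isnull() is [False, True] as in the output above.
a = pd.Series(["x", np.nan])
b = pd.DataFrame({0: ["a", "b"]})

# Map the boolean mask straight to the target values; the NaN stays a real NaN.
b["X"] = a.isnull().map({True: np.nan, False: "Hello"})
print(b)
print(b["X"].isnull())

# Alternative (not from the original answer): start from a constant column
# and blank out the rows where a is null.
alt = pd.Series("Hello", index=a.index).mask(a.isnull())
print(alt.isnull())
```

Both versions stay inside pandas, so the missing entry remains a value that isnull can detect instead of being coerced to text.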