The Python Oracle

NaN is not recognized in pandas after np.where clause. Why? Or is it a bug?

Full question
https://stackoverflow.com/questions/3475...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #numpy #pandas




ACCEPTED ANSWER

Score 14


You can see why if you look at the result of the where:

>>> np.where(a.isnull(), np.nan, "Hello")
array([u'Hello', u'nan'], 
      dtype='<U32')

Because your other value is a string, where converts your NaN to a string as well and gives you a string-dtyped result. (The exact dtype you get may differ depending on your platform and/or Python version.) So you don't actually have a NaN in your result at all; you just have the string "nan".
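To check this yourself, here is a minimal sketch. The Series `a` below is a hypothetical stand-in (the original data is not shown here); any Series whose second entry is missing should behave the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's Series `a`: one value, one missing.
a = pd.Series([1.0, np.nan])

result = np.where(a.isnull(), np.nan, "Hello")

# The result is a unicode (string) array, not a float or object array.
print(result.dtype.kind)

# The "missing" slot holds the three-character string 'nan', not a float NaN.
print(repr(result[1]))

# So pandas finds nothing missing in it.
print(pd.isnull(result))
```

The last line prints all False values, which is why isnull() appears not to recognize the NaN after the np.where call.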

If you want to do this kind of mapping in pandas (in particular, a mapping that changes dtypes), it's usually better to use pandas constructs like .map and avoid dropping into numpy because, as you saw, numpy coerces conflicting types to a single common dtype, which is rarely what you want here. Here's an example of how to do it all in pandas:

>>> b["X"] = a.isnull().map({True: np.nan, False: "Hello"})
>>> b
   0      X
0  a  Hello
1  b    NaN
>>> b.X.isnull()
0    False
1     True
Name: X, dtype: bool
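
For reference, a self-contained version of the session above might look like the following. The definitions of `a` and `b` are assumptions made up for illustration, chosen only so that the output has the same shape as the one shown; they are not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: `a` has a missing second value, `b` is the target frame.
a = pd.Series([1.0, np.nan])
b = pd.DataFrame({0: ["a", "b"]})

# Map the boolean mask straight to the desired values, staying in pandas
# so the missing entry remains a real missing value.
b["X"] = a.isnull().map({True: np.nan, False: "Hello"})

print(b)
print(b["X"].isnull())
```

Because the mapping never passes through a string-dtyped numpy array, the second entry of X stays a genuine missing value and isnull() reports it as True.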