The Python Oracle

Boolean logic in Pandas is returning "KeyError: True"

--

Full question
https://stackoverflow.com/questions/4881...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #python27 #pandas



ANSWER 1

Score 4


There's a lot wrong with:

str(data2['RA'])[0:5] is not str(126.1)

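To begin with, is not evaluates to a single True or False, but you are trying to build a boolean array for selection, so the approach is misguided from the start. That single value is then looked up as a column label, which is where KeyError: True comes from. Note also that str(data2['RA']) converts the whole Series to its printed representation, not each value in it. Second, you should never use is to compare str objects: is tests object identity, not equality, so values should be compared with == or !=. For this kind of string manipulation on pandas.Series objects, there are built-in vectorized methods, accessible through .str, that mimic the built-in string methods. So given: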

>>> df
          File      Date          Time          RA        Dec
0  ad0147.fits  18-02-13  22:26:01.779  126.109510  27.360011
1  ad0147.fits  18-02-13  22:26:01.779  126.061077  27.361124
2  ad0147.fits  18-02-13  22:26:01.779  125.994430  27.363504
>>> df.dtypes
File     object
Date     object
Time     object
RA      float64
Dec     float64
dtype: object

You could use:

>>> df.RA.astype(str).str.startswith('126.1')
0     True
1    False
2    False
Name: RA, dtype: bool

Then combine that mask with boolean indexing:

>>> df[df.RA.astype(str).str.startswith('126.1')]
          File      Date          Time         RA        Dec
0  ad0147.fits  18-02-13  22:26:01.779  126.10951  27.360011
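
For anyone who wants to reproduce both the error and the fix, here is a minimal sketch. It assumes the frame contains only the three rows shown above, and rows_starting_with is a hypothetical helper name used for illustration, not part of pandas.

```python
import pandas as pd

# Sample frame rebuilt from the three rows shown above.
df = pd.DataFrame({
    "File": ["ad0147.fits"] * 3,
    "Date": ["18-02-13"] * 3,
    "Time": ["22:26:01.779"] * 3,
    "RA": [126.109510, 126.061077, 125.994430],
    "Dec": [27.360011, 27.361124, 27.363504],
})

# `is not` compares object identity and yields one plain Python bool.
# str(df["RA"]) is the printed form of the whole Series, not per-row text.
not_a_mask = str(df["RA"])[0:5] is not str(126.1)

# Indexing with that bool makes pandas look for a column labelled True.
try:
    df[not_a_mask]
    lookup_error = None
except KeyError as exc:
    lookup_error = exc


def rows_starting_with(frame, column, prefix):
    """Hypothetical helper: keep rows whose `column`, as text, starts with `prefix`."""
    mask = frame[column].astype(str).str.startswith(prefix)
    return frame[mask]


selected = rows_starting_with(df, "RA", "126.1")
print(selected)
```

The helper returns a filtered copy of the rows and leaves df itself unchanged, so it can be called repeatedly with different prefixes.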



ACCEPTED ANSWER

Score 2


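Take a look at the .str accessor, which is available on any pandas Series that holds strings (each column of a data frame is a Series). A numeric column such as RA has to be converted with .astype(str) first. str.contains treats its pattern as a regular expression by default, so an unescaped . matches any character; pass regex=False to match the text literally. I often search for what I don't want and then negate the mask with ~. Like this:

df = df[~df.RA.astype(str).str.contains('126.1', regex=False)]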
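
To see how the regular-expression default and the ~ negation interact, here is a rough sketch. The fourth value, 12601.5, is made up for illustration and is not from the question's data; it is there only to show that an unescaped . matches any character.

```python
import pandas as pd

# First three values come from the question; 12601.5 is an invented extra.
ra = pd.Series([126.109510, 126.061077, 125.994430, 12601.5], name="RA")
text = ra.astype(str)

# Default behaviour: the pattern is a regex, so "." matches any character
# and "12601.5" is matched as "126" + "0" + "1".
regex_mask = text.str.contains("126.1")

# regex=False treats the pattern as literal text.
literal_mask = text.str.contains("126.1", regex=False)

# ~ flips the boolean mask, keeping the rows that do NOT contain the text.
kept = ra[~literal_mask]
print(kept)
```

Note that contains matches anywhere in the string, while startswith only matches at the beginning, so the two are not interchangeable.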