The Python Oracle

Boolean logic in Pandas is returning "KeyError: True"

Full question
https://stackoverflow.com/questions/4881...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #python27 #pandas



ANSWER 1

Score 4


There's a lot wrong with:

str(data2['RA'])[0:5] is not str(126.1)

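To begin with, is not evaluates to a single True or False, but what you need for selection is a boolean array. Indexing the frame with that single value makes pandas look for a column labelled True, which is where KeyError: True comes from. Second, you should never use is to compare str objects: is tests object identity, not equality. Note also that str(data2['RA']) converts the entire Series to its printed representation rather than converting each value. For these sorts of string manipulations on pandas.Series objects, there are built-in vectorized methods accessible through .str which mimic the built-in string methods. So given: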

>>> df
          File      Date          Time          RA        Dec
0  ad0147.fits  18-02-13  22:26:01.779  126.109510  27.360011
1  ad0147.fits  18-02-13  22:26:01.779  126.061077  27.361124
2  ad0147.fits  18-02-13  22:26:01.779  125.994430  27.363504
>>> df.dtypes
File     object
Date     object
Time     object
RA      float64
Dec     float64
dtype: object

You could use:

>>> df.RA.astype(str).str.startswith('126.1')
0     True
1    False
2    False
Name: RA, dtype: bool

And simply combine that with boolean-indexing:

>>> df[df.RA.astype(str).str.startswith('126.1')]
          File      Date          Time         RA        Dec
0  ad0147.fits  18-02-13  22:26:01.779  126.10951  27.360011
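
To see where the error in the title comes from, here is a minimal sketch. It rebuilds the three sample rows shown above; the names scalar_result, error_message, mask and selected are illustrative and do not come from the original question.

```python
import pandas as pd

# Sample rows copied from the example above.
df = pd.DataFrame({
    "File": ["ad0147.fits"] * 3,
    "Date": ["18-02-13"] * 3,
    "Time": ["22:26:01.779"] * 3,
    "RA": [126.109510, 126.061077, 125.994430],
    "Dec": [27.360011, 27.361124, 27.363504],
})

# str() on a Series returns the printed form of the whole Series, and
# `is not` compares object identity, so this is one Python bool, not a mask.
scalar_result = str(df["RA"])[0:5] is not str(126.1)

# Indexing with a single bool is treated as a column lookup for the key True.
try:
    df[scalar_result]
    error_message = None
except KeyError as exc:
    error_message = "KeyError: {}".format(exc)

# The vectorized version produces one bool per row, which can be used as a mask.
mask = df["RA"].astype(str).str.startswith("126.1")
selected = df[mask]

print(scalar_result)
print(error_message)
print(selected)
```

The point of the contrast is that boolean indexing needs a Series (or array) of booleans with one entry per row; a lone True or False is interpreted as a label.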



ACCEPTED ANSWER

Score 2


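Take a look at the .str accessor, which is available on any pandas Series that holds strings (the columns of a data frame are Series). Its contains method supports regular expression syntax. I often search for what I don't want and then negate the result with ~. If RA holds floats rather than strings, convert it with .astype(str) first, and pass regex=False so that the . in '126.1' is matched literally instead of as a wildcard. Like this:

df = df[~df.RA.astype(str).str.contains('126.1', regex=False)]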
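
A small sketch of the negation approach follows, assuming RA is a float64 column as in the sample data from the other answer. The names ra_text, has_match, kept, loose, strict and accessor_error are illustrative only.

```python
import pandas as pd

# RA values taken from the sample rows in the other answer.
df = pd.DataFrame({"RA": [126.109510, 126.061077, 125.994430]})

# The .str accessor is only available for string data, so a float
# column has to be converted before string methods can be used.
try:
    df["RA"].str.contains("126.1")
    accessor_error = None
except AttributeError as exc:
    accessor_error = type(exc).__name__

ra_text = df["RA"].astype(str)

# regex=False matches the text "126.1" literally.
has_match = ra_text.str.contains("126.1", regex=False)

# ~ inverts the mask, keeping the rows that do NOT match.
kept = df[~has_match]

# With the default regex=True, "." matches any character.
loose = pd.Series(["126x1"]).str.contains("126.1")
strict = pd.Series(["126x1"]).str.contains("126.1", regex=False)

print(kept)
```

Keep in mind that contains matches anywhere in the string, so str.startswith may be the better fit when only the leading digits matter.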