The Python Oracle

Pandas isin() function for continuous intervals

Music by Eric Matyas
https://www.soundimage.org
Track title: Underwater World

--

Chapters
00:00 Pandas Isin() Function For Continuous Intervals
00:28 Accepted Answer Score 10
00:55 Answer 2 Score 1
01:08 Answer 3 Score 1
01:45 Thank you

--

Full question
https://stackoverflow.com/questions/3062...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas

#avk47



ACCEPTED ANSWER

Score 10


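Series objects (including DataFrame columns) have a between() method. It returns a boolean Series marking the values that lie within the given bounds, and both ends are inclusive by default: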

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.linspace(0, 20, 8))
>>> s
0     0.000000
1     2.857143
2     5.714286
3     8.571429
4    11.428571
5    14.285714
6    17.142857
7    20.000000
dtype: float64
>>> s.between(1, 14.5)
0    False
1     True
2     True
3     True
4     True
5     True
6    False
7    False
dtype: bool
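For the question's use case, between() can build the 0/1 dummy column directly. The following is a minimal sketch. It assumes a DataFrame with a column named number that holds the values 8 to 14, the same values that appear in the last answer. The string values for inclusive assume pandas 1.3 or later:

```python
import pandas as pd

# Hypothetical example frame: one numeric column named "number" (values 8..14).
df = pd.DataFrame({"number": range(8, 15)})

# between() includes both endpoints by default, so 10 counts as "in range".
df["numdum"] = df["number"].between(1, 10).astype(int)

# inclusive="neither" excludes both endpoints (string form needs pandas >= 1.3).
strict = df["number"].between(8, 10, inclusive="neither")

print(df)
print(strict.tolist())
```

The inclusive argument also accepts "left" and "right" to control which boundary counts.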



ANSWER 2

Score 1


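Combining two comparisons with & also works. Each comparison needs its own parentheses, because & binds more tightly than >= and <=: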

df['numdum'] = (df.number >= 1) & (df.number <= 10)
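The following quick check confirms that the two-comparison mask gives the same result as between() with the same bounds. It assumes a hypothetical frame holding the values 8 to 14 and converts the booleans to 1/0 at the end:

```python
import pandas as pd

# Hypothetical frame mirroring the values shown in the last answer.
df = pd.DataFrame({"number": range(8, 15)})

# Element-wise AND of two boolean Series.
mask = (df.number >= 1) & (df.number <= 10)

# between(1, 10) uses the same inclusive bounds, so the two masks agree.
same = mask.equals(df.number.between(1, 10))

# Turn True/False into 1/0 if an integer dummy is wanted.
df["numdum"] = mask.astype(int)
print(df)
```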



ANSWER 3

Score 1


You could also do the same thing with pd.cut(). There is no real advantage when there are only two categories:

>>> df['numdum'] = pd.cut(df['number'], [-99, 10, 99], labels=[1, 0])
>>> df
   number numdum
0       8      1
1       9      1
2      10      1
3      11      0
4      12      0
5      13      0
6      14      0
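pd.cut returns a categorical column, not plain integers. The following minimal sketch converts the result to integers. It assumes the same hypothetical frame with the values 8 to 14:

```python
import pandas as pd

df = pd.DataFrame({"number": range(8, 15)})

# Bins are closed on the right by default: (-99, 10] -> 1, (10, 99] -> 0.
df["numdum"] = pd.cut(df["number"], [-99, 10, 99], labels=[1, 0])
print(df["numdum"].dtype)  # category

# Cast the categorical labels to ordinary integers for arithmetic or modelling.
df["numdum"] = df["numdum"].astype(int)
print(df)
```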

But it's nice if you have multiple categories:

>>> df['numdum'] = pd.cut(df['number'], [-99, 8, 10, 99], labels=[1, 2, 3])
>>> df
   number numdum
0       8      1
1       9      2
2      10      2
3      11      3
4      12      3
5      13      3
6      14      3

The labels can also be [True, False] if booleans are preferred. Alternatively, omit the labels argument entirely, in which case each value is labelled with the interval (bin edges) it falls into.
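When the labels argument is left out, each entry is a pandas Interval that describes its bin. The following short sketch uses the hypothetical values 8 to 14. It relies on the default right=True, which closes each interval on the right:

```python
import pandas as pd

df = pd.DataFrame({"number": range(8, 15)})

# No labels argument: the categories are Interval objects built from the edges.
binned = pd.cut(df["number"], [-99, 8, 10, 99])

first = binned.iloc[0]   # the bin that contains 8
second = binned.iloc[1]  # the bin that contains 9
print(first, second)

# Membership tests respect the closed side: 10 is in (8, 10], 8 is not.
print(10 in second, 8 in second)
```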