The Python Oracle

Pandas isin() function for continuous intervals

Full question
https://stackoverflow.com/questions/3062...

Accepted answer links:
[between]: http://pandas.pydata.org/pandas-docs/sta...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas



ACCEPTED ANSWER

Score 10


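Series objects (including DataFrame columns) have a between method, which checks each value against an interval and includes both endpoints by default:

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series(np.linspace(0, 20, 8))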
>>> s
0     0.000000
1     2.857143
2     5.714286
3     8.571429
4    11.428571
5    14.285714
6    17.142857
7    20.000000
dtype: float64
>>> s.between(1, 14.5)
0    False
1     True
2     True
3     True
4     True
5     True
6    False
7    False
dtype: bool
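
The resulting boolean Series can be used directly as a row mask. A minimal sketch, assuming a made-up DataFrame with a number column (the column name is borrowed from the other answers on this page, the data is purely illustrative):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"number": [8, 9, 10, 11, 12, 13, 14]})

# between() includes both endpoints by default, so 10 is matched.
mask = df["number"].between(1, 10)

# Boolean indexing keeps only the rows where the mask is True.
in_range = df[mask]
print(in_range)
```

Because the mask is an ordinary boolean Series, it can also be combined with other conditions using & and |.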



ANSWER 2

Score 1


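Combining two elementwise comparisons with & also works. The parentheses are required because & binds more tightly than >= and <=: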

df['numdum'] = (df.number >= 1) & (df.number <= 10)
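
As a rough sketch of how this compares with between, the snippet below builds both masks on a small made-up frame and checks that they agree. The astype(int) step is an assumption about what a 0/1 dummy column might look like; the answer itself stores booleans.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"number": [0, 1, 5, 10, 11]})

# Two comparisons joined with &, as in the answer above.
manual = (df["number"] >= 1) & (df["number"] <= 10)

# The same closed interval expressed with Series.between.
via_between = df["number"].between(1, 10)

# Both approaches should yield identical boolean Series.
same = manual.equals(via_between)

# Assumption: convert the booleans to 0/1 if an integer dummy is wanted.
df["numdum"] = manual.astype(int)
print(same)
print(df)
```

The explicit form is handy when the two bounds need different strictness, for example >= on one side and < on the other.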



ANSWER 3

Score 1


You could also do the same thing with pd.cut(). There is no real advantage if there are just two categories:

>>> df['numdum'] = pd.cut(df['number'], [-99, 10, 99], labels=[1, 0])
>>> df
   number numdum
0       8      1
1       9      1
2      10      1
3      11      0
4      12      0
5      13      0
6      14      0

But it is convenient when you have more than two categories:

>>> df['numdum'] = pd.cut(df['number'], [-99, 8, 10, 99], labels=[1, 2, 3])
>>> df
   number numdum
0       8      1
1       9      2
2      10      2
3      11      3
4      12      3
5      13      3
6      14      3

The labels can be True and False if you prefer. If you omit the labels argument entirely, each value is labeled with the interval it falls in, such as (-99, 10], so the labels themselves show the cutoff points.
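
A minimal sketch of the unlabeled case, reusing the bin edges from the example above on a made-up frame. It also shows labels=False, an option the answer does not mention, which returns the integer position of each bin instead of a label.

```python
import pandas as pd

# Hypothetical data matching the table shown in the answer.
df = pd.DataFrame({"number": [8, 9, 10, 11, 12, 13, 14]})
bins = [-99, 8, 10, 99]

# No labels: each value is tagged with the right-closed interval it falls in.
intervals = pd.cut(df["number"], bins)

# labels=False: return the zero-based bin position instead of a label.
codes = pd.cut(df["number"], bins, labels=False)

print(intervals)
print(codes.tolist())
```

The bins are closed on the right by default, which is why 8 lands in the first bin and 10 in the second.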