Find element's index in pandas Series
Full question
https://stackoverflow.com/questions/1832...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #pandas
ACCEPTED ANSWER
Score 296
>>> myseries[myseries == 7]
3    7
dtype: int64
>>> myseries[myseries == 7].index[0]
3
I admit there should be a better way to do this, but it at least avoids an explicit Python loop over the object and moves the work to the C level.
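As a minimal sketch, the boolean-mask lookup above can be wrapped in a helper that also copes with a missing value, since `.index[0]` raises an IndexError on an empty result. The helper name `first_index_of` is hypothetical and not part of pandas; the sample data is assumed for illustration.

```python
import pandas as pd


def first_index_of(series, value):
    """Return the index label of the first element equal to value, or None."""
    matches = series[series == value]  # vectorized boolean-mask selection
    if matches.empty:
        return None  # avoids the IndexError that .index[0] would raise
    return matches.index[0]


myseries = pd.Series([1, 4, 0, 7, 5], index=[0, 1, 2, 3, 4])
print(first_index_of(myseries, 7))   # 3
print(first_index_of(myseries, 10))  # None
```

Note that this returns an index label, which only coincides with the integer position when the Series has a default 0..n-1 index.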
ANSWER 2
Score 69
After converting the Series to an Index, you can use get_loc:
In [1]: myseries = pd.Series([1,4,0,7,5], index=[0,1,2,3,4])
In [3]: pd.Index(myseries).get_loc(7)
Out[3]: 3
In [4]: pd.Index(myseries).get_loc(10)
KeyError: 10
Duplicate handling
In [5]: pd.Index([1,1,2,2,3,4]).get_loc(2)
Out[5]: slice(2, 4, None)
It returns a boolean array if the matches are non-contiguous:
In [6]: pd.Index([1,1,2,1,3,2,4]).get_loc(2)
Out[6]: array([False, False, True, False, False, True, False], dtype=bool)
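Because get_loc can return an integer, a slice, or a boolean array depending on the data, calling code usually has to normalize the result. The following is a sketch under that assumption; the helper name `positions_of` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def positions_of(values, target):
    """Return every integer position of target as a list."""
    loc = pd.Index(values).get_loc(target)  # raises KeyError if target is absent
    if isinstance(loc, slice):
        # contiguous duplicates in a sorted index come back as a slice
        return list(range(loc.start, loc.stop))
    if isinstance(loc, np.ndarray):
        # non-contiguous duplicates come back as a boolean mask
        return np.flatnonzero(loc).tolist()
    return [int(loc)]  # a unique match is a single integer


unique_hit = positions_of([1, 4, 0, 7, 5], 7)
sorted_dups = positions_of([1, 1, 2, 2, 3, 4], 2)
scattered_dups = positions_of([1, 1, 2, 1, 3, 2, 4], 2)
print(unique_hit, sorted_dups, scattered_dups)
```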
It uses a hash table internally, so lookups are fast:
In [7]: s = pd.Series(np.random.randint(0,10,10000))
In [9]: %timeit s[s == 5]
1000 loops, best of 3: 203 µs per loop
In [12]: i = pd.Index(s)
In [13]: %timeit i.get_loc(5)
1000 loops, best of 3: 226 µs per loop
As Viktor points out, creating an index has a one-time overhead (it is incurred when you actually do something with the index, e.g. check is_unique):
In [2]: s = pd.Series(np.random.randint(0,10,10000))
In [3]: %timeit pd.Index(s)
100000 loops, best of 3: 9.6 µs per loop
In [4]: %timeit pd.Index(s).is_unique
10000 loops, best of 3: 140 µs per loop
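Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. The seed, repetition count, and variable names are assumptions chosen for illustration, and absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice
s = pd.Series(rng.integers(0, 10, 10000))
idx = pd.Index(s)  # built once, outside the timed lookup

n = 200  # repetitions per measurement
t_mask = timeit.timeit(lambda: s[s == 5], number=n)
t_get_loc = timeit.timeit(lambda: idx.get_loc(5), number=n)
# Building a fresh Index and touching is_unique pays the setup cost every time.
t_build = timeit.timeit(lambda: pd.Index(s).is_unique, number=n)

for name, total in [("mask", t_mask), ("get_loc", t_get_loc), ("build", t_build)]:
    print(f"{name}: {total / n * 1e6:.1f} µs per loop")
```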
ANSWER 3
Score 15
In [92]: (myseries==7).argmax()
Out[92]: 3
This works only if you know in advance that 7 is present; if it is not, argmax still returns a result, pointing at the first element. You can check for presence with (myseries==7).any().
Another approach (very similar to the first answer) that also accounts for multiple 7's (or none) is:
In [122]: myseries = pd.Series([1,7,0,7,5], index=['a','b','c','d','e'])
In [123]: list(myseries[myseries==7].index)
Out[123]: ['b', 'd']
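A sketch combining the presence check with the lookup, and separating index labels from integer positions. It assumes a recent pandas release, where idxmax returns a label and argmax returns a position; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

myseries = pd.Series([1, 7, 0, 7, 5], index=['a', 'b', 'c', 'd', 'e'])
mask = myseries == 7

# Guard with any(): on an all-False mask, idxmax would still return the first label.
first_label = mask.idxmax() if mask.any() else None

# All matches, as index labels and as integer positions.
all_labels = myseries.index[mask.to_numpy()].tolist()
all_positions = np.flatnonzero(mask.to_numpy()).tolist()

print(first_label, all_labels, all_positions)
```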
ANSWER 4
Score 12
Another way to do this, although equally unsatisfying, is:
s = pd.Series([1,3,0,7,5],index=[0,1,2,3,4])
list(s).index(7)
returns: 3
Timing tests on a dataset I'm currently working with (consider it random):
In [64]: %timeit pd.Index(article_reference_df.asset_id).get_loc('100000003003614')
10000 loops, best of 3: 60.1 µs per loop
In [66]: %timeit article_reference_df.asset_id[article_reference_df.asset_id == '100000003003614'].index[0]
1000 loops, best of 3: 255 µs per loop
In [65]: %timeit list(article_reference_df.asset_id).index('100000003003614')
100000 loops, best of 3: 14.5 µs per loop
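Two caveats of the list-based lookup are worth sketching: list.index raises ValueError when the value is missing, and it returns an integer position rather than an index label. The helper name `position_of` and the sample data are hypothetical.

```python
import pandas as pd

s = pd.Series([1, 3, 0, 7, 5], index=['a', 'b', 'c', 'd', 'e'])


def position_of(series, value):
    """Return the integer position of the first match, or None if absent."""
    try:
        return series.tolist().index(value)  # stops at the first match
    except ValueError:
        return None


pos = position_of(s, 7)
label = s.index[pos]  # map the position back to its index label
print(pos, label)
print(position_of(s, 10))
```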