The Python Oracle

How to reset index in a pandas dataframe?

Full question
https://stackoverflow.com/questions/2049...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #indexing #pandas #dataframe




ACCEPTED ANSWER

Score 1139


DataFrame.reset_index is what you're looking for. If you don't want the old index saved as a new column, pass drop=True:

df = df.reset_index(drop=True)
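Here is a minimal sketch of the difference, using a small made-up DataFrame whose index has gaps after filtering. Without drop=True the old labels come back as a column named "index". With drop=True they are discarded.

```python
import pandas as pd

# Made-up data: filtering leaves gaps in the index
df = pd.DataFrame({"a": [10, 20, 30, 40]})
df = df[df["a"] > 15]                 # index is now [1, 2, 3]

kept = df.reset_index()               # old index becomes a column named "index"
dropped = df.reset_index(drop=True)   # old index is discarded

print(kept)
print(dropped)
```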

If you want to modify the DataFrame in place instead of reassigning it:

df.reset_index(drop=True, inplace=True)
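With inplace=True the method returns None, so writing df = df.reset_index(drop=True, inplace=True) would replace df with None. The following small sketch uses made-up data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]}, index=[5, 6, 7])

# inplace=True modifies df itself and returns None
result = df.reset_index(drop=True, inplace=True)

print(result)              # None
print(df.index.tolist())   # [0, 1, 2]
```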



ANSWER 2

Score 73


Another solution is to assign a RangeIndex or a range directly:

df.index = pd.RangeIndex(len(df.index))

df.index = range(len(df.index))
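The sketch below shows both assignments on a made-up three-row DataFrame. Unlike reset_index, these modify the existing object directly. The old labels are simply discarded, as with drop=True.

```python
import pandas as pd

# Option 1: assign an explicit RangeIndex
df = pd.DataFrame({"a": [8, 7, 6]}, index=[7, 8, 9])
df.index = pd.RangeIndex(len(df.index))
print(df.index)

# Option 2: assign a plain range; pandas converts it to an index
df2 = pd.DataFrame({"a": [8, 7, 6]}, index=[7, 8, 9])
df2.index = range(len(df2.index))
print(df2.index)
```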

Both are faster than reset_index:

import pandas as pd

df = pd.DataFrame({'a':[8,7], 'c':[2,4]}, index=[7,8])
df = pd.concat([df]*10000)
print(df.head())

In [298]: %timeit df1 = df.reset_index(drop=True)
The slowest run took 7.26 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 105 µs per loop

In [299]: %timeit df.index = pd.RangeIndex(len(df.index))
The slowest run took 15.05 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.84 µs per loop

In [300]: %timeit df.index = range(len(df.index))
The slowest run took 7.10 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.2 µs per loop
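The timings above come from IPython's %timeit magic. Below is a rough equivalent that uses the standard-library timeit module. The helper function names are hypothetical. The numbers will vary with machine and pandas version, so no output is shown. The gap is presumably because reset_index builds and returns a new DataFrame, while assigning df.index only replaces the index of the existing one.

```python
import timeit

import pandas as pd

# Same setup as the benchmark above: 20,000 rows with a non-default index
df = pd.DataFrame({"a": [8, 7], "c": [2, 4]}, index=[7, 8])
df = pd.concat([df] * 10000)

def with_reset_index():
    return df.reset_index(drop=True)          # returns a new DataFrame

def with_range_index():
    df.index = pd.RangeIndex(len(df.index))   # replaces df's index in place

def with_range():
    df.index = range(len(df.index))           # same, from a plain range

runs = 100
for fn in (with_reset_index, with_range_index, with_range):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} µs per call")
```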



ANSWER 3

Score 24


data1.reset_index(inplace=True)
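This call omits drop=True, so the old index is not discarded. It is inserted as the first column, named after the index (or "index" if the index has no name). The following sketch assumes a hypothetical data1 with a named index:

```python
import pandas as pd

# Hypothetical data1 whose index is named "id"
data1 = pd.DataFrame({"score": [90, 85]}, index=pd.Index([101, 102], name="id"))

# Without drop=True the old index is kept as a regular column
data1.reset_index(inplace=True)
print(data1)
```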



ANSWER 4

Score 2


df.reset_index(drop=True) replaces the index with the default RangeIndex. Another way to do the same thing is to assign a new index directly using set_axis() (which I believe is what the OP attempted with reindex). The following two return the same output:

df1 = df.set_axis(range(len(df)))

df2 = df.reset_index(drop=True)
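Here is a quick check of that equivalence on made-up data. Note that set_axis returns a new DataFrame here rather than modifying df.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 2]}, index=[10, 20, 30])

df1 = df.set_axis(range(len(df)))   # assign new row labels directly
df2 = df.reset_index(drop=True)     # discard the old index

print(df1.equals(df2))              # True
```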

Note that many pandas methods and functions that add, remove, or reorder rows, such as drop_duplicates(), sort_values(), dropna(), and pd.concat(), have an ignore_index parameter. Passing ignore_index=True resets the result's index to a RangeIndex in the same call. So keep an eye out for this parameter when removing or adding rows. An example:

df.dropna().reset_index(drop=True)    # <--- instead of this

df.dropna(ignore_index=True)          # <--- use this
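The sketch below shows ignore_index with a few of the methods mentioned, on made-up data. It assumes a pandas version in which sort_values() and drop_duplicates() accept ignore_index (older releases do not).

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 3, 2]})

# Each call returns a result with a fresh 0..n-1 index
sorted_df = df.sort_values("a", ignore_index=True)   # instead of the original row labels
deduped = df.drop_duplicates(ignore_index=True)      # [0, 1, 2] instead of [0, 1, 3]
combined = pd.concat([df, df], ignore_index=True)    # 0..7, no repeated labels

print(sorted_df)
print(deduped)
print(combined)
```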

Written this way, it also works with the inplace parameter:

df1 = df.dropna().reset_index(drop=True)     # <--- must assign to dataframe
df.dropna(ignore_index=True, inplace=True)   # <--- `df` modified in-place
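The same idea is shown below with drop_duplicates() instead of dropna(), so the made-up data needs no missing values. It assumes a pandas version in which drop_duplicates() accepts both ignore_index and inplace.

```python
import pandas as pd

df = pd.DataFrame({"a": [3, 1, 3, 2]})

# Chained form: reset_index returns a new object that must be assigned
df1 = df.drop_duplicates().reset_index(drop=True)

# Single call: ignore_index=True combined with inplace=True modifies df itself
df.drop_duplicates(ignore_index=True, inplace=True)

print(df.equals(df1))   # True
```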

If you use groupby and want the result to have the default RangeIndex, pass as_index=False. The group keys then stay as regular columns and the result gets a RangeIndex in the same call. So instead of df.groupby('col1').mean().reset_index(), use df.groupby('col1', as_index=False).mean().
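The sketch below compares the two groupby forms on made-up data. pandas.testing.assert_frame_equal raises an error if the two results differ.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["x", "y", "x"], "val": [1, 2, 3]})

# Group keys become the index, then get moved back into a column
via_reset = df.groupby("col1").mean().reset_index()

# Group keys stay a column from the start; the result gets a RangeIndex
via_as_index = df.groupby("col1", as_index=False).mean()

print(via_as_index)
pd.testing.assert_frame_equal(via_reset, via_as_index)
```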