The Python Oracle

Pandas dataframe indexing by date

Full question
https://stackoverflow.com/questions/1386...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #timeseries




ACCEPTED ANSWER

Score 3


I think your confusion is due to a misunderstanding about the index_col argument. When you pass a list of columns to index_col, pandas creates a MultiIndex, that is, a dataframe with more than one column as its index (a hierarchical index, like a multi-dimensional table). It does NOT create a single index by concatenating the listed columns.
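A minimal sketch of the behaviour described above, assuming a file laid out as three (date, value) column pairs named X1 Y1 X2 Y2 X3 Y3; the sample data is made up for illustration. Passing index_col=[0, 2, 4] yields a three-level MultiIndex rather than one date index.

```python
import io

import pandas as pd

# Hypothetical sample file: three (date, value) column pairs side by side.
csv_text = """X1,Y1,X2,Y2,X3,Y3
2012-01-02,1.0,2012-01-03,10.0,2012-01-02,100.0
2012-01-03,2.0,2012-01-04,20.0,2012-01-03,200.0
"""

# A list passed to index_col builds one index level per listed column;
# the three date columns are NOT merged into a single index.
df = pd.read_csv(io.StringIO(csv_text), index_col=[0, 2, 4])

print(type(df.index).__name__)  # MultiIndex
print(df.index.nlevels)         # 3
print(list(df.columns))         # ['Y1', 'Y2', 'Y3']
```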

One strategy that would work is to create three dataframes with the appropriate pairs of columns from your input file, and then concatenate them.

X1 Y1 X2 Y2 X3 Y3 --> Dataframe of (X1, Y1) + Dataframe of (X2, Y2) + Dataframe of (X3, Y3)
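The split-and-concatenate strategy could look roughly like this. It is a sketch under the assumption that each Xi column holds dates and each Yi column holds the matching values, and that the goal is one column per series aligned on date; the sample data is hypothetical. Concatenating along axis=1 performs an outer alignment on the dates, so a date missing from one pair shows up as NaN in that column.

```python
import io

import pandas as pd

# Hypothetical sample file with the X1 Y1 X2 Y2 X3 Y3 layout.
csv_text = """X1,Y1,X2,Y2,X3,Y3
2012-01-02,1.0,2012-01-03,10.0,2012-01-02,100.0
2012-01-03,2.0,2012-01-04,20.0,2012-01-03,200.0
"""

raw = pd.read_csv(io.StringIO(csv_text))

series = []
for i in (1, 2, 3):
    # Build one date-indexed series per (Xi, Yi) pair.
    dates = pd.to_datetime(raw[f"X{i}"])
    series.append(raw.set_index(dates)[f"Y{i}"])

# axis=1 aligns the series on their dates (outer join):
# one column per series, NaN where a series has no value for a date.
combined = pd.concat(series, axis=1).sort_index()
print(combined)
```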

If you are using the latest development version of pandas, or are willing to, this is simplified by using the new parse_cols argument in read_csv() (in current pandas releases, read_csv() selects columns with the usecols argument). Or you can read in all the data, extract the three dataframes you need, and then concatenate them.
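With a current pandas release, the column-selection route might be sketched as below, using read_csv's usecols to load one (Xi, Yi) pair at a time. The helper read_pair and the sample data are hypothetical; naming index_col by label rather than by position sidesteps any question of how positions are counted within the selected subset.

```python
import io

import pandas as pd

# Hypothetical sample file with the X1 Y1 X2 Y2 X3 Y3 layout.
csv_text = """X1,Y1,X2,Y2,X3,Y3
2012-01-02,1.0,2012-01-03,10.0,2012-01-02,100.0
2012-01-03,2.0,2012-01-04,20.0,2012-01-03,200.0
"""


def read_pair(text, i):
    # Hypothetical helper: load only Xi and Yi, using Xi as a parsed date index.
    return pd.read_csv(
        io.StringIO(text),
        usecols=[f"X{i}", f"Y{i}"],
        index_col=f"X{i}",
        parse_dates=True,  # parse the index as dates
    )


frames = [read_pair(csv_text, i) for i in (1, 2, 3)]

# Align the three single-column frames on their dates.
combined = pd.concat(frames, axis=1).sort_index()
print(combined)
```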

Finally, you can use df.truncate() with the before and after arguments to restrict the data to the date range you need. More simply, you could use dropna() to omit dates with missing values.
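Both options are sketched below on a small hand-built frame that stands in for a date-aligned result with gaps; the values are hypothetical. truncate keeps rows between before and after inclusive (the index must be sorted), while dropna with its defaults drops every date on which any column is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical date-aligned frame with NaN where a series lacks a date.
combined = pd.DataFrame(
    {
        "Y1": [1.0, 2.0, np.nan],
        "Y2": [np.nan, 10.0, 20.0],
        "Y3": [100.0, 200.0, np.nan],
    },
    index=pd.to_datetime(["2012-01-02", "2012-01-03", "2012-01-04"]),
)

# Keep only the rows from before to after, inclusive (sorted index required).
window = combined.truncate(
    before=pd.Timestamp("2012-01-03"), after=pd.Timestamp("2012-01-04")
)

# Keep only the dates on which every series has a value.
complete = combined.dropna()

print(window)
print(complete)
```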

Hope this helps. Do let us know what version of pandas you are using.




ANSWER 2

Score 2


By setting index_col=[0,2,4] you are creating a MultiIndex; that's why you get that output.

read_csv cannot produce the output you want on the fly. Just read the column pairs separately and merge the resulting dataframes.
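One way to read the pairs separately and merge them is sketched here, assuming the same X1 Y1 X2 Y2 X3 Y3 layout with dates in the X columns; the helper pair_frame, the shared "date" key, and the sample data are all hypothetical. An outer merge keeps every date that appears in any of the pairs.

```python
import io
from functools import reduce

import pandas as pd

# Hypothetical sample file with the X1 Y1 X2 Y2 X3 Y3 layout.
csv_text = """X1,Y1,X2,Y2,X3,Y3
2012-01-02,1.0,2012-01-03,10.0,2012-01-02,100.0
2012-01-03,2.0,2012-01-04,20.0,2012-01-03,200.0
"""

raw = pd.read_csv(io.StringIO(csv_text))


def pair_frame(i):
    # Hypothetical helper: a (date, Yi) frame with a shared "date" key column.
    part = raw[[f"X{i}", f"Y{i}"]].rename(columns={f"X{i}": "date"})
    return part.assign(date=pd.to_datetime(part["date"]))


frames = [pair_frame(i) for i in (1, 2, 3)]

# Chain outer merges on "date", then use the dates as the index.
merged = (
    reduce(lambda left, right: pd.merge(left, right, on="date", how="outer"), frames)
    .set_index("date")
    .sort_index()
)
print(merged)
```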