Efficiently read big csv file by parts using Dask
--------------------------------------------------
Rise to the top 3% as a developer or hire one of them at Toptal: https://topt.al/25cXVn
--------------------------------------------------
Music by Eric Matyas
https://www.soundimage.org
Track title: Realization
--
Chapters
00:00 Efficiently Read Big Csv File By Parts Using Dask
01:02 Accepted Answer Score 4
01:24 Thank you
--
Full question
https://stackoverflow.com/questions/6073...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#python #csv #dask #daskdataframe
#avk47
ACCEPTED ANSWER
Score 4
Dask DataFrame will partition the data for you; you don't need to use nrows or skiprows.
import dask.dataframe as dd
df = dd.read_csv(filename)
If you want to pick out a particular partition, you can use the partitions accessor.
part = df.partitions[i]
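As a minimal sketch (the filename "data.csv" and the partition index are hypothetical), materializing one partition in memory could look like this:

import dask.dataframe as dd

df = dd.read_csv("data.csv")   # hypothetical input file
part = df.partitions[3]        # still lazy: a one-partition Dask DataFrame
pdf = part.compute()           # runs the read and returns a pandas DataFrame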
However, you might also want to apply your function to every partition in parallel.
df.map_partitions(process).to_csv("data.*.csv")
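For illustration, an end-to-end version might look like the sketch below; process and the column names are hypothetical stand-ins for your own logic, and the "*" in the to_csv pattern is replaced by the partition number, so one output file is written per partition.

import dask.dataframe as dd

def process(pdf):
    # pdf is a pandas DataFrame holding one partition
    pdf["total"] = pdf["price"] * pdf["quantity"]   # hypothetical columns
    return pdf

df = dd.read_csv("big.csv")                       # hypothetical input file
df.map_partitions(process).to_csv("data.*.csv")   # one CSV file per partition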