Python Pandas: How to read only the first n rows of a CSV file?

--

Full question
https://stackoverflow.com/questions/2385...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pandas #csv #fileio

ACCEPTED ANSWER

Score 353

If you only want to read the first 999,999 (non-header) rows:

read_csv(..., nrows=999999)

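If you only want to read rows 1,000,000 ... 1,999,999 (an integer skiprows counts the header line as well, so the first line after the skip is consumed as the column names):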

read_csv(..., skiprows=1000000, nrows=999999)
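
The two calls above can be tried on a small in-memory sample. This is a minimal sketch, not part of the original answer: the 10-row `csv_text` is a made-up stand-in for a large file. It also shows one way to keep the header while skipping rows, by passing a list-like `skiprows` that starts at line 1.

```python
import io

import pandas as pd

# Hypothetical 10-row sample standing in for a large CSV file.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 11))

# First 3 data rows; the header line is not counted by nrows.
head = pd.read_csv(io.StringIO(csv_text), nrows=3)

# Data rows 4..6 with the header kept: skip file lines 1..3 (line 0 is the header).
middle = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 4), nrows=3)

# An integer skiprows skips the header too, so line 3 ("3,30") becomes the column names.
shifted = pd.read_csv(io.StringIO(csv_text), skiprows=3, nrows=3)

print(head["id"].tolist())    # [1, 2, 3]
print(middle["id"].tolist())  # [4, 5, 6]
print(list(shifted.columns))  # ['3', '30']
```

If the header has to survive an integer skip, passing `header=None` together with `names=[...]` is another option.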

nrows : int, default None. Number of rows of file to read. Useful for reading pieces of large files.

skiprows : list-like or integer. Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file.

For large files, you'll probably also want to use chunksize:

chunksize : int, default None. Return TextFileReader object for iteration.

Source: pandas.io.parsers.read_csv documentation
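
As a rough illustration of the chunksize parameter, the sketch below iterates over a small in-memory sample in chunks of 4 rows and aggregates as it goes. The sample data and the variable names (`csv_text`, `chunk_lengths`, `total`) are assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical 10-row sample standing in for a large CSV file.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 11))

chunk_lengths = []
total = 0

# With chunksize set, read_csv returns an iterator that yields DataFrames
# of at most 4 rows each, so only one chunk is held in memory at a time.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_lengths.append(len(chunk))
    total += int(chunk["value"].sum())

print(chunk_lengths)  # [4, 4, 2]
print(total)          # 550
```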

ANSWER 2

Score 3

chunksize= is a very useful argument: when you pass it, read_csv returns an iterator instead of a DataFrame, so you can call next() on it to pull one chunk at a time without loading the whole file into memory. For example, to get the first n rows, you can use:

chunks = pd.read_csv('file.csv', chunksize=n)
df = next(chunks)

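For example, if you have time-series data and want the first 700k rows as the training set and the remainder as the test set, you can do the following (this assumes the file has at most 1.4 million rows; otherwise the second next() returns only the next 700k rows):

chunks = pd.read_csv('file.csv', chunksize=700_000)
train_df = next(chunks)  # first 700k rows
test_df = next(chunks)   # next chunk of up to 700k rows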
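
When the remainder may span more than one chunk, one option is to concatenate whatever the iterator still holds. This is a minimal sketch on made-up data (`csv_text` and the chunk size of 3 are assumptions), not the answer author's code.

```python
import io

import pandas as pd

# Hypothetical 10-row time series standing in for the real file.
csv_text = "t,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 11))

chunks = pd.read_csv(io.StringIO(csv_text), chunksize=3)
train_df = next(chunks)      # first 3 rows
test_df = pd.concat(chunks)  # every remaining row, gathered from the later chunks

print(len(train_df), len(test_df))  # 3 7
```

Note that `pd.concat` raises a `ValueError` if no chunks are left, and that the remainder is loaded into memory in full.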

ANSWER 3

Score 0

If you do not want to use pandas, you can use the csv module from the standard library and limit the number of rows read by breaking out of the iteration.

For example, I needed to read only the header of each file in a list of CSV files (stored in result).

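import csv

# result is assumed to be a list of CSV file names in the current directory
for name in result:
    path = './' + name
    # 'ANSI' is an alias of the Windows-only 'mbcs' codec
    with open(path, encoding='ANSI', newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        count = 0
        for row in csv_reader:
            if count:
                break  # stop after the first row (the header)
            # ... use row (the header) here ...
            count += 1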
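
The same idea, stopping after a fixed number of rows, can also be sketched with itertools.islice instead of a manual counter. The helper name `read_first_rows` and the in-memory sample are hypothetical, chosen only for this illustration.

```python
import csv
import io
from itertools import islice


def read_first_rows(lines, n):
    """Return the header and at most n data rows from an iterable of CSV lines."""
    reader = csv.reader(lines)
    header = next(reader, None)  # None if the input is empty
    return header, list(islice(reader, n))


# Hypothetical sample; a file opened with open(path, newline='') works the same way.
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n", newline="")
header, rows = read_first_rows(sample, 2)

print(header)  # ['id', 'value']
print(rows)    # [['1', '10'], ['2', '20']]
```

islice stops consuming the reader after n rows, so the rest of the file is never parsed.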