The Python Oracle

How to skip the headers when processing a csv file using Python?


Chapters
00:00 How To Skip The Headers When Processing A Csv File Using Python?
00:53 Answer 1 Score 16
01:04 Accepted Answer Score 538
01:47 Answer 3 Score 180
02:12 Answer 4 Score 3
02:33 Thank you

--

Full question
https://stackoverflow.com/questions/1425...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #csv #csvheader




ACCEPTED ANSWER

Score 538


Your reader variable is an iterable; looping over it retrieves the rows one at a time.

To make it skip one item before your loop, simply call next(reader, None) and ignore the return value.

You can also simplify your code a little; use the opened files as context managers to have them closed automatically:

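import csv

# On Python 3, open CSV files in text mode with newline="" (the "rb"/"wb" modes only work on Python 2)
with open("tmob_notcleaned.csv", newline="") as infile, open("tmob_cleaned.csv", "w", newline="") as outfile:
    reader = csv.reader(infile)
    next(reader, None)  # skip the headers
    writer = csv.writer(outfile)
    for row in reader:
        # process each row
        writer.writerow(row)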

# no need to close, the files are closed automatically when you get to this point.

If you want to write the header to the output file unprocessed, that's easy too: pass the return value of next() to writer.writerow():

headers = next(reader, None)  # returns the headers or `None` if the input is empty
if headers:
    writer.writerow(headers)
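
The two snippets above can be combined into one self-contained sketch. The helper name copy_rows, its keep_header flag, and the sample data are assumptions made for illustration, and in-memory io.StringIO buffers stand in for the real files so the example runs without touching disk.

```python
import csv
import io


def copy_rows(infile, outfile, keep_header=True):
    """Copy CSV rows from infile to outfile; hypothetical helper for illustration."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    headers = next(reader, None)  # None if the input is empty
    if headers and keep_header:
        writer.writerow(headers)  # pass the header through unprocessed
    count = 0
    for row in reader:
        writer.writerow(row)  # only data rows reach this loop
        count += 1
    return count


# Made-up sample data standing in for tmob_notcleaned.csv
source = io.StringIO("name,plan\nalice,basic\nbob,premium\n")
target = io.StringIO()
copied = copy_rows(source, target)
print(copied)
print(target.getvalue())
```

Because next(reader, None) supplies a default, an empty input produces an empty output instead of raising StopIteration.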



ANSWER 2

Score 180


Another way of solving this is to use the DictReader class, which "skips" the header row and uses it to allow named indexing.

Given "foo.csv" as follows:

FirstColumn,SecondColumn
asdf,1234
qwer,5678

Use DictReader like this:

import csv
with open('foo.csv', newline='') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        print(row['FirstColumn'])  # Access by column header instead of column number
        print(row['SecondColumn'])
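
If the header itself is needed, DictReader exposes it through its fieldnames attribute. The following is a minimal sketch that uses an in-memory io.StringIO copy of the foo.csv sample above instead of a file on disk; the variable names are illustrative.

```python
import csv
import io

# Same content as foo.csv above, held in memory so the sketch is self-contained
sample = io.StringIO("FirstColumn,SecondColumn\nasdf,1234\nqwer,5678\n")

reader = csv.DictReader(sample)
header = reader.fieldnames  # accessing this reads the header row if not yet read
rows = list(reader)         # the remaining rows, each keyed by column name

print(header)
print(rows[0]["SecondColumn"])
```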



ANSWER 3

Score 16


Doing row=1 won't change anything, because you'll just overwrite that with the results of the loop.

You want to do next(reader) to skip one row.
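
A short sketch of both points, using made-up in-memory data: assigning to row before the loop has no effect because the for statement rebinds it, while next(reader) actually consumes the header. It also shows that next(reader) without a default raises StopIteration on empty input, whereas next(reader, None) does not.

```python
import csv
import io

SAMPLE = "a,b\n1,2\n3,4\n"  # made-up data for illustration

# Assigning to `row` beforehand changes nothing: the loop rebinds it each time.
row = 1
seen = []
for row in csv.reader(io.StringIO(SAMPLE)):
    seen.append(row)
# seen still starts with the header row ['a', 'b']

# Calling next() on the reader consumes the header before the loop starts.
reader = csv.reader(io.StringIO(SAMPLE))
next(reader)
data = list(reader)

# On empty input, next() without a default raises StopIteration.
try:
    next(csv.reader(io.StringIO("")))
    raised = False
except StopIteration:
    raised = True

# With a default, it returns the default instead.
default = next(csv.reader(io.StringIO("")), None)
```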




ANSWER 4

Score 3


Inspired by Martijn Pieters' response.

If you only need to remove the header from the CSV file, it is more efficient to copy the remaining data with plain Python file I/O and avoid the csv module entirely:

with open("tmob_notcleaned.csv", "rb") as infile, open("tmob_cleaned.csv", "wb") as outfile:
    next(infile, None)  # skip the headers; the default avoids StopIteration on an empty file
    outfile.write(infile.read())
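
For large files, a variant of the same idea can copy the data in chunks rather than reading everything into memory at once. This is only a sketch: the helper name strip_first_line and the temporary file names are hypothetical, and it assumes the header occupies exactly one physical line (a quoted header field containing a newline would break it).

```python
import os
import shutil
import tempfile


def strip_first_line(src_path, dst_path):
    """Copy src_path to dst_path without its first line (hypothetical helper)."""
    with open(src_path, "rb") as infile, open(dst_path, "wb") as outfile:
        infile.readline()  # discard the header; returns b"" on an empty file
        shutil.copyfileobj(infile, outfile)  # copy the rest in chunks


# Demonstration on a throwaway file with made-up contents
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.csv")
    dst = os.path.join(tmp, "out.csv")
    with open(src, "wb") as f:
        f.write(b"id,name\r\n1,ann\r\n2,bob\r\n")
    strip_first_line(src, dst)
    with open(dst, "rb") as f:
        result = f.read()

print(result)
```

Working in binary mode leaves line endings and encoding untouched, since the bytes after the first line are copied verbatim.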