The Python Oracle

How to create full compressed tar file using Python?

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzling Curiosities

--

Chapters
00:00 Question
00:21 Accepted answer (Score 292)
00:49 Answer 2 (Score 112)
01:13 Answer 3 (Score 35)
01:48 Answer 4 (Score 23)
02:29 Thank you

--

Full question
https://stackoverflow.com/questions/2032...

Answer 2 links:
[tarfile.open]: http://docs.python.org/library/tarfile.h...

Answer 3 links:
[This question]: https://stackoverflow.com/questions/4562...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #compression #zip #tarfile

#avk47



ACCEPTED ANSWER

Score 317


To build a .tar.gz (aka .tgz) for an entire directory tree:

import tarfile
import os.path

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))

This will create a gzipped tar archive containing a single top-level folder with the same name and contents as source_dir.




ANSWER 2

Score 114


import tarfile
tar = tarfile.open("sample.tar.gz", "w:gz")
for name in ["file1", "file2", "file3"]:
    tar.add(name)
tar.close()

If you want to create a tar.bz2 compressed file, just replace file extension name with ".tar.bz2" and "w:gz" with "w:bz2".




ANSWER 3

Score 35


You call tarfile.open with mode='w:gz', meaning "Open for gzip compressed writing."

You'll probably want to end the filename (the name argument to open) with .tar.gz, but that doesn't affect compression abilities.

BTW, you usually get better compression with a mode of 'w:bz2', just like tar can usually compress even better with bzip2 than it can compress with gzip.




ANSWER 4

Score 26


Previous answers advise using the tarfile Python module for creating a .tar.gz file in Python. That's obviously a good and Python-style solution, but it has serious drawback in speed of the archiving. This question mentions that tarfile is approximately two times slower than the tar utility in Linux. According to my experience this estimation is pretty correct.

So for faster archiving you can use the tar command using subprocess module:

subprocess.call(['tar', '-czf', output_filename, file_to_archive])