@alexgarel
Last active May 17, 2022 16:15
Copy one tar to another

I had a corrupted tar file (due to an interruption while writing the last file …). I needed to append to the archive, but that was not possible: I got tarfile.ReadError: empty header as soon as I opened the archive in append mode. The problem: the archive held more than 1 million files, all flat (it was intended to be used directly).
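
A minimal sketch of how that failure can be detected up front. The helper name `can_append` is hypothetical; the only behavior relied on is that `tarfile.open(path, "a")` raises `tarfile.ReadError` when it cannot walk the headers to the end of the archive.

```python
import tarfile


def can_append(path):
    """Return True if `path` opens cleanly in append mode, False on ReadError.

    Hypothetical helper, not part of the original script.
    """
    try:
        # Append mode reads every header to locate the end-of-archive marker,
        # so a truncated archive fails here (e.g. "empty header").
        # Caveat: on success, closing the archive rewrites that marker,
        # so a healthy file is still touched by this check.
        with tarfile.open(path, "a"):
            return True
    except tarfile.ReadError:
        return False
```

On a large archive this check is itself slow, since every header has to be read before the answer is known.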

This script saved my day and was quite fast (note how expensive opening the big tar is: the log shows almost 11 minutes between the start and the first progress line!):

2022-05-17T15:52:17.492944 starting
2022-05-17T16:03:01.681669 0 done, 0 errs
2022-05-17T16:05:01.001833 100000 done, 0 errs
2022-05-17T16:06:17.162066 200000 done, 0 errs
2022-05-17T16:07:30.987101 300000 done, 0 errs
2022-05-17T16:08:45.925787 400000 done, 0 errs
2022-05-17T16:10:09.520679 500000 done, 0 errs
2022-05-17T16:11:04.774781 600000 done, 0 errs
2022-05-17T16:11:49.751980 700000 done, 0 errs
2022-05-17T16:12:53.108568 800000 done, 0 errs
2022-05-17T16:14:34.008851 900000 done, 0 errs
2022-05-17T16:15:53.997969 1000000 done, 0 errs
2022-05-17T16:17:10.676214 1100000 done, 0 errs
2022-05-17T16:18:44.278315 1200000 done, 0 errs
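import tarfile
from datetime import datetime

OUT_FILE = "logos.tar"   # the corrupted archive to read from
NEW_FILE = "logos2.tar"  # the clean copy to write

print(datetime.now().isoformat(), "starting")
with tarfile.open(OUT_FILE, "r") as old, tarfile.open(NEW_FILE, "w") as new:
    errs = 0
    # getmembers() reads every header in the archive before the loop starts
    for i, tarinfo in enumerate(old.getmembers()):
        try:
            # extractfile() returns None for non-file members; addfile() accepts that
            stream = old.extractfile(tarinfo)
            new.addfile(tarinfo, stream)
        except Exception:
            # count members that cannot be copied and carry on
            errs += 1
        if i % 100000 == 0:
            print(datetime.now().isoformat(), "%d done, %d errs" % (i, errs))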
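
A possible variant, sketched under assumptions rather than taken from the script above: it iterates over the archive lazily instead of calling `getmembers()`, and it reads each regular file fully into memory before writing it, so a member whose data is cut short is skipped instead of being half-written to the new archive. The function name `copy_readable_members` is hypothetical, and buffering assumes every member fits comfortably in RAM.

```python
import io
import tarfile


def copy_readable_members(src_path, dst_path):
    """Copy every member that can be read in full; return (copied, errs)."""
    copied = errs = 0
    with tarfile.open(src_path, "r") as old, tarfile.open(dst_path, "w") as new:
        members = iter(old)  # headers are read one at a time
        while True:
            try:
                tarinfo = next(members)
            except StopIteration:
                break
            except tarfile.ReadError:
                # the next header cannot be located: stop at the damage
                break
            data = None
            try:
                if tarinfo.isreg():
                    # read the whole member first; a short read raises ReadError
                    data = old.extractfile(tarinfo).read()
                    if len(data) != tarinfo.size:
                        raise tarfile.ReadError("short read")
            except (tarfile.TarError, OSError):
                errs += 1
                continue
            # only complete members reach the new archive
            new.addfile(tarinfo, io.BytesIO(data) if data is not None else None)
            copied += 1
    return copied, errs
```

Calling `copy_readable_members("logos.tar", "logos2.tar")` would play the same role as the script above. Whether lazy iteration shortens the long start-up on an archive of this size is untested here.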