tarsnap hacky parallel restore script
#!/bin/sh
# recover all files in parallel from the most recent archive
# MIT license
# https://git.io/vdrbG
# "works on my machine"
# lots of assumptions, notably path depth (--strip-components)

# get the latest archive; archive names here sort chronologically
ARCHIVE=`tarsnap --keyfile /tmp/tarsnap.key --list-archives | sort | tail -1`

# list the files in the archive, ordered by descending size
# (cut -w is BSD cut: split fields on whitespace)
FILES=`tarsnap --keyfile /tmp/tarsnap.key -tvf ${ARCHIVE} | cut -w -f 5,9 | sort -rn | cut -w -f 2`

# spawn 10 invocations in parallel, one file per invocation (use -P 0 for unlimited)
echo $FILES | xargs -P 10 -n 1 -t \
  time tarsnap \
    --retry-forever \
    -S \
    --strip-components 6 \
    --print-stats \
    --humanize-numbers \
    --keyfile /tmp/tarsnap.key \
    --chroot \
    -xv \
    -f ${ARCHIVE}
# profit
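Not part of the original script, just a sketch: one way to reduce the repeated-header overhead discussed in the comments below is to pass a batch of filenames to each tarsnap invocation instead of one, so a single scan of the archive headers is amortized over many extractions. The batch size of 50 is an arbitrary assumption, and this has the same filenames-with-whitespace caveats as the original.

#!/bin/sh
# hypothetical variant: restore in batches of 50 files per tarsnap process,
# so each full read of the tar headers covers 50 extractions instead of 1
ARCHIVE=`tarsnap --keyfile /tmp/tarsnap.key --list-archives | sort | tail -1`
tarsnap --keyfile /tmp/tarsnap.key -tf ${ARCHIVE} \
  | xargs -P 10 -n 50 \
    tarsnap \
      --retry-forever \
      -S \
      --strip-components 6 \
      --keyfile /tmp/tarsnap.key \
      -x \
      -f ${ARCHIVE}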
Leaving a comment here in case anyone finds this and tries to use it: this will use lots and lots of bandwidth if you have many files! It spawns a tarsnap process for each file in the archive (xargs -n 1), and each tarsnap process has to read all of the tar headers in the archive to find the right file. So it ends up being O(N^2) in the number of files.

A very good point - expensive in bandwidth. But will it still get better overall throughput during a restore? Are there alternative options for better throughput?

If you have a small number of files then it might get more throughput. It would depend on the average size of the files you're downloading vs. the overhead of downloading all of the tar headers (512 bytes * number of files). Running 10 processes in parallel, I guess it would complete faster if the number of files is less than the average file size divided by ~50 bytes?
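A rough back-of-envelope behind that estimate (my own working, not from the gist; let N be the number of files and S the average file size in bytes):
a single sequential extract of the whole archive downloads about N*(S + 512) bytes (file data plus one 512-byte tar header per file);
the one-process-per-file approach re-reads every header on every invocation, so roughly N*S + N^2*512 bytes in total, spread across 10 parallel streams;
if bandwidth is the bottleneck and 10 streams give about 10x the throughput of one, the parallel approach finishes sooner when (N*S + N^2*512)/10 < N*(S + 512), i.e. 512*N < 9*S + 5120, i.e. N less than roughly S/57, which matches "average file size divided by ~50 bytes".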