#!/bin/sh
# recover all files in parallel from the most recent archive
# MIT license
# https://git.io/vdrbG
# "works on my machine"
# lots of assumptions, notably path depth (--strip-components)

# get the latest archive; our archive names sort by time
ARCHIVE=`tarsnap --keyfile /tmp/tarsnap.key --list-archives | sort | tail -1`

# list the files ordered by descending size
# (-tv output: field 5 = size, field 9 = path; see the sample line after the script)
FILES=`tarsnap --keyfile /tmp/tarsnap.key -tvf ${ARCHIVE} | cut -w -f 5,9 | sort -rn | cut -w -f 2`

# spawn 10 invocations in parallel, one file each (use -P 0 for unlimited)
echo $FILES | xargs -P 10 -n 1 -t \
    time tarsnap \
        --retry-forever \
        -S \
        --strip-components 6 \
        --print-stats \
        --humanize-numbers \
        --keyfile /tmp/tarsnap.key \
        --chroot \
        -xv \
        -f ${ARCHIVE}
# profit
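For reference, tarsnap -tvf prints an ls -l style listing, which is why the FILES pipeline picks whitespace fields 5 (size) and 9 (path). The sample values below are made up; this just shows one listing line with the field positions annotated:

tarsnap --keyfile /tmp/tarsnap.key -tvf ${ARCHIVE} | head -1
# typical output (hypothetical values):
# -rw-r--r--  0 user  wheel    1048576 Oct  3 12:00 a/b/c/d/e/f/file.dat
#     (1)    (2) (3)   (4)       (5)   (6)(7)  (8)         (9)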
A very good point - expensive in bandwidth. But will it get better overall throughput during the restore? Are there alternative options for better throughput?
If you have a small number of files then it might get more throughput. It would depend on the average size of the files you're downloading versus the overhead of downloading all of the tar headers (512 bytes * number of files). Running 10 processes in parallel, I'd guess it completes faster if the number of files is less than the average file size divided by roughly 50 bytes (i.e. the 512-byte header cost per file spread across the 10 parallel downloads)?
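A rough sketch of that break-even estimate (my helper pipeline, not from the thread; it assumes the same /tmp/tarsnap.key and ARCHIVE as the script above, plus 64-bit shell arithmetic):

N=`tarsnap --keyfile /tmp/tarsnap.key -tf ${ARCHIVE} | wc -l`
# sum field 5 (size) of the verbose listing to get the total payload bytes
TOTAL=`tarsnap --keyfile /tmp/tarsnap.key -tvf ${ARCHIVE} | awk '{sum += $5} END {printf "%.0f\n", sum}'`
AVG=$((TOTAL / N))
# each single-file extraction rereads ~512*N bytes of headers; with 10 workers
# that cost is amortized 10x, so the parallel restore only looks like a win
# while N stays below roughly AVG / (512 / 10), i.e. about AVG / 50
echo "files: $N, average size: $AVG, rough break-even file count: $((AVG * 10 / 512))"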
Why all this cut and sort in the FILES line?
I'm currently pretty happy with my simplified version:
mkdir -p ~/tarsnap_restore
tarsnap -tf "${ARCHIVE}" \
  | xargs -P 20 -n 20 -t \
    tarsnap \
      --humanize-numbers \
      --resume-extract \
      --chroot \
      -xf "${ARCHIVE}" \
      -C ~/tarsnap_restore \
      --
I omitted -v, so no cut is needed to pull out just the path.
I added --resume-extract to skip files that have already been extracted.
I added -C ... to extract into a specific directory rather than $PWD.
-P 20 -n 20 is reasonable for my case.
Adding "--" at the end helps with filenames beginning with "-" or "--" that might otherwise be interpreted as tarsnap options (the dry run sketched below shows where xargs puts the filenames).
Other options like -S, --retry-forever, and --strip-components 6 seem specific to the author's use case, so I omitted them for mine.
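To sanity-check the batching without downloading anything (my addition, not part of the original comment), put echo in front of the inner tarsnap so xargs just prints each command it would run, 20 filenames per invocation, all after the trailing --:

tarsnap -tf "${ARCHIVE}" \
  | xargs -P 20 -n 20 \
    echo tarsnap \
      --humanize-numbers \
      --resume-extract \
      --chroot \
      -xf "${ARCHIVE}" \
      -C ~/tarsnap_restore \
      --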
Leaving a comment here in case anyone finds this and tries to use it: this will use lots and lots of bandwidth if you have many files! It spawns a tarsnap process for each file in the archive (xargs -n 1), and each tarsnap process has to read all of the tar headers in the archive to find the right file. So it ends up being O(N^2) in the number of files.
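To put a rough number on that (my figures, assuming the 512-byte tar header per entry mentioned above and 64-bit shell arithmetic):

# hypothetical archive of 10,000 files: each of the 10,000 single-file
# extractions rereads all 10,000 headers, so header traffic alone is about
# 512 * N * N bytes (roughly 51 GB) before any actual file data
N=10000
echo "extra header bytes ~= $((512 * N * N))"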