@tbnorth
Last active October 17, 2018 18:42
Notes for sending a large amount of data

Say you have a large amount of data you want to send, in a folder such as /path/to/large/data. Make a new folder somewhere your recipient can reach by FTP / scp / ssh / HTTP, whatever, e.g. /other/path/to/parts. Then do:

SOURCE="/path/to/large/data"
DEST="/other/path/to/parts"
mkdir -p "$DEST"
tar cf - "$SOURCE" | gzip | split -da6 -b10M - "$DEST"/part.
# record bare file names, so the recipient can run sha1sum -c in their own folder
cd "$DEST" && sha1sum part.* >checksums.txt

Be careful to omit trailing slashes, as in the first two lines above.

This compresses the data and cuts it into 10 MB chunks (10 MiB, strictly) called part.000000, part.000001, etc. The last line creates a list of checksums so the recipient can check that the files transferred OK. Note that the above doesn't transfer anything; it just cuts the data into pieces the recipient can fetch, without relying on an internet connection to move a lot of data in one shot.
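Why does a plain cat put the pieces back together? split only slices the byte stream, and -da6 gives fixed-width, zero-padded numeric suffixes, so the shell's sorted expansion of part.* lists the pieces in the order split wrote them. A small sketch to convince yourself, assuming GNU coreutils split (as the commands above already do), run in a throwaway temp folder with a hypothetical stream.txt and a 100 KiB chunk size instead of 10M:

```shell
#!/bin/sh
# Demo only: slice a known file and check the slices rejoin byte for byte.
set -eu
WORK=$(mktemp -d)
cd "$WORK"

seq 1 100000 > stream.txt               # 588,895 bytes of sample data
split -da6 -b100K stream.txt part.      # 100 KiB pieces, same suffix style as above
ls part.*                               # six parts: part.000000 through part.000005

# The glob expands in sorted order, which matches split's output order,
# so concatenating restores the original bytes exactly.
cat part.* | cmp - stream.txt && echo "identical"
```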

You can just email checksums.txt to the recipient and let them collect all the parts via whatever protocol (FTP / scp / ssh / HTTP) you have in place.
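On the receiving end, `sha1sum -c checksums.txt` is the usual check, but it looks for files under the exact names listed, so it fails if those names include the sender's folder path. A minimal sketch that matches entries by base name instead; `verify_parts` is a hypothetical helper, meant to be run in the folder holding the downloaded parts and checksums.txt:

```shell
# verify_parts (hypothetical helper): checks each part listed in checksums.txt
# by base name, so it works whether the list holds bare names or full paths.
# Returns non-zero while any part is missing or damaged.
verify_parts() {
    vp_status=0
    while read -r vp_sum vp_path; do
        vp_path=${vp_path#\*}               # drop sha1sum's binary-mode "*" marker, if any
        vp_name=$(basename "$vp_path")
        if [ ! -f "$vp_name" ]; then
            echo "MISSING $vp_name"         # not fetched yet
            vp_status=1
        elif [ "$(sha1sum < "$vp_name" | cut -d' ' -f1)" != "$vp_sum" ]; then
            echo "BAD     $vp_name"         # damaged in transit
            vp_status=1
        else
            echo "OK      $vp_name"
        fi
    done < checksums.txt
    return "$vp_status"
}
```

The recipient re-fetches only what it reports as MISSING or BAD and runs it again, which keeps a flaky connection from forcing a full re-download.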

On 20 GB of numeric text files (~CSV) I got 1.2 GB of part.xxx files, or roughly 17:1 compression.
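To get the equivalent figure for your own data, compare the total size of the source files with the total size of the parts. A sketch, assuming GNU find (its -printf option reads sizes from file metadata, so nothing is re-read); `compression_ratio` is a hypothetical helper name:

```shell
# compression_ratio (hypothetical helper): $1 = source folder, $2 = parts folder.
# Sums regular-file sizes via GNU find -printf; %.0f avoids integer overflow in awk.
compression_ratio() {
    cr_src=$(find "$1" -type f -printf '%s\n' |
        awk '{ s += $1 } END { printf "%.0f\n", s }')
    cr_dst=$(find "$2" -maxdepth 1 -type f -name 'part.*' -printf '%s\n' |
        awk '{ s += $1 } END { printf "%.0f\n", s }')
    awk -v a="$cr_src" -v b="$cr_dst" 'BEGIN {
        if (b == 0) { print "no parts found"; exit 1 }
        printf "%.0f bytes -> %.0f bytes, ratio %.1f:1\n", a, b, a / b
    }'
}
```

Usage: `compression_ratio "$SOURCE" "$DEST"` after running the commands above.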

To reconstruct the data on the other end, the recipient just runs something like this in the folder holding the parts:

mkdir -p /where/to/put/them
cat part.* | gzip -d | tar vx -C /where/to/put/them -f -
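An end-to-end check on throwaway data, sketched under the same GNU tool assumptions: pack a small sample folder with the sending pipeline, unpack it with the receiving one, and `diff -r` the result. It also shows a detail worth knowing: tar strips the leading / from $SOURCE but keeps the rest of the path, so the data lands under /where/to/put/them followed by the full original path (e.g. /where/to/put/them/path/to/large/data). The sample file names below are hypothetical.

```shell
#!/bin/sh
# Round trip on throwaway data: pack, split, rejoin, unpack, compare.
set -eu
WORK=$(mktemp -d)
SOURCE="$WORK/data"
DEST="$WORK/parts"
OUT="$WORK/restored"
mkdir -p "$SOURCE/sub" "$DEST" "$OUT"

# Hypothetical sample files standing in for the real data
seq 1 50000 > "$SOURCE/numbers.csv"
printf 'a,b,c\n1,2,3\n' > "$SOURCE/sub/small.csv"

# Sending side: same pipeline as above, smaller chunks for the demo
tar cf - "$SOURCE" | gzip | split -da6 -b64K - "$DEST"/part.

# Receiving side: join the parts in glob order and unpack
cat "$DEST"/part.* | gzip -d | tar x -C "$OUT" -f -

# tar dropped the leading "/", so the tree reappears at $OUT + full original path
diff -r "$SOURCE" "$OUT$SOURCE" && echo "round trip OK"
```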