Original idea from "Transfer files from an FTP server to S3" by "Hack N Cheese".
I moved roughly a terabyte in less than an hour. Granted, I couldn't take advantage of lftp's `--parallel=30` switch because my FTP source limited me to one connection at a time, but `--use-pget-n=N` did seem to help.
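For a sense of scale, here is the back-of-the-envelope arithmetic behind "a terabyte in less than an hour". It assumes decimal units (1 TB = 10^12 bytes) and a full hour; the helper names are just for illustration.

```shell
# Back-of-the-envelope bandwidth for moving a number of bytes in a number of
# seconds (integer math, decimal units: 1 MB = 10^6 bytes).
bytes_per_sec() {
  echo $(( $1 / $2 ))
}

megabits_per_sec() {
  echo $(( $1 * 8 / $2 / 1000000 ))
}

# 1 TB = 10^12 bytes moved in one hour (3600 s)
bytes_per_sec 1000000000000 3600     # prints 277777777 (about 278 MB/s)
megabits_per_sec 1000000000000 3600  # prints 2222 (about 2.2 Gbit/s)
```

In other words, the transfer has to sustain roughly 278 MB/s (about 2.2 Gbit/s), which is why it pays to run it from EC2 instead of over your local connection.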
Get a fast Ubuntu 14.04 EC2 box on Amazon for temporary use (I went with `m1.xlarge`) so that data transfers, at least, aren't limited by your local bandwidth. I also attached a fat 2TB EBS volume, symlinked it to `/bigdisk`, and made sure the EBS volume was deleted after I terminated the EC2 box. I hope `lftp` 2.6.4 is available as a stable package by the next time I attempt this.
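The disk setup can be sketched roughly like this. It assumes the volume was attached as `/dev/sdf` and appears on the instance as `/dev/xvdf`, that it is empty (formatting erases it), and that it gets mounted at `/mnt/bigdisk`; the function names, instance ID and mount point are all hypothetical placeholders.

```shell
# Hypothetical helper: format an empty, freshly attached EBS volume, mount it
# and symlink it to /bigdisk. Assumes the volume was attached as /dev/sdf and
# shows up on the instance as /dev/xvdf. WARNING: mkfs erases the volume.
setup_bigdisk() {
  local device="${1:-/dev/xvdf}" mount_point="${2:-/mnt/bigdisk}"
  sudo mkfs -t ext4 "$device" &&
    sudo mkdir -p "$mount_point" &&
    sudo mount "$device" "$mount_point" &&
    sudo chown "$(id -u):$(id -g)" "$mount_point" &&
    sudo ln -sfn "$mount_point" /bigdisk
}

# Mark the volume for deletion when the instance terminates, via the AWS CLI.
# The instance ID and device name are placeholders for your own values.
delete_volume_on_termination() {
  local instance_id="$1" device_name="${2:-/dev/sdf}"
  aws ec2 modify-instance-attribute \
    --instance-id "$instance_id" \
    --block-device-mappings "[{\"DeviceName\":\"$device_name\",\"Ebs\":{\"DeleteOnTermination\":true}}]"
}
```

Usage would be `setup_bigdisk /dev/xvdf /mnt/bigdisk` on the box and `delete_volume_on_termination i-0123456789abcdef0 /dev/sdf` wherever your AWS credentials live. Ticking "Delete on Termination" for the volume when launching the instance gets you the same result without the CLI.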
Build `lftp` 2.6.4+ from source. It's not easy to compile, so read the `INSTALL` file and plow through all your missing dependencies. If you're in the same situation I was, you'll also need to re-run `sudo ./configure && sudo make && sudo make install`; without `sudo` the steps just won't work. Presently the Ubuntu apt package is at `lftp/trusty,now 4.4.13-1 amd64 [residual-config]`, so uninstall it if you have it, since the `mirror` options in that version of `lftp` are severely limited and not available until at least version 2.6.4.
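The build step might look something like the sketch below. It assumes you have already downloaded and unpacked an lftp release tarball into some directory, and that `deb-src` entries are enabled so `apt-get build-dep` works. `build-dep` only installs what the *packaged* lftp needed, so the `INSTALL` file remains the authority on anything still missing. `build_lftp` is a hypothetical name.

```shell
# Hypothetical helper for building lftp from an unpacked release tarball
# (download it yourself; the version and directory are up to you).
# apt-get build-dep needs deb-src lines enabled in /etc/apt/sources.list.
build_lftp() {
  local src_dir="$1"
  sudo apt-get remove -y lftp       # drop the old 4.4.13 package, if present
  sudo apt-get build-dep -y lftp    # compilers and libraries for lftp
  (
    # Subshell so the cd does not change the caller's working directory.
    cd "$src_dir" &&
      sudo ./configure &&
      sudo make &&
      sudo make install
  )
}
```

Afterwards, `lftp --version` should report the freshly built version rather than 4.4.13.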
Run all of this inside mosh and tmux sessions, just in case.
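A minimal sketch of that session setup follows. The host name and the session name `transfer` are placeholders, `ubuntu` is the default user on Ubuntu AMIs, and the function names are hypothetical. Mosh also needs its UDP ports (60000 to 61000 by default) open in the instance's security group.

```shell
# Run from your local machine. mosh keeps the session alive across flaky
# networks and laptop sleeps.
connect_ec2() {
  mosh "ubuntu@${1:?usage: connect_ec2 HOST}"
}

# Run on the EC2 box. -A attaches to the named session if it already exists,
# otherwise creates it; detach with Ctrl-b d and rerun this to resume.
transfer_session() {
  tmux new-session -A -s "${1:-transfer}"
}
```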
Run this on your EC2 box:
lftp -e " \
debug -t 2; \
set net:max-retries 3000; \
set net:timeout 10m; \
set ftp:charset iso-8859-1; \
open ftp.yoursite.com; \
mirror \
--log log.txt \
--use-pget-n=1000 \
--use-cache \
--continue \
--loop \
/your/ftp/remote/path /your/ec2/local/path; \
exit; \
"
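If you run this against more than one source, a small wrapper keeps the settings in one place. This is a hypothetical helper (`mirror_ftp` is my name, not part of lftp) that uses the same settings as the command above but takes the host and paths as arguments. Because `lftp -e` takes a single string of `;`-separated lftp commands, the mirror arguments need a `;` before `exit`.

```shell
# Hypothetical wrapper around the lftp command above: same settings, but the
# host and paths are arguments. Backslash-newline inside double quotes is
# removed by the shell, so lftp sees one ";"-separated command string.
# Paths containing spaces would need extra quoting inside that string.
mirror_ftp() {
  local host="$1" remote="$2" local_dir="$3"
  lftp -e "debug -t 2; \
set net:max-retries 3000; \
set net:timeout 10m; \
set ftp:charset iso-8859-1; \
open $host; \
mirror --log log.txt --use-pget-n=1000 --use-cache --continue --loop \
$remote $local_dir; \
exit"
}
```

Usage: `mirror_ftp ftp.yoursite.com /your/ftp/remote/path /your/ec2/local/path`. If the server needs a login, lftp's `open -u user,password host` form can replace the plain `open`.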
Note: the `lftp mirror` command with `--parallel=30` only works if your FTP server accepts 30 simultaneous connections. In my case, I was limited to just one :(
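For a server that *does* allow many connections, a parallel variant might look like the sketch below. It is hypothetical and untested against a real multi-connection server. `mirror --parallel=N` transfers N files at once, each over its own connection, and lftp's `net:connection-limit` setting caps the connections opened to one site, so it is set to the same number here.

```shell
# Hypothetical variant for FTP servers that DO accept many simultaneous
# connections. --parallel=N fetches N files at once; net:connection-limit
# keeps lftp from opening more connections than the server allows.
mirror_ftp_parallel() {
  local host="$1" remote="$2" local_dir="$3" conns="${4:-30}"
  lftp -e "set net:connection-limit $conns; \
open $host; \
mirror --parallel=$conns --continue $remote $local_dir; \
exit"
}
```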
Then `wget` a copy of `s3-parallel-put.py` (or my [fork](https://github.com/weftio/s3-parallel-put), if you need support for regionalized buckets):
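As an alternative to `wget`, cloning the fork linked above avoids guessing the raw file URL. The sketch assumes the script depends on boto 2, as upstream s3-parallel-put does, and that boto picks up credentials from the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables. The key values and `fetch_s3pp` are placeholders.

```shell
# Hypothetical setup helper: clone the fork and install boto, which the
# script is assumed to depend on.
fetch_s3pp() {
  local dest="${1:-$HOME/s3-parallel-put}"
  git clone https://github.com/weftio/s3-parallel-put.git "$dest" &&
    sudo pip install boto
}

# boto 2 reads credentials from these environment variables.
# The values are placeholders: substitute your own keys.
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
```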
Do the parallel s3 put dance:
/your/ec2/local/path$ python s3-parallel-put --bucket=weft-wind-data --secure --put=update --processes=50 --content-type=guess --verbose --log-filename=/tmp/s3pp.log /your/ec2/local/path
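Once the upload finishes, a rough sanity check is to compare the number of local files with the number of objects in the bucket. This sketch assumes the AWS CLI is installed and configured, and that the bucket held nothing else beforehand. The helper names are hypothetical.

```shell
# Count regular files under a local directory.
count_local_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Count objects in a bucket; with --recursive, "aws s3 ls" prints one line
# per object.
count_s3_objects() {
  aws s3 ls "s3://$1/" --recursive | wc -l | tr -d ' '
}
```

For example, `count_local_files /your/ec2/local/path` and `count_s3_objects weft-wind-data` should print the same number. Adding `--summarize` to the `aws s3 ls` call also prints total object count and size, but those extra lines would throw off the `wc -l` count above.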
Wow, not so bad. Kinda. Except I had to hack together a pull request for `s3-parallel-put` to support my bucket, which lives outside the US Standard region. You may need this as well.
@sylvainemery h/t to your blog post.