A short experiment on copying big files
File size is 1412908312 bytes (about 1.4 GB).
According to https://www.kernel.org/doc/Documentation/sysctl/vm.txt
/bin/echo 3 | sudo tee /proc/sys/vm/drop_caches
is used to remove cached file data and metadata from (kernel) RAM. We use it to time a cold file copy and so get a grip on the disk speed, which is what we need in order to estimate bulk transfer times.
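A minimal sketch of the cold-timing procedure, assuming bash (for the time keyword) and sudo rights. cold_run is a made-up helper name, not something from these notes. vm.txt says that drop_caches will not free dirty objects and suggests running sync first, so the sketch does that.

```shell
# cold_run (hypothetical helper): run a command against a cold page cache and time it.
cold_run() {
    # Write dirty pages to disk first; drop_caches only discards clean cache entries.
    sync
    # 3 = free the page cache plus dentries and inodes (see vm.txt).
    echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
    # Time whatever command was passed as arguments.
    time "$@"
}
```

Something like cold_run cp gigfile copy would then give one cold sample.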
Times are the median of 3 runs, using time
on my laptop right now.
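Picking the median of three samples is easy to get wrong by eye, so here is a small sketch of how it could be done in the shell. median3 is a hypothetical helper name; LC_ALL=C is set on the assumption that the samples use a dot as the decimal separator.

```shell
# median3 (hypothetical helper): print the median of three numeric arguments.
median3() {
    # Put one value per line, sort numerically, and print the middle (second) line.
    printf '%s\n' "$1" "$2" "$3" | LC_ALL=C sort -n | sed -n '2p'
}
```

For instance, median3 3 1 2 prints 2.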
laptop, disk to same-disk copy (SSD); 7.26 s
laptop, same copy, but after previous without clearing RAM; 3.87 s
scp to iceberg /home; 24.57 s; [1]
network to iceberg; 12.47 s
scp to iceberg /data; 24.91 s; [1]
scp to iceberg /fastdata; 12.56 s; [2]
iceberg /data to s3; 18.78 s; [3]
iceberg /fastdata to s3; 19.11 s; [4]
iceberg scratch to scratch; 14.50 s; [5]
iceberg scratch to s3; 17.32 s
iceberg /home sha512sum hot; 7.32 s; [6]
iceberg /home sha512sum cold; 12.7 s
iceberg /data sha512sum hot; 7.32 s; [6]
iceberg /data sha512sum cold; 12.7 s
iceberg /shared sha512sum cold; 12.9 s
iceberg /fastdata sha512sum cold; 7.6 s; [7]
iceberg /shared to /home; 24.4 s
iceberg /fastdata to /home; 15.0 s
Running sha512sum gigfile
from a series of different nodes (using a job array to get cold times) gives a median of 23.5 seconds.
Commands
time cp gigfile copy
dd if=gigfile | time ssh iceberg sha512sum
time scp gigfile iceberg:.
time scp gigfile iceberg:/data/md1xdrj
time scp gigfile iceberg:/fastdata/md1xdrj
time aws s3 cp ~/gigfile s3://drj-test
scp tests are done with a hot file because we're trying to measure network speed, not local disk speed.
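One way to make sure the file is hot before an scp run is to read it once beforehand. A sketch only: warm_cache is a hypothetical helper name, and it assumes there is enough free RAM for the kernel to keep the whole file cached.

```shell
# warm_cache (hypothetical helper): read a file once so later reads can come from RAM.
warm_cache() {
    # Reading the whole file pulls its pages into the kernel page cache;
    # the data itself is discarded.
    cat -- "$1" > /dev/null
}
```

Running warm_cache gigfile just before time scp gigfile iceberg:. should keep local disk reads out of the measurement.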
[1]: scp reports a transfer time of 12 seconds and a speed of 112.3 MB/s. But the scp command itself takes a further 12 seconds to complete. Is it somehow waiting for the final disk write (and if so, how)?
[2]: close to network speed
[3]: was faster when hot, suggesting it is partly limited by disk speed.
[4]: slower than /data, which suggests both might be a bit disk bound
[5]: first one can be a lot faster, suggesting RAM copy?
[6]: cold is a lot slower, 16 s.
[7]: close to hot speeds, possibly CPU bound?
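As a check on the figures in note [1], the file size divided by the reported 12 seconds can be worked out in both decimal and binary units. A sketch: rate is a hypothetical helper name, and the 12 seconds is taken as quoted, so it is presumably rounded. The binary figure matches what scp printed, which suggests scp's "MB/s" means 2^20 bytes per second.

```shell
# rate (hypothetical helper): rate BYTES SECONDS BYTES_PER_UNIT
# prints throughput in units per second, to one decimal place.
rate() {
    awk -v b="$1" -v s="$2" -v d="$3" 'BEGIN { printf "%.1f\n", b / s / d }'
}

# Decimal megabytes per second (1 MB = 1000000 bytes).
rate 1412908312 12 1000000    # prints 117.7
# Binary mebibytes per second (1 MiB = 1048576 bytes).
rate 1412908312 12 1048576    # prints 112.3
```

The same helper can be pointed at any of the other timings to turn seconds into a transfer rate.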