Source | Dst. file type | Protocol | Time (s) | Command Line |
---|---|---|---|---|
NCBI | .sra | ftp | 296 | wget |
NCBI | .fastq.gz | sra toolkit | ~23000 | fastq-dump -Z --gzip --split-spot |
local file | sra=>fastq.gz | sra toolkit | ~15000 | fastq-dump --gzip --split-spot --split-3 |
EBI | .fastq.gz | aspera | 513+492 | ascp -QT -l 300m |
EBI | .fastq.gz | ftp | 1876+1946 | wget |

Notes:
- Destination: a supercomputer at the Broad Institute (yes, the connection is that fast: 22 GB in 5 minutes via ftp).
- 513+492 means downloading read1 took 513 wall-clock seconds and read2 took 492 seconds. The two files were downloaded in parallel.
- Times for downloading and file conversion with SRA toolkit are estimated from partially downloaded/converted file sizes, so they are inaccurate.
- SRA toolkit may be spending significant time on gzip compression. It is probably faster to convert to plain FASTQ first and then run gzip afterwards.
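
The idea in the last note could be tried with something like the sketch below: write plain FASTQ first, then compress in a separate step. The function name `sra_to_fastq_gz` and the `FASTQ_DUMP` / `GZIP_CMD` variables are made up for illustration, and whether this is actually faster was not measured here.

```shell
# Sketch only: convert a local .sra to plain FASTQ, then gzip afterwards.
# FASTQ_DUMP and GZIP_CMD are hypothetical overrides (e.g. GZIP_CMD=pigz),
# not part of SRA toolkit.
sra_to_fastq_gz() {
    local sra outdir run f
    sra="$1"
    outdir="${2:-.}"
    run="$(basename "$sra" .sra)"

    # Step 1: uncompressed FASTQ, same split flags as in the table above.
    "${FASTQ_DUMP:-fastq-dump}" --split-spot --split-3 --outdir "$outdir" "$sra" || return 1

    # Step 2: compress whatever FASTQ files were produced for this run.
    for f in "$outdir/$run"*.fastq; do
        if [ -e "$f" ]; then
            "${GZIP_CMD:-gzip}" "$f" || return 1
        fi
    done
}

# Example call (not run here):
#   sra_to_fastq_gz SRR2088062.sra ./fastq
```

Keeping the compressor in a variable makes it easy to swap in a parallel gzip if one is installed.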
URLs:
# SRA ftp, SRA format
ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR208/SRR2088062/SRR2088062.sra
# ENA ftp, gzip'd FASTQ format
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR208/002/SRR2088062/SRR2088062_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR208/002/SRR2088062/SRR2088062_2.fastq.gz
# ENA aspera, gzip'd FASTQ format
[email protected]:/vol1/fastq/SRR208/002/SRR2088062/SRR2088062_1.fastq.gz
[email protected]:/vol1/fastq/SRR208/002/SRR2088062/SRR2088062_2.fastq.gz
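
For scripting the ENA ftp download, something like the following might work. It builds the two FASTQ URLs from a run accession and fetches both mates in parallel, as was done for the timings above. The directory rule is only inferred from the SRR2088062 URLs listed here (first 6 characters, then "00" plus the last digit), so treat it as an assumption; the function names and the `FETCH` variable are hypothetical.

```shell
# Sketch only: ENA ftp URL for one mate of a 10-character run accession.
# ASSUMPTION: layout inferred from the SRR2088062 example URLs above.
ena_fastq_url() {
    local run mate prefix last
    run="$1"
    mate="$2"
    if [ "${#run}" -ne 10 ]; then
        echo "this sketch only handles 10-character accessions" >&2
        return 1
    fi
    prefix="$(printf '%s' "$run" | cut -c1-6)"
    last="$(printf '%s' "$run" | cut -c10)"
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/00%s/%s/%s_%s.fastq.gz\n' \
        "$prefix" "$last" "$run" "$run" "$mate"
}

# Fetch read1 and read2 at the same time, then wait for both to finish.
# FETCH is a hypothetical override for the download command (default: wget).
download_pair() {
    local run fetch url1 url2
    run="$1"
    fetch="${FETCH:-wget}"
    url1="$(ena_fastq_url "$run" 1)" || return 1
    url2="$(ena_fastq_url "$run" 2)" || return 1
    $fetch "$url1" &
    $fetch "$url2" &
    wait
}

# Example call (not run here):
#   download_pair SRR2088062
```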
There are significant differences in the performance of `fastq-dump`, depending on whether you ask for gzip output (3-4x slower) or write to a file with `--split-3` instead of stdout (1.5x slower).
Tested on a local file, SRR1028232.sra (1 GB), with `sratoolkit` version 2.5.7 CentOS binaries, on a server in Pennsylvania. Time to download the SRA file using `prefetch`: 90 seconds.
In the post above, local .sra=>.fastq.gz is 50x slower than downloading the .sra. In this experiment here, it is 13x slower. Still, I think fastq-dump would need to be faster.
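
For anyone who wants to repeat this kind of comparison, here is a rough sketch of a timing helper. `elapsed_seconds` and `slowdown` are names invented for this example; the commented calls use the same fastq-dump flags discussed above, and the only numbers plugged in are the ones from the table in the first post.

```shell
# Sketch only: wall-clock seconds for any command (its output is discarded).
elapsed_seconds() {
    local start end
    start="$(date +%s)"
    "$@" >/dev/null 2>&1
    end="$(date +%s)"
    echo $(( end - start ))
}

# Ratio of a slow timing to a fast one, to one decimal place.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# Example calls (not run here):
#   elapsed_seconds fastq-dump -Z SRR1028232.sra           # stdout, no gzip
#   elapsed_seconds fastq-dump -Z --gzip SRR1028232.sra    # stdout, gzip
#   elapsed_seconds fastq-dump --split-3 SRR1028232.sra    # files, no gzip
#   slowdown 15000 296    # local conversion vs. .sra download, prints 50.7
```

`date +%s` only has one-second resolution, which is fine for runs that take minutes but too coarse for small test files.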