Source | Destination file type | Protocol | Time (s) | Command line |
---|---|---|---|---|
NCBI | .sra | ftp | 296 | wget |
NCBI | .fastq.gz | sra toolkit | ~23000 | fastq-dump -Z --gzip --split-spot |
local file | sra=>fastq.gz | sra toolkit | ~15000 | fastq-dump --gzip --split-spot --split-3 |
EBI | .fastq.gz | aspera | 513+492 | ascp -QT -l 300m |
EBI | .fastq.gz | ftp | 1876+1946 | wget |
Notes:
- Destination: a supercomputer at the Broad Institute (yes, the connection is that fast: 22 GB in 5 minutes via FTP).
- 513+492 means that downloading read 1 took 513 wall-clock seconds and read 2 took 492 seconds. The two files were downloaded in parallel.
- Times for downloading and file conversion with the SRA toolkit are estimated from partially downloaded/converted file sizes, so they are inaccurate.
- The SRA toolkit may be spending significant time on gzip compression. It is probably faster to convert to plain FASTQ first and then run gzip afterwards.
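For the last note, a two-step conversion could look roughly like the sketch below. I have not timed it, so treat it as an illustration and not a benchmark. The function name `sra_to_fastq_gz` and the `FASTQ_DUMP` / `GZIP_CMD` override variables are made up for this example; the fastq-dump flags are the ones from the table, minus `--gzip`, plus `--outdir`.

```bash
# Sketch: convert a local .sra file to plain FASTQ first, then compress in a
# separate step. FASTQ_DUMP and GZIP_CMD are hypothetical hooks so the tools
# can be swapped (e.g. for pigz) or stubbed out.
FASTQ_DUMP="${FASTQ_DUMP:-fastq-dump}"
GZIP_CMD="${GZIP_CMD:-gzip}"

sra_to_fastq_gz() {
    local sra_file="$1"
    local out_dir="${2:-.}"
    local run
    local fq
    run="$(basename "$sra_file" .sra)"

    # Step 1: write uncompressed FASTQ (no --gzip).
    "$FASTQ_DUMP" --split-spot --split-3 --outdir "$out_dir" "$sra_file" || return 1

    # Step 2: compress every FASTQ file produced for this run.
    for fq in "$out_dir/$run"*.fastq; do
        [ -e "$fq" ] || continue
        "$GZIP_CMD" "$fq" || return 1
    done
}
```

Usage would be something like `sra_to_fastq_gz SRR2088062.sra .`, which should leave `SRR2088062_1.fastq.gz` and `SRR2088062_2.fastq.gz` in the current directory.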
URLs:
# SRA ftp, SRA format
ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR208/SRR2088062/SRR2088062.sra
# ENA ftp, gzip'd FASTQ format
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR208/002/SRR2088062/SRR2088062_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR208/002/SRR2088062/SRR2088062_2.fastq.gz
# ENA aspera, gzip'd FASTQ format
[email protected]:/vol1/fastq/SRR208/002/SRR2088062/SRR2088062_1.fastq.gz
[email protected]:/vol1/fastq/SRR208/002/SRR2088062/SRR2088062_2.fastq.gz
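In case it helps with other runs, here is a rough sketch that rebuilds the FTP URLs above from an accession and fetches both reads in parallel, as I did for the table. The function names are made up. The directory layout is inferred only from the SRR2088062 example: I am assuming the `002` directory is `00` plus the last digit of a 10-character accession, and other accession lengths are deliberately rejected. These paths may also change on the server side.

```bash
# Sketch only: URL layouts are inferred from the SRR2088062 examples above.

# NCBI: .../ByRun/sra/<first 3 chars>/<first 6 chars>/<acc>/<acc>.sra
ncbi_sra_url() {
    local acc="$1"
    local prefix3
    local prefix6
    prefix3="$(printf '%s' "$acc" | cut -c1-3)"
    prefix6="$(printf '%s' "$acc" | cut -c1-6)"
    printf 'ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/%s/%s/%s/%s.sra\n' \
        "$prefix3" "$prefix6" "$acc" "$acc"
}

# ENA: .../vol1/fastq/<first 6 chars>/00<last digit>/<acc>/<acc>_<mate>.fastq.gz
# Assumption: this only covers 10-character accessions such as SRR2088062.
ena_fastq_url() {
    local acc="$1"
    local mate="$2"
    local prefix6
    local last
    if [ "${#acc}" -ne 10 ]; then
        echo "ena_fastq_url: only 10-character accessions are handled in this sketch" >&2
        return 1
    fi
    prefix6="$(printf '%s' "$acc" | cut -c1-6)"
    last="$(printf '%s' "$acc" | cut -c10)"
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/00%s/%s/%s_%s.fastq.gz\n' \
        "$prefix6" "$last" "$acc" "$acc" "$mate"
}

# Download read 1 and read 2 at the same time and wait for both to finish.
download_pair() {
    local acc="$1"
    local pid1
    local pid2
    local status=0
    wget -q "$(ena_fastq_url "$acc" 1)" &
    pid1=$!
    wget -q "$(ena_fastq_url "$acc" 2)" &
    pid2=$!
    wait "$pid1" || status=1
    wait "$pid2" || status=1
    return "$status"
}
```

For example, `ena_fastq_url SRR2088062 1` prints the first ENA FTP URL listed above, and `download_pair SRR2088062` fetches both files into the current directory.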
Thanks a lot, @rchikhi. Just saw your comment. I was using `--gzip` because most of the time we would not want to keep plain FASTQ. On an additional note, in my table, running fastq-dump on remote SRA accessions seems much slower than a wget download followed by a local fastq-dump. Have you observed this? If this is true, NCBI probably should not hide the FTP download links to SRA files.
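If anyone wants to reproduce the comparison, a small timing wrapper along these lines might do. It is a sketch, not what I actually ran for the table: `time_step` is a made-up helper, and it only measures whole wall-clock seconds with `date +%s`.

```bash
# Run a command, then print "<elapsed seconds><TAB><label>" on stderr.
# The command's own exit status is passed through to the caller.
time_step() {
    local label="$1"
    local start
    local end
    local status=0
    shift
    start="$(date +%s)"
    "$@" || status=$?
    end="$(date +%s)"
    printf '%s\t%s\n' "$((end - start))" "$label" >&2
    return "$status"
}

# Example comparison (commented out because it moves a lot of data):
# time_step "wget .sra"         wget -q "ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR208/SRR2088062/SRR2088062.sra"
# time_step "local fastq-dump"  fastq-dump --gzip --split-spot --split-3 SRR2088062.sra
# time_step "remote fastq-dump" fastq-dump --gzip --split-spot --split-3 SRR2088062
```

Adding the first two timings and comparing the sum with the third would show whether the remote mode is really slower on a given connection.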