
@lh3
Last active December 30, 2017 22:22
Downloading gzip'd fastq
| Source | Destination file type | Protocol | Time (s) | Command line |
|---|---|---|---|---|
| NCBI | .sra | ftp | 296 | wget |
| NCBI | .fastq.gz | SRA toolkit | ~23000 | fastq-dump -Z --gzip --split-spot |
| local .sra file | .sra => .fastq.gz | SRA toolkit | ~15000 | fastq-dump --gzip --split-spot --split-3 |
| EBI | .fastq.gz | aspera | 513+492 | aspera -QT -l 300m |
| EBI | .fastq.gz | ftp | 1876+1946 | wget |
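To compare the rows on an equal footing, each entry can be converted to wall-clock time. For the "a+b" entries the two mates were fetched in parallel (as the notes below explain), so the wall-clock cost is max(a, b) rather than a + b. Below is a minimal Python sketch. The numbers come from the table, the "~" estimates are entered as plain integers, and the method labels and dictionary layout are my own.

```python
# Timings (seconds) copied from the table; "~" estimates entered as plain ints.
# "a+b" = read1 and read2 downloaded in parallel, so wall-clock = max(a, b).
TIMES = {
    "NCBI .sra, ftp (wget)": "296",
    "NCBI .fastq.gz, fastq-dump (remote)": "23000",
    "local .sra => .fastq.gz, fastq-dump": "15000",
    "EBI .fastq.gz, aspera": "513+492",
    "EBI .fastq.gz, ftp (wget)": "1876+1946",
}


def wall_clock(entry: str) -> int:
    """Wall-clock seconds for one table entry; parallel parts overlap."""
    return max(int(part) for part in entry.split("+"))


def ranked(times: dict) -> list:
    """(seconds, method) pairs, fastest first."""
    return sorted((wall_clock(v), k) for k, v in times.items())


if __name__ == "__main__":
    for seconds, method in ranked(TIMES):
        print(f"{seconds:>6} s  {method}")
```

The ranking only compares the raw step times. The NCBI .sra row is fastest, but it still needs the local conversion step before it yields the same gzip'd FASTQs that the EBI rows deliver directly.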

Notes:

  • Destination: a supercomputer at the Broad Institute (yes, the connection is that fast: 22 GB in 5 minutes via ftp).

  • 513+492 means downloading read1 took 513 wall-clock seconds and read2 492 seconds. The two files were downloaded in parallel.

  • Download and conversion times with the SRA toolkit are estimated from the sizes of partially downloaded/converted files, so they are only rough figures.

  • The SRA toolkit may be spending significant time on gzip compression. It is probably faster to convert to plain FASTQ first and gzip afterwards.
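The last note's suggestion (plain FASTQ first, gzip afterwards) can be sketched as a two-stage pipeline. This is an untested sketch, and whether it is actually faster is the note's hypothesis, not a measurement. The fastq-dump flags mirror the local-conversion row of the table, and `-O` is fastq-dump's output-directory option. The function names are hypothetical. The sketch also assumes a paired-end run, for which `--split-3` writes `ACC_1.fastq` and `ACC_2.fastq`. The two gzip processes run concurrently, much as the EBI mates were downloaded in parallel. A parallel compressor such as pigz would be another option.

```python
import subprocess
from pathlib import Path


def conversion_plan(sra_file: str, outdir: str = "."):
    """Commands for a two-stage conversion: plain FASTQ first, then gzip.

    Assumes a paired-end run, for which fastq-dump --split-3 writes
    ACC_1.fastq and ACC_2.fastq into outdir.
    """
    acc = Path(sra_file).stem  # "SRR2088062.sra" -> "SRR2088062"
    dump = ["fastq-dump", "--split-spot", "--split-3", "-O", outdir, sra_file]
    gzips = [["gzip", str(Path(outdir) / f"{acc}_{mate}.fastq")] for mate in (1, 2)]
    return dump, gzips


def run_plan(sra_file: str, outdir: str = ".") -> None:
    """Run the plan; requires fastq-dump and gzip on PATH."""
    dump, gzips = conversion_plan(sra_file, outdir)
    subprocess.run(dump, check=True)  # stage 1: uncompressed FASTQ
    # Stage 2: compress read1 and read2 concurrently.
    procs = [subprocess.Popen(cmd) for cmd in gzips]
    for proc in procs:
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, proc.args)
```

For example, `run_plan("SRR2088062.sra", "out")` would leave `out/SRR2088062_1.fastq.gz` and `out/SRR2088062_2.fastq.gz` once both gzip processes finish.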

URLs:

# SRA ftp, SRA format
ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR208/SRR2088062/SRR2088062.sra
# ENA ftp, gzip'd FASTQ format
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR208/002/SRR2088062/SRR2088062_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR208/002/SRR2088062/SRR2088062_2.fastq.gz
# ENA aspera, gzip'd FASTQ format
era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/SRR208/002/SRR2088062/SRR2088062_1.fastq.gz
era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/SRR208/002/SRR2088062/SRR2088062_2.fastq.gz
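The URLs above follow a pattern that can be derived from the run accession. Below is a small Python sketch that reproduces the example URLs for SRR2088062. Only the 7-digit case is confirmed by the example. The handling of 6-, 8- and 9-digit accessions is my understanding of ENA's directory convention and should be treated as an assumption. These layouts are also only known to hold as of this gist and may have changed since. The path returned by `ena_fastq_dir` is the same one that appears after the host in the aspera lines. The function names are hypothetical.

```python
def ncbi_sra_url(acc: str) -> str:
    """NCBI sra-instant FTP URL for a run accession (layout from the example)."""
    return ("ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/"
            f"{acc[:3]}/{acc[:6]}/{acc}/{acc}.sra")


def ena_fastq_dir(acc: str) -> str:
    """ENA directory holding the gzip'd FASTQs for a run accession.

    The example (SRR2088062, 7 digits) lives under SRR208/002/. Assumed rule:
    6 digits -> no extra level; 7-9 digits -> the trailing 1-3 digits,
    zero-padded to 3 characters.
    """
    n_digits = len(acc) - 3  # digits after the 3-letter prefix (e.g. SRR)
    if n_digits == 6:
        extra = ""
    elif 7 <= n_digits <= 9:
        extra = acc[-(n_digits - 6):].zfill(3) + "/"
    else:
        raise ValueError(f"unexpected accession length: {acc}")
    return f"/vol1/fastq/{acc[:6]}/{extra}{acc}"


def ena_ftp_urls(acc: str) -> list:
    """FTP URLs of read1 and read2 for a paired-end run on ENA."""
    base = "ftp://ftp.sra.ebi.ac.uk" + ena_fastq_dir(acc)
    return [f"{base}/{acc}_{mate}.fastq.gz" for mate in (1, 2)]
```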

lh3 commented May 1, 2016

Thanks a lot, @rchikhi. I just saw your comment. I was using --gzip because most of the time we would not want to keep plain FASTQ. As an additional note, in my table, running fastq-dump on remote SRA accessions seems much slower than a wget download followed by a local fastq-dump. Have you observed this? If so, NCBI probably should not hide the FTP download links to SRA files.
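The comparison in this comment is simple arithmetic on the table rows. It compares the .sra download plus the local conversion against the one-step remote fastq-dump. Below is a small Python sketch using the table's numbers. The constant and function names are my own. Both fastq-dump times are rough estimates, and the two runs used slightly different flags, so the ratio is only indicative.

```python
# Approximate timings (seconds) from the table; the two fastq-dump runs
# used slightly different flags, and both are rough estimates.
REMOTE_FASTQ_DUMP = 23000  # fastq-dump -Z --gzip --split-spot on the accession
WGET_SRA = 296             # wget of the .sra file from NCBI ftp
LOCAL_FASTQ_DUMP = 15000   # fastq-dump --gzip --split-spot --split-3 on local .sra


def two_step_seconds() -> int:
    """Download the .sra first, then convert it locally."""
    return WGET_SRA + LOCAL_FASTQ_DUMP


def remote_slowdown() -> float:
    """How many times slower the one-step remote conversion is."""
    return REMOTE_FASTQ_DUMP / two_step_seconds()


if __name__ == "__main__":
    print(f"two-step: {two_step_seconds()} s, "
          f"remote is {remote_slowdown():.2f}x slower")
```

With these numbers the two-step route takes 15296 s, and the remote conversion is roughly 1.5 times slower. Most of the gap is conversion, not download, since the .sra transfer itself takes under 300 s.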
