@mikelove
Created June 16, 2016 18:33
ENA accession to URL
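accession2url <- function(x) {
  # Build the ENA FTP directory URL for run accession(s) x (vectorized).
  prefix <- "ftp://ftp.sra.ebi.ac.uk/vol1/fastq"
  # dir1: first six characters of the accession, e.g. "ERR000"
  dir1 <- paste0("/", substr(x, 1, 6))
  # dir2: absent for six-digit accessions; otherwise the digits after the
  # ninth character, zero-padded to three digits
  dir2 <- ifelse(nchar(x) == 9, "",
          ifelse(nchar(x) == 10, paste0("/00", substr(x, 10, 10)),
          ifelse(nchar(x) == 11, paste0("/0", substr(x, 10, 11)),
                 paste0("/", substr(x, 10, 12)))))
  paste0(prefix, dir1, dir2, "/", x)
}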
@mikelove
Author

An R implementation of the rule:

Archive generated fastq files are organised by run accession number under vol1/fastq directory in ftp.sra.ebi.ac.uk:

ftp://ftp.sra.ebi.ac.uk/vol1/fastq/<dir1>[/<dir2>]/<run accession>

<dir1> is the first six characters (letters and digits) of the run accession (e.g. ERR000 for ERR000916).

<dir2> does not exist if the run accession has six digits. 

For example, fastq files for run ERR000916 are in directory: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/.

If the run accession has seven digits then the <dir2> is 00 + the last digit of the run accession. 

For example, fastq files for run SRR1016916 are in directory: ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR101/006/SRR1016916/.

If the run accession has eight digits then the <dir2> is 0 + the last two digits of the run accession. 

If the run accession has nine digits then the <dir2> is the last three digits of the run accession. 

http://www.ebi.ac.uk/ena/browse/read-download#downloading_files_ena_browser
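
The quoted rule only gives worked examples for six- and seven-digit accessions. As a rough sketch, the same rule can be written by zero-padding whatever follows the ninth character, which makes the eight- and nine-digit cases easy to check. The helper name `ena_fastq_dir` and the accessions `SRR12345678` and `SRR123456789` are made up for illustration, and the code assumes accessions are three letters followed by six to nine digits:

```r
# Hypothetical helper (not part of the gist): returns the ENA fastq
# directory URL for each run accession, following the rule quoted above.
ena_fastq_dir <- function(acc, prefix = "ftp://ftp.sra.ebi.ac.uk/vol1/fastq") {
  n <- nchar(acc)
  # Assumption: three letters followed by six to nine digits.
  stopifnot(all(n >= 9 & n <= 12))
  dir1 <- substr(acc, 1, 6)
  # Characters after the ninth one ("" for six-digit accessions).
  extra <- substr(acc, 10, n)
  # Left-pad with zeros to three characters; no dir2 for six digits.
  padded <- paste0(strrep("0", 3 - nchar(extra)), extra)
  dir2 <- ifelse(n == 9, "", paste0("/", padded))
  paste0(prefix, "/", dir1, dir2, "/", acc)
}

# The two examples from the ENA text, then made-up eight- and
# nine-digit accessions.
ena_fastq_dir(c("ERR000916", "SRR1016916", "SRR12345678", "SRR123456789"))
```

Under this rule the made-up eight-digit accession lands in `SRR123/078/` and the nine-digit one in `SRR123/789/`.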

@Benjamin-Lee

Thanks for this! It was super useful.
