@klmr
Last active December 25, 2015 02:29
Downloading a FASTA record from an archived Ensembl BioMart and storing it on disk
xsource(rcane.functional) # for `%|%`
# !!! IMPORTANT !!!
# The Ensembl BioMart archive server is extremely slow. Therefore this code is
# FOR EXPOSITION ONLY. Use the transcripts file shipped with the data for this
# code instead.
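downloadTranscripts <- function (target) {
    library(biomaRt)
    ensMart <- useMart('ENSEMBL_MART_ENSEMBL',
                       dataset = 'mmusculus_gene_ensembl',
                       host = 'may2012.archive.ensembl.org')
    attributes <- c('ensembl_gene_id', 'ensembl_transcript_id', 'coding')
    transcripts <- getBM(attributes, mart = ensMart)
    # Build FASTA headers of the form `>gene_id|transcript_id`. The ID columns
    # are selected by name (as `coding` is below) rather than by position.
    ids <- transcripts[, c('ensembl_gene_id', 'ensembl_transcript_id')]
    headers <- paste0('>', do.call(paste, c(as.list(ids), sep = '|')))
    # Interleave each header with its sequence, wrapped at 60 characters.
    fasta <- mapply(c,
                    headers,
                    splitLines(transcripts$coding, 60),
                    USE.NAMES = FALSE, SIMPLIFY = FALSE) %|% unlist
    writeLines(fasta, target)
}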
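# Wrap each string in `str` into chunks of at most `lineLength` characters,
# joined by `collapse`. `.{1,n}` is used so that no zero-length match occurs.
splitLines <- function (str, lineLength, collapse = '\n')
    lapply(regmatches(str, gregexpr(sprintf('.{1,%d}', lineLength),
                                    str, perl = TRUE)),
           paste, collapse = collapse) %|% unlist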
downloadTranscripts('results/Mus_musculus.NCBIM37.67.transcripts.fa')
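
Because the archive server is slow, the wrapping and interleaving steps can be tried offline. The following is a minimal sketch under stated assumptions, not the original code. The data frame stands in for the result of `getBM()` and its IDs and sequences are made up. `wrapSequence` is a hypothetical helper written in base R, so neither `biomaRt` nor `%|%` is needed. Each header is prefixed with `>`, as FASTA header lines conventionally are.

```r
# Hypothetical stand-in for the data frame returned by getBM(); the IDs and
# sequences below are invented for illustration only.
transcripts <- data.frame(
    ensembl_gene_id = c('GENE1', 'GENE2'),
    ensembl_transcript_id = c('TX1', 'TX2'),
    coding = c('ATGGCCTTAA', 'ATGTAA'),
    stringsAsFactors = FALSE
)

# Break each sequence into chunks of at most `lineLength` characters and join
# the chunks with newlines. `.{1,n}` avoids zero-length matches.
wrapSequence <- function (str, lineLength) {
    pattern <- sprintf('.{1,%d}', lineLength)
    chunks <- regmatches(str, gregexpr(pattern, str, perl = TRUE))
    vapply(chunks, paste, character(1), collapse = '\n')
}

headers <- paste0('>', transcripts$ensembl_gene_id,
                  '|', transcripts$ensembl_transcript_id)
wrapped <- wrapSequence(transcripts$coding, 4)

# Interleave headers and sequences: header 1, sequence 1, header 2, ...
fasta <- unlist(mapply(c, headers, wrapped,
                       USE.NAMES = FALSE, SIMPLIFY = FALSE))

# Write to a temporary file; the embedded newlines become line breaks.
target <- tempfile(fileext = '.fa')
writeLines(fasta, target)
```

A line length of 4 is used here only to make the wrapping visible on short sequences; the script above wraps at 60 characters.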