@blahah
Created December 15, 2015 16:28
get RefSeq mRNA mapping to human gene symbol via ensembl biomart
# This script demonstrates how to download a mapping from RefSeq mRNA IDs to human gene symbols using the Ensembl BioMart service and the Bioconductor `biomaRt` package in R.
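# install biomaRt from Bioconductor (biocLite() was retired in Bioconductor 3.8; BiocManager replaces it)
if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager")
BiocManager::install("biomaRt")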
library("biomaRt")
# work around bug in resolving host (https://support.bioconductor.org/p/74304/)
listMarts(host="www.ensembl.org")
# set the mart to use
ensembl <- useMart("ENSEMBL_MART_ENSEMBL", host="www.ensembl.org")
# find the dataset we want
datasets <- listDatasets(ensembl)
hg_row <- which(grepl('sapiens', datasets$dataset))
dataset <- as.character(datasets$dataset[hg_row])
# select the dataset as our source
ensembl <- useMart("ENSEMBL_MART_ENSEMBL", host="www.ensembl.org", dataset=dataset)
# extract the data
mapping <- getBM(attributes=c('refseq_mrna', 'external_gene_name'), mart=ensembl)
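# clean it up (keep only rows with a RefSeq mRNA ID)
# use the full column name rather than relying on partial matching of `$`, and drop NA as well as empty IDs
clean_mapping <- mapping[!is.na(mapping$refseq_mrna) & mapping$refseq_mrna != "",]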
# size of the cleaned dataset
dim(clean_mapping)
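# example: looking up gene symbols with the cleaned mapping
# A minimal sketch, not part of the original script. Assumptions: the helper name
# `refseq_to_symbol` and the table `toy_mapping` are hypothetical, and query IDs
# may carry a version suffix (e.g. "NM_007294.4") that is stripped before matching.
refseq_to_symbol <- function(ids, map) {
  # drop any trailing version suffix such as ".4"
  ids <- sub("\\.[0-9]+$", "", ids)
  # match() returns the first hit only, so an ID listed with several symbols yields one of them
  map$external_gene_name[match(ids, map$refseq_mrna)]
}
# toy table with the same two columns as clean_mapping, for illustration only
toy_mapping <- data.frame(
  refseq_mrna=c("NM_000546", "NM_007294"),
  external_gene_name=c("TP53", "BRCA1"),
  stringsAsFactors=FALSE
)
# "NM_999999" is a made-up ID with no entry, so it comes back as NA
refseq_to_symbol(c("NM_007294.4", "NM_000546", "NM_999999"), toy_mapping)
# with the downloaded data, pass clean_mapping instead of toy_mapping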