Last active
August 2, 2018 18:23
Prepare a table of annotations for human genes using Bioconductor
# There are several ways to do this, but here is a nice example of using Reduce() to merge a list of tables.
# The use case is building a lookup table for human genes.
# See Marc Carlson's AnnotationDbi packages and tutorials for alternative (SQL-based) ways of retrieving the information.
# Caution:
# an unintended consequence of joining on the Entrez Gene IDs is that the Ensembl IDs for alternative IDs are lost,
# and usually the alternative Ensembl ID is the one used.
# Load the annotation package from the Bioconductor project
library("org.Hs.eg.db")
library("tidyverse")

# Left-join the Ensembl, symbol and gene-name tables on the Entrez Gene ID
Reduce(function(...) merge(..., by = "gene_id", all.x = TRUE),
       list(toTable(org.Hs.egENSEMBL2EG),
            toTable(org.Hs.egSYMBOL),
            toTable(org.Hs.egGENENAME))) %>%
  as_tibble()  # as.tibble() is deprecated in current tibble releases
# A tibble: 28,964 x 4
#   gene_id   ensembl_id      symbol  gene_name
#   <chr>     <chr>           <chr>   <chr>
# 1 1         ENSG00000121410 A1BG    alpha-1-B glycoprotein
# 2 10        ENSG00000156006 NAT2    N-acetyltransferase 2
# 3 100       ENSG00000196839 ADA     adenosine deaminase
# 4 1000      ENSG00000170558 CDH2    cadherin 2
# 5 10000     ENSG00000117020 AKT3    AKT serine/threonine kinase 3
# 6 10000     ENSG00000275199 AKT3    AKT serine/threonine kinase 3
# 7 100008586 ENSG00000236362 GAGE12F G antigen 12F
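
The Reduce() pattern used above can be tried without Bioconductor on toy data. The sketch below is only an illustration: the three data frames and every value in them (such as ENSG_A and GENE1) are invented stand-ins for the toTable() results, and merge_by_gene is a hypothetical helper name, not part of any package.

```r
# Toy stand-ins for toTable() output; all IDs and names here are invented.
ensembl <- data.frame(
  gene_id    = c("1", "2", "2"),
  ensembl_id = c("ENSG_A", "ENSG_B", "ENSG_C"),
  stringsAsFactors = FALSE
)
symbols <- data.frame(
  gene_id = c("1", "2"),
  symbol  = c("GENE1", "GENE2"),
  stringsAsFactors = FALSE
)
gene_names <- data.frame(
  gene_id   = c("1", "2", "3"),
  gene_name = c("first gene", "second gene", "third gene"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: left-join two tables on gene_id.
merge_by_gene <- function(x, y) merge(x, y, by = "gene_id", all.x = TRUE)

# Reduce() folds the list from left to right, i.e. it evaluates
# merge_by_gene(merge_by_gene(ensembl, symbols), gene_names).
lookup <- Reduce(merge_by_gene, list(ensembl, symbols, gene_names))
print(lookup)
```

Because all.x = TRUE keeps every row of the left-hand table, gene "2" appears twice in the result (once per Ensembl ID), while gene "3", which is absent from the first table, is dropped.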