Input data needed;
- A file
orf2go.map
for your study organism. It should contain entries like this;cds.comp1000018_c0_seq1|m.205055 GO:0005618, GO:0016021, GO:0045330, GO:0004857, GO:0030599, GO:0042545 cds.comp1000154_c0_seq1|m.205060 GO:0030136, GO:0031901, GO:0005768
If you don't have an orf2go.map
file but you have a Trinity sqlite
file you can generate one using trinotate_proteomics as follows;
orf_to_go.rb Trinotate.sqlite > orf2go.map
- A file containing a list of IDs for a specific tissue, or (optionally) with quantitative expression info.
library(topGO)
geneID2GO <- readMappings(file = "orf2go.map")
Note that in this example the file orf2go.map
serves 2 purposes. It contains all transcriptID -> GO mappings. But it also contains a complete list of all the transcriptID's. This complete list of transcript ID's serves as the background set for the enrichment analysis. This means that the code to generate the background set is trivial (see below).
geneNames <- names(geneID2GO)
This is the set of ID's in which we will look for enriched GO terms. This set can be any subset of the background, and the set that is chosen affects the interpretation of the analysis. In our proteomics analyses we might look at the top 200 most expressed proteins (by iBAQ). Alternatively the top hits from the differential expression analysis could be used.
myInterestingGenes=... my code to generate these
It is essential that the items in the myInterestingGenes
vector correspond to items within the background ie (geneNames
).
geneList <- factor(as.integer(geneNames %in% myInterestingGenes))
names(geneList) <- geneNames
GOdata <- new("topGOdata", ontology = "BP", allGenes = geneList,annot = annFUN.gene2GO, gene2GO = geneID2GO)
You might also want to try the other two GO
ontologies MF
, Molecular Function and CC
Cellular Component
This runs a Fisher's exact test
resultFis <- runTest(GOdata, algorithm = "classic", statistic = "fisher")
This creates a plot, but the table allRes
is perhaps more useful.
allRes <- GenTable(GOdata, classic = resultFis,orderBy = "weight", ranksOf = "classic", topNodes = 20)
showSigOfNodes(GOdata, score(resultFis), firstSigNodes = 5, useInfo = 'all')