Input data needed;
- A file
orf2go.mapfor your study organism. It should contain entries like this;cds.comp1000018_c0_seq1|m.205055 GO:0005618, GO:0016021, GO:0045330, GO:0004857, GO:0030599, GO:0042545 cds.comp1000154_c0_seq1|m.205060 GO:0030136, GO:0031901, GO:0005768
If you don't have an orf2go.map file but you have a Trinity sqlite file you can generate one using trinotate_proteomics as follows;
orf_to_go.rb Trinotate.sqlite > orf2go.map- A file containing a list of IDs for a specific tissue, or (optionally) with quantitative expression info.
library(topGO)
geneID2GO <- readMappings(file = "orf2go.map")Note that in this example the file orf2go.map serves 2 purposes. It contains all transcriptID -> GO mappings. But it also contains a complete list of all the transcriptID's. This complete list of transcript ID's serves as the background set for the enrichment analysis. This means that the code to generate the background set is trivial (see below).
geneNames <- names(geneID2GO)This is the set of ID's in which we will look for enriched GO terms. This set can be any subset of the background, and the set that is chosen affects the interpretation of the analysis. In our proteomics analyses we might look at the top 200 most expressed proteins (by iBAQ). Alternatively the top hits from the differential expression analysis could be used.
myInterestingGenes=... my code to generate theseIt is essential that the items in the myInterestingGenes vector correspond to items within the background ie (geneNames).
geneList <- factor(as.integer(geneNames %in% myInterestingGenes))
names(geneList) <- geneNames GOdata <- new("topGOdata", ontology = "BP", allGenes = geneList,annot = annFUN.gene2GO, gene2GO = geneID2GO)You might also want to try the other two GO ontologies MF, Molecular Function and CC Cellular Component
This runs a Fisher's exact test
resultFis <- runTest(GOdata, algorithm = "classic", statistic = "fisher")This creates a plot, but the table allRes is perhaps more useful.
allRes <- GenTable(GOdata, classic = resultFis,orderBy = "weight", ranksOf = "classic", topNodes = 20)
showSigOfNodes(GOdata, score(resultFis), firstSigNodes = 5, useInfo = 'all')