Last active
August 29, 2015 14:20
-
-
Save mikelove/83e47aa27dd4a71ea73f to your computer and use it in GitHub Desktop.
This is the Rpres file, the next gist, biocII.html can be downloaded and viewed in a browser.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
How It's Done: Bioc II | |
=== | |
author: Michael Love | |
date: 30 April 2015 | |
autosize: false | |
width: 1600 | |
Survey results (n=28) | |
=== | |
<center> | |
```{r echo=FALSE, fig.width=14} | |
library(rafalib) | |
mypar() | |
res <- c(gene.models=13, | |
read.seq=12, | |
read.counts=8, | |
Anno.DB=10, | |
make.pkg=10, | |
BEDtools=9, | |
nice.HTML=8, | |
parallel=8) | |
barplot(res, col=rainbow(7)) | |
``` | |
</center> | |
<https://gist.github.com/mikelove/83e47aa27dd4a71ea73f> | |
Quick note on versioning | |
=== | |
```{r cache=TRUE} | |
biocVersion() # your machine | |
packageVersion("limma") | |
biocValid("limma") | |
``` | |
http://bioconductor.org/bioc-version | |
Second quick note: help and vignettes | |
=== | |
What does this function / package do? Always go for help | |
```{r eval=FALSE} | |
?function | |
help(package="foo", help_type="html") | |
browseVignettes("foo") | |
``` | |
Also: http://support.bioconductor.org and post your `sessionInfo()`. | |
Gene models | |
=== | |
This functionality comes from [GenomicFeatures](http://bioconductor.org/packages/GenomicFeatures) package. | |
```{r} | |
library("TxDb.Hsapiens.UCSC.hg19.knownGene") | |
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene | |
``` | |
Gene models | |
=== | |
```{r cache=TRUE} | |
g <- genes(txdb) | |
g["128229"] | |
``` | |
Also `exons()` and `transcripts()` | |
Gene models: exons by gene | |
=== | |
```{r cache=TRUE} | |
ebg <- exonsBy(txdb, by="gene") | |
ebg[["128229"]] | |
``` | |
Gene models: transcripts by gene | |
=== | |
```{r cache=TRUE} | |
tbg <- transcriptsBy(txdb, by="gene") | |
tbg[["128229"]] | |
``` | |
Gene models: exons by transcript | |
=== | |
```{r cache=TRUE} | |
ebt <- exonsBy(txdb, by="tx") | |
ids <- tbg[["128229"]]$tx_id | |
ebt.sub <- ebt[ids] | |
lapply(ebt.sub, function(gr) gr$exon_id) | |
``` | |
Read in sequencing data | |
=== | |
Basic functionality in the [Rsamtools](http://bioconductor.org/packages/Rsamtools) library. | |
```{r echo=FALSE} | |
filename <- system.file("extdata/SRR1039508_subset.bam",package="airway") | |
``` | |
```{r} | |
library(Rsamtools) | |
bf <- BamFile(filename) | |
seqinfo(bf) | |
``` | |
Read in sequencing data | |
=== | |
```{r echo=FALSE} | |
index <- indexBam(bf) | |
bf <- BamFile(filename) | |
``` | |
```{r cache=TRUE} | |
countBam(bf, param=ScanBamParam(which=GRanges("1",IRanges(11e6,11.4e6)))) | |
``` | |
Read in sequencing data | |
=== | |
```{r cache=TRUE} | |
reads <- scanBam(bf, param = ScanBamParam(which = GRanges("1",IRanges(11e6,11.4e6)), what = c("strand","pos"))) | |
class(reads) | |
head(reads[[1]]$strand) # '1' for the first range in 'which' | |
head(reads[[1]]$pos) | |
``` | |
Options for **what**: qname, flag, strand, pos, cigar, etc. | |
Read in sequencing data: nice objects | |
=== | |
Nicer access to sequencing data in the [GenomicAlignments](http://bioconductor.org/packages/GenomicAlignments) package. | |
```{r cache=TRUE} | |
library(GenomicAlignments) | |
ga <- readGAlignmentPairs(bf) | |
ga[1] | |
``` | |
Read in sequencing data: nice objects | |
=== | |
```{r cache=TRUE} | |
first(ga)[1] | |
``` | |
Also `last()`, `left()` and `right()`. | |
Read count table | |
=== | |
Often people want to count NGS reads which overlap features. Two good functions for this: | |
* `summarizeOverlaps` in the [GenomicAlignments](http://bioconductor.org/packages/GenomicAlignments) package | |
* `featureCounts` in the [Rsubread](http://bioconductor.org/packages/Rsubread) package | |
Read count table: summarizeOverlaps | |
=== | |
```{r cache=TRUE} | |
gr <- GRanges("1",IRanges(110:114 * 1e5, width=1e5)) | |
so <- summarizeOverlaps(gr, bf) | |
assay(so) | |
``` | |
Lots of options in `?summarizeOverlaps`. To count in the exons of genes or transcripts you would use `exonsBy`. | |
Instead of a BamFile, we could have had a BamFileList. | |
Read count table: featureCounts | |
=== | |
```{r echo=FALSE} | |
gtf.file <- system.file("extdata/Homo_sapiens.GRCh37.75_subset.gtf",package="airway") | |
``` | |
```{r results="hide", cache=TRUE} | |
library(Rsubread) | |
fc <- featureCounts(filename, annot.ext=gtf.file, isGTFAnnotationFile=TRUE) | |
``` | |
```{r cache=TRUE} | |
fc$counts[,1] | |
``` | |
Likewise, here we could have used multiple files. | |
Annotation interface | |
=== | |
The annotation packages use functionality from [AnnotationDbi](http://bioconductor.org/packages/AnnotationDbi) | |
```{r} | |
library(org.Hs.eg.db) | |
keytypes(org.Hs.eg.db) | |
# columns(org.Hs.eg.db) | |
``` | |
Annotation interface | |
=== | |
```{r cache=TRUE} | |
head(keys(org.Hs.eg.db)) | |
``` | |
Annotation interface | |
=== | |
```{r cache=TRUE} | |
select(org.Hs.eg.db, key=c("TP53","PTEN"), | |
keytype="SYMBOL", columns=c("ENSEMBL","ENTREZID")) | |
``` | |
Another option is the [biomaRt](http://bioconductor.org/packages/biomaRt) package, which pings Biomart servers, but I like that the *org* packages are downloadable, and therefore versionable and reproducible. | |
Annotation interface | |
=== | |
New in Bioc 3.1, `mapIds()` returns as a named character vector | |
```{r cache=TRUE} | |
mapIds(org.Hs.eg.db, key=c("TP53","PTEN"), keytype="SYMBOL", column="ENSEMBL") | |
``` | |
Annotation interface: txdb's also are databases | |
=== | |
```{r cache=TRUE} | |
keytypes(txdb) | |
select(txdb, keys=c("1","10"), keytype="GENEID", columns="TXID") | |
``` | |
Make a package | |
=== | |
Here I show the bare minimum. For more, see Hadley Wickham's [R packages](http://r-pkgs.had.co.nz/) | |
```{r eval=FALSE} | |
source("scripts.R") | |
package.skeleton(name = "foo") # packages up all functions in envir. | |
library(devtools) | |
document() | |
``` | |
Also good is the *testthat* packaage, for unit testing. | |
Make a package | |
=== | |
This is documentation using the *roxygen2* package. | |
```{r eval=FALSE} | |
#' title | |
#' | |
#' descriptive paragraph | |
#' | |
#' @param x a numeric value | |
#' | |
#' @return a numeric value, x^2 | |
foo <- function(x) { | |
x^2 | |
} | |
``` | |
BEDtools-like operations | |
=== | |
Read about these in the [GenomicRanges](http://bioconductor.org/packages/GenomicRanges) manual. | |
```{r} | |
gr1 <- GRanges("1", IRanges(c(1,11,21),c(5,15,25))) | |
gr2 <- GRanges("1", IRanges(c(1,9),c(13,12))) | |
gr1 %over% gr2 | |
``` | |
BEDtools-like operations | |
=== | |
```{r} | |
fo <- findOverlaps(gr1, gr2) | |
fo | |
``` | |
BEDtools-like operations | |
=== | |
```{r} | |
intersect(gr1, gr2) | |
``` | |
BEDtools-like operations | |
=== | |
```{r} | |
union(gr1, gr2) | |
``` | |
BEDtools-like operations | |
=== | |
```{r} | |
gr1 <- GRanges("1", IRanges(5001000,width=100)) | |
gr2 <- GRanges("1", IRanges((1:10) * 1e6, width=300)) | |
distanceToNearest(gr1, gr2) | |
``` | |
Nice HTML reports | |
=== | |
From the [ReportingTools](http://bioconductor.org/packages/ReportingTools) vignette. | |
```{r} | |
my.df <- data.frame(EGID = c("103", "104", "105", "106", "107"), | |
RPKM = c(4, 5, 3, 100, 75), | |
DE = c("Yes", "Yes", "No", "No", "No")) | |
my.df | |
``` | |
Nice HTML reports | |
=== | |
```{r eval=FALSE} | |
library(ReportingTools) | |
htmlRep <- HTMLReport(shortName = "my_html_file", | |
reportDirectory = "report", | |
basePath="~/Desktop") | |
publish(my.df, htmlRep) | |
finish(htmlRep) | |
``` | |
Note, this is already set up to work with limma, edgeR, DESeq2, etc. Can show boxplots and link out to external db's. Check `browseVignettes("ReportingTools")`. | |
BiocParallel | |
=== | |
[BiocParallel](http://bioconductor.org/packages/BiocParallel) implements `bplapply` which is very similar to `mclapply`. | |
"*BiocParallel aims to provide a unified interface to existing parallel infrastructure where code can be easily executed in different environments*" | |
```{r eval=FALSE} | |
library(BiocParallel) | |
register(MulticoreParam(5))) | |
# register(SerialParam()) | |
result.list <- bplapply(1:10, function(i) { | |
heavyWork(filename[i]) | |
}) | |
``` |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment