Skip to content

Instantly share code, notes, and snippets.

View lwaldron's full-sized avatar

Levi Waldron lwaldron

View GitHub Profile
library(curatedTCGAData)
curatedTCGAData("OV")
mae <- curatedTCGAData("OV", assays=c("mRNAArray", "RNASeq2GeneNorm", "RNASeqGene"), dry.run = FALSE)
library(TCGAutils)
keep <- TCGAsampleSelect(colnames(mae), 11)
mae <- mae[, keep, ]
mae <- intersectColumns(mae)
mae <- intersectRows(mae)
library(org.Hs.eg.db)
@lwaldron
lwaldron / symbolsToRanges.R
Last active June 22, 2018 13:31
Add ranges to RNA-seq and microarray SummarizedExperiments where rownames are gene symbols in a MultiAssayExperiment
.cMAE <- function(mae, x, name="newelement"){
el <- ExperimentList(tmp=x)
names(el)[1] <- name
c(mae, el)
}
.hasSymbols <- function(x){
mean(c(FALSE, grepl("^[A-Z0-9]{1,6}|^C[0-9]orf[0-9]{1,4}", rownames(x))), na.rm=TRUE) > 0.9
}
.isSummarizedExperiment <- function(x){
@lwaldron
lwaldron / simplifyTCGAData.R
Last active June 21, 2018 23:45
Coerce (and optionally remove) all RaggedExperiments in a curatedTCGAData MAE.
simplifyTCGAData <- function(obj, removeRaggedExperiments=TRUE){
##This function will convert mutations to a genes x samples RangedSummarizedExperiment of 1 for non-silent mutations, 1 for silent or no mutation
##It will convert segmented copy number to copy number per gene, using a weighted average if there are non-disjunct ranges
suppressPackageStartupMessages({
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(org.Hs.eg.db)
library(GenomeInfoDb)
})
gn <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
gn <- keepStandardChromosomes(granges(gn), pruning.mode="coarse")
@lwaldron
lwaldron / metaphlanToPhyloseq.R
Last active October 15, 2024 21:04
Import a table of MetaPhlAn taxonomic abundances into phyloseq
metaphlanToPhyloseq <- function(
metaphlandir,
metadat=NULL,
simplify=TRUE){
## tax is a matrix or data.frame with the table of taxonomic abundances, rows are taxa, columns are samples
## metadat is an optional data.frame of specimen metadata, rows are samples, columns are variables
## if simplify=TRUE, use only the most detailed level of taxa names in the final object
## metaphlanToPhyloseq("~/Downloads/metaphlan_bugs_list")
.getMetaphlanTree <- function(removeGCF=TRUE, simplify=TRUE){
if (!requireNamespace("ape")) {
@lwaldron
lwaldron / CI_confusion.R
Last active June 2, 2018 12:51
What is the fraction of sample means expected to fall within the 95% confidence intervals? This is an incorrect (and not useful) interpretation of the 95% CI, but out of curiosity, the answer is less than 95%.
fracwithin <- function(n){
sam1 <- rnorm(n)
ci <- t.test(sam1)$conf.int
means <- replicate(100, mean(rnorm(n)))
mean(means > ci[1] & means < ci[2])
}
set.seed(2)
n <- seq(5, 10000, by=2)
fraction <- sapply(n, function(i) fracwithin(i))
@lwaldron
lwaldron / OVC_SingleCell.Rmd
Created May 29, 2018 15:08
Import OVC single-cell data
---
title: "OVC single-cell analysis"
author: "Levi Waldron"
date: "5/29/2018"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
@lwaldron
lwaldron / HMP_controlled.R
Last active April 10, 2018 14:30
Accessing HMP controlled metadata
# Before starting, install the NCBI SRA Tookit:
# Debian/Ubuntu:
# apt install sra-toolkit
# macOS:
# brew install sratoolkit
# Windows:
# see https://tinyurl.com/y845ppaa
# Get your dbGaP repository from https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=list_wishlists
# Click "get dbGaP repository key"
library(HMP16SData)
@lwaldron
lwaldron / smallSE.R
Last active March 22, 2018 12:20
Create a small SummarizedExperiment from one dataset in curatedMetagenomicData (including ape phylogenetic tree)
library(curatedMetagenomicData)
library(SummarizedExperiment)
library(tidyr)
library(ape)
simplifynodes <- TRUE
makeTaxTable <- function(fullnames){
taxonomic.ranks <- c("Kingdom", "Phylum", "Class", "Order",
"Family", "Genus", "Species", "Strain")
@lwaldron
lwaldron / bigSE.R
Last active March 22, 2018 12:23
Create a big SummarizedExperiment from curatedMetagenomicData (including phylogenetic tree)
## A big SE
library(curatedMetagenomicData)
library(SummarizedExperiment)
library(tidyr)
library(ape)
simplifynodes <- TRUE
makeTaxTable <- function(fullnames){
@lwaldron
lwaldron / CompareBedFiles.R
Created February 27, 2018 22:06
Comparing two exome sequencing capture kits
url1 <- "https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/samplepreps_nextera/nexterarapidcapture/nexterarapidcapture_exome_targetedregions.bed"
url2 <- "https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/samplepreps_nextera/nexterarapidcapture/nexterarapidcapture_expandedexome_targetedregions.bed"
library(rtracklayer)
rapidgr <- import(url1, genome="hg19")
expandedgr <- import(url2, genome="hg19")
library(GenomicRanges)
table(countOverlaps(rapidgr, expandedgr))