Levi Waldron lwaldron

I teach biostatistics, develop for Bioconductor, and have an active research program in cancer genomics and in metagenomic profiling of the human microbiome.

136 followers · 19 following

The @waldronlab at the City University of New York
New York, NY
https://waldronlab.io
@LeviWaldron1

View GitHub Profile

Recently created

Least recently created

Recently updated

Least recently updated

lwaldron / OVmae.R

Created July 5, 2018 15:04

	library(curatedTCGAData)
	curatedTCGAData("OV")
	mae <- curatedTCGAData("OV", assays=c("mRNAArray", "RNASeq2GeneNorm", "RNASeqGene"), dry.run = FALSE)
	library(TCGAutils)
	keep <- TCGAsampleSelect(colnames(mae), 11)
	mae <- mae[, keep, ]
	mae <- intersectColumns(mae)
	mae <- intersectRows(mae)

	library(org.Hs.eg.db)

lwaldron / symbolsToRanges.R

Last active June 22, 2018 13:31

Add ranges to RNA-seq and microarray SummarizedExperiments where rownames are gene symbols in a MultiAssayExperiment

	.cMAE <- function(mae, x, name="newelement"){
	el <- ExperimentList(tmp=x)
	names(el)[1] <- name
	c(mae, el)
	}

	.hasSymbols <- function(x){
	mean(c(FALSE, grepl("^[A-Z0-9]{1,6}\|^C[0-9]orf[0-9]{1,4}", rownames(x))), na.rm=TRUE) > 0.9
	}
	.isSummarizedExperiment <- function(x){

lwaldron / simplifyTCGAData.R

Last active June 21, 2018 23:45

Coerce (and optionally remove) all RaggedExperiments in a curatedTCGAData MAE.

	simplifyTCGAData <- function(obj, removeRaggedExperiments=TRUE){
	##This function will convert mutations to a genes x samples RangedSummarizedExperiment of 1 for non-silent mutations, 1 for silent or no mutation
	##It will convert segmented copy number to copy number per gene, using a weighted average if there are non-disjunct ranges
	suppressPackageStartupMessages({
	library(TxDb.Hsapiens.UCSC.hg19.knownGene)
	library(org.Hs.eg.db)
	library(GenomeInfoDb)
	})
	gn <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
	gn <- keepStandardChromosomes(granges(gn), pruning.mode="coarse")

lwaldron / metaphlanToPhyloseq.R

Last active October 15, 2024 21:04

Import a table of MetaPhlAn taxonomic abundances into phyloseq

	metaphlanToPhyloseq <- function(
	metaphlandir,
	metadat=NULL,
	simplify=TRUE){
	## tax is a matrix or data.frame with the table of taxonomic abundances, rows are taxa, columns are samples
	## metadat is an optional data.frame of specimen metadata, rows are samples, columns are variables
	## if simplify=TRUE, use only the most detailed level of taxa names in the final object
	## metaphlanToPhyloseq("~/Downloads/metaphlan_bugs_list")
	.getMetaphlanTree <- function(removeGCF=TRUE, simplify=TRUE){
	if (!requireNamespace("ape")) {

lwaldron / CI_confusion.R

Last active June 2, 2018 12:51

What is the fraction of sample means expected to fall within the 95% confidence intervals? This is an incorrect (and not useful) interpretation of the 95% CI, but out of curiosity, the answer is less than 95%.

	fracwithin <- function(n){
	sam1 <- rnorm(n)
	ci <- t.test(sam1)$conf.int
	means <- replicate(100, mean(rnorm(n)))
	mean(means > ci[1] & means < ci[2])
	}

	set.seed(2)
	n <- seq(5, 10000, by=2)
	fraction <- sapply(n, function(i) fracwithin(i))

lwaldron / OVC_SingleCell.Rmd

Created May 29, 2018 15:08

Import OVC single-cell data

	---
	title: "OVC single-cell analysis"
	author: "Levi Waldron"
	date: "5/29/2018"
	output: html_document
	---

	```{r setup, include=FALSE}
	knitr::opts_chunk$set(echo = TRUE)
	```

lwaldron / HMP_controlled.R

Last active April 10, 2018 14:30

Accessing HMP controlled metadata

	# Before starting, install the NCBI SRA Tookit:
	# Debian/Ubuntu:
	# apt install sra-toolkit
	# macOS:
	# brew install sratoolkit
	# Windows:
	# see https://tinyurl.com/y845ppaa
	# Get your dbGaP repository from https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=list_wishlists
	# Click "get dbGaP repository key"
	library(HMP16SData)

lwaldron / smallSE.R

Last active March 22, 2018 12:20

Create a small SummarizedExperiment from one dataset in curatedMetagenomicData (including ape phylogenetic tree)

	library(curatedMetagenomicData)
	library(SummarizedExperiment)
	library(tidyr)
	library(ape)

	simplifynodes <- TRUE

	makeTaxTable <- function(fullnames){
	taxonomic.ranks <- c("Kingdom", "Phylum", "Class", "Order",
	"Family", "Genus", "Species", "Strain")

lwaldron / bigSE.R

Last active March 22, 2018 12:23

Create a big SummarizedExperiment from curatedMetagenomicData (including phylogenetic tree)

	## A big SE

	library(curatedMetagenomicData)
	library(SummarizedExperiment)
	library(tidyr)
	library(ape)

	simplifynodes <- TRUE

	makeTaxTable <- function(fullnames){

lwaldron / CompareBedFiles.R

Created February 27, 2018 22:06

Comparing two exome sequencing capture kits

	url1 <- "https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/samplepreps_nextera/nexterarapidcapture/nexterarapidcapture_exome_targetedregions.bed"
	url2 <- "https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/samplepreps_nextera/nexterarapidcapture/nexterarapidcapture_expandedexome_targetedregions.bed"

	library(rtracklayer)
	rapidgr <- import(url1, genome="hg19")
	expandedgr <- import(url2, genome="hg19")

	library(GenomicRanges)

	table(countOverlaps(rapidgr, expandedgr))

Newer Older