Kieran R Campbell kieranrcampbell

Researcher, statistical machine learning for genomics

kieranrcampbell / reproduce-tscan.R

Last active May 20, 2016 14:06

TSCAN applied to small subset of monocle data

kieranrcampbell / sample_beta

Created October 25, 2016 11:27

	// [[Rcpp::export]]
	NumericMatrix sample_tau_pg(NumericMatrix beta,
	double a_beta, double b_beta) {
	int P = beta.nrow();
	int G = beta.ncol();

	NumericMatrix tau_pg(P, G);

	for(int p = 0; p < P; p++) {
	for(int g = 0; g < G; g++) {

kieranrcampbell / calc_tpr_fpr_fdr.R

Created March 16, 2017 15:39

Calculate true positive rate, false positive rate & false discovery rate from contingency table in R

	## Suppose we have a contigency table tbl formed by the table(...) command in R, with
	## a logical vector of discoveries as the first argument and a logical vector of
	## the ground truth as the second, e.g. tbl <- table(discoveries, ground_truth), then
	## this function calculates the true positive rate, false positive rate and false discovery rate
	## as per the wikipedia definition at https://en.wikipedia.org/wiki/Sensitivity_and_specificity

	calculate_statistics <- function(tbl) {
	P <- sum(tbl[,2])
	N <- sum(tbl[,1])
	TP <- tbl[2,2]

kieranrcampbell / weibull.py

Created April 5, 2017 13:02

	from edward.models import RandomVariable
	import tensorflow as tf
	from tensorflow.contrib.distributions import Distribution


	class Weibull(RandomVariable, Distribution):
	"""Weibull distribution

	The Weibull distribution is defined over the non-negative real numbers.

kieranrcampbell / go_analysis.R

Created May 3, 2017 14:07

Gene ontology enrichment analysis with Goseq

	# A function that takes a set of genes and a background list and goes through the
	# steps in goseq to output the enriched list

	library(goseq)

	#' Run all steps for a goseq analysis
	#'
	#' @param genes The genes "differentially expressed" or in the condition
	#' @param all_genes The background gene set
	#' @param genome The genome of genes and all_genes, e.g. mm10 or hg19

kieranrcampbell / snakefile_for_rmarkdown

Created May 12, 2017 08:55

Snakemake to build Rmarkdown files

	rule exploratory_analysis:
	input:
	"data/tec_sceset_qc.rds", "analysis/02_exploratory_analysis.html"
	output:
	"data/tec_sceset_clusters.rds",
	"analysis/02_exploratory_analysis.Rmd",
	"data/deseq2_results.csv"
	shell:
	"Rscript -e \"rmarkdown::render('analysis/02_exploratory_analysis.Rmd')\""

kieranrcampbell / goplot.R

Created May 15, 2017 09:06

Plot the p-values of a GO analysis

	library(stringr)
	library(ggplot)
	library(dplyr)

	## Assuming "go" has columns "term" and "q_value"

	go_top <- head(go, n = 10)
	go_top$term <- str_to_title(go_top$term)
	terms_sorted <- arrange(go_top, desc(q_value)) %>% .$term
	go_top$term <- factor(go_top$term, levels = terms_sorted)

kieranrcampbell / min_rep_fg.R

Created June 3, 2017 16:54

	## You can show that the estimate of fg is wrong only
	## when m_beta = 1 and m_c = 1 and xx = 1 and that in
	## such a case it appears to overcount by 2, implying term 11 in the derivation is wrong

	m_alpha <- 0
	s_alpha <- 1
	m_beta <- 1
	s_beta <- 1
	m_t <- 0
	s_t <- 1

kieranrcampbell / overdispersion-analysis.R

Created June 15, 2017 09:55

Brief example of scran's method for "Accounting for technical noise in single-cell RNA-seq experiments"

	library(stringr)
	library(ggplot2)
	library(scran)
	library(scater)
	library(dplyr)

	# Step 1: do the overdispersion analysis

	sc2 <- sce[rowMeans(round(exprs(sce)) > 0) > 0, ] # Keep only genes that are expressed
	is_ercc <- grepl("NA_ERCC", featureNames(sc2)) # Your ERCCs might be named differently to mine!

kieranrcampbell / cnv_cols.R

Last active April 9, 2018 20:08

	# Nice colour palette for visualising copy number profiles
	# inspired by DLP paper
	cnv_cols <- c("0" = "#deebf7",
	"1" = "#9ecae1",
	"2" = "grey80",
	"3" = "#fdae6b",
	"4" = "#e6550d")

	# Factors for chromosomes
	chr_levels <- c(as.character(1:23), "X", "Y")