Skip to content

Instantly share code, notes, and snippets.

View mikelove's full-sized avatar

Michael Love mikelove

View GitHub Profile
@mikelove
mikelove / genome_summary.tsv
Created May 18, 2017 19:51
Genome summary information scraped from Ensembl
type name species bp coding noncoding pseudo tx
mammal Alpaca Vicugna_pacos 2967746133 11765 2532 898 15236
fish Amazon molly Poecilia_formosa 748923461 23615 679 60 31637
invert Roundword Caenorhabditis_elegans 100286401 20362 24719 1658 58941
invert Sea squirt Ciona_intestinalis 115227500 16671 455 27 17784
invert Solitary sea squirt Ciona_savignyi 177003750 11616 340 216 20711
mammal Cat Felis_catus 2455541136 19493 1855 542 22656
fish Cave fish Astyanax_mexicanus 1191242572 23042 2208 21 25927
bird Chicken Gallus_gallus 1230258557 18346 6492 43 38118
mammal Chimpanzee Pan_troglodytes 3309577922 18759 8681 572 29160
@mikelove
mikelove / bootstrap-t.R
Created June 15, 2017 12:58
bootstrap t intervals
n <- 20
r <- 1000
B <- 1000
true.mean <- 3
simple.contains <- numeric(r)
simple.ci <- matrix(NA,nrow=r,ncol=2)
for (i in 1:r) {
x <- rexp(n,rate=1/true.mean)
s <- numeric(B)
for (j in 1:B) {
@mikelove
mikelove / grading.R
Last active December 14, 2017 14:30
Script to automate pushing scores to student repos
x <- read.csv("grading - Sheet1.csv", stringsAsFactors=FALSE)
repo <- paste0("homework-",x$github)
for (i in seq_along(repo)) {
#system(paste0("git clone [email protected]:biodatascience/",repo[i],".git"))
setwd(repo[i])
system("git pull")
setwd("..")
}
hw <- "HW1"
cols <- x[,grep(hw,names(x))]
@mikelove
mikelove / alpine.R
Last active September 9, 2017 18:52
alpine
# ftp://ftp.ensembl.org/pub/release-81/gtf/homo_sapiens/Homo_sapiens.GRCh38.81.gtf.gz
library(ensembldb)
#ensDbFromGtf("Homo_sapiens.GRCh38.81.gtf.gz")
txdb <- EnsDb("Homo_sapiens.GRCh38.81.sqlite")
txdf <- transcripts(txdb, return.type="DataFrame")
txps <- transcripts(txdb)
tab <- table(txdf$gene_id)
one.iso.genes <- names(tab)[tab == 1]
ebt0 <- exonsBy(txdb, by="tx")
@mikelove
mikelove / groups.R
Created September 22, 2017 14:59
Code to group students to balance across skill sets
x <- read.csv("bios784proj-report.csv")
y <- x[,3:6]
colnames(y) <- c("r","b","s","h")
y$h <- 4 - y$h # convert HW difficulty => comfort
y[10,1] <- 2 # student didn't answer
head(y)
cor(y)
set.seed(1)
grps <- sample(rep(1:9,3))
@mikelove
mikelove / rawcounts.R
Created October 24, 2017 01:18
PCA on raw counts
# what dominates PC1: 100 features with 10x fold change or 1 feature with 2x fold change?
mat <- rbind(matrix(rpois(8*100,rep(c(10,100),each=4)),ncol=8,byrow=TRUE),
rpois(8,rep(c(1000,1000,2000,2000),2)))
plot(prcomp(t(mat))$x[,1])
# JS estimator for Normal scale
Atrue <- .2
n <- 1000
res <- t(sapply(1:100, function(i) {
set.seed(i)
theta <- rnorm(n, 0, sqrt(Atrue))
D <- rexp(n,rate=.1)
X <- rnorm(n, theta, sqrt(D))
(rough <- var(X) - mean(D))
@mikelove
mikelove / rmosek
Created December 29, 2017 00:44
How to install Rmosek
1) Download mosek from here:
https://www.mosek.com/downloads/
(I downloaded this to my ~/bin)
cd ~/bin
tar -xvf mosektoolslinux64x86.tar.bz2
2) Add this to your ~/.bashrc
export PATH=$PATH:/home/username/bin/mosek/8/tools/platform/linux64x86/bin
@mikelove
mikelove / numeric.R
Last active January 4, 2018 21:11
trying out RcppNumeric (R script)
f <- function(beta, coefs) coefs[1]*beta[1]^4 + coefs[2]*beta[1]^2 +
coefs[3]*beta[1] + coefs[4]*beta[2]^2 + coefs[5]
n <- 100000
coefs <- matrix(c(0.1, -1, 1, 1, 10,
0.1, -1, .5, 1, 10), nrow=5, ncol=n)
s <- seq(-4, 4,length.out = 100)
plot(s, sapply(s, function(z) f(c(z,0),coefs[,1])), type="l")
par <- optim(c(0,0), f, coefs=coefs)$par
@mikelove
mikelove / numeric.cpp
Created January 4, 2018 20:55
trying out RcppNumeric (C++ script)
/ [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppNumerical)]]
#include <RcppNumerical.h>
using namespace Numer;
typedef Eigen::Map<Eigen::MatrixXd> MapMat;
typedef Eigen::Map<Eigen::VectorXd> MapVec;
class FooFun: public MFuncGrad