Skip to content

Instantly share code, notes, and snippets.

View brentp's full-sized avatar

Brent Pedersen brentp

View GitHub Profile
@brentp
brentp / target-roc.py
Last active December 15, 2015 10:39
draw an ROC curve (true+ is a read in a target region, false+ is a read outside any target region) and a coverage plot for a resequencing project.
"""
for targetted resequencing projects, we can calculate the proportion of reads
mapped in/outside the target region. By taking that metric while varying
the mapping quality cutoff, we can plot an ROC curve of the on/off-target
reads. This works best when the region is much, much smaller than the
genome.
"""
from toolshed import reader, nopen
import tempfile
from itertools import groupby
@brentp
brentp / burden.R
Created March 22, 2013 12:54
see if increasing the number of probes in a burden test changes the p-value distribution
library(AssotesteR)
library(ggplot2)
burden_test = CMC
cases = 300
ctrls = 300
MAF = 0.05
reps = 1000
n_probes = c(10, 20, 30, 50, 100, 250)
CpG Intercept PanTcell CD8T CD4T NK Bcell Mono Gran
cg00226923 0.908254648552594 0.0218475536904394 -0.0290746444815646 0.019520009724875 0.0349316733273465 -0.891558627613285 0.043308690164259 0.0442108030283962
cg16698623 0.910165601928021 0.0291908471088487 -0.0205090331614662 0.0068070685852062 0.0216508562925718 -0.804918829478583 0.0383423000849344 0.0301016053028316
cg14102807 0.861710374865453 0.0133515467273259 -0.024789511157264 -0.00679169717021715 0.0249064091110884 -0.802710142557113 0.054985251674549 0.0462940463367248
cg14127336 0.883485808414202 0.0483939005799524 -0.0405613173629306 0.0140909168912605 0.0514708954741297 -0.855206370039137 0.0631463970943434 0.0572638103519052
cg24641737 0.0741634195006262 -0.030853038777134 0.000300863984962999 -0.00486344785097266 -0.0115308877837343 0.808367400203855 -0.0336872495057226 -0.0363950545826554
cg20981615 0.528012233377504 -0.40285281488659 0.139389398121522 -0.0200902187680063 -0.450320678179238 0.340397795211808 0.290289848424227 0.18834591750
@brentp
brentp / README.md
Last active October 12, 2023 15:21

Linkage Disequilibrium Calculation

This is complete taken from Ryan D on biostar: http://www.biostars.org/p/2909/#16419

It takes a region in a format like "chr2:1234-3456". If an rs number is specified after the region, it will output the R^2 for every SNP in that region with the requested SNP; otherwise, it is all-vs-all.

Very Bad Things

Originally, I named this document "very bad things" because it does some stuff that proper folks should not do, like mixing bash and python and leaving filehandles unclosed. It's about a function I wrote to get sh*t done.

I've been doing bioinformatics for about 10 years now. I used to joke with a friend of mine that most of our work was converting between file formats. We don't joke about that anymore.

class fdict(dict):
def __getitem__(self, fname):
try:
return self.__dict__[fname]
except KeyError:
self.__dict__[fname] = open(fname, 'w')
return self.__dict__[fname]
@brentp
brentp / toxls.py
Created November 21, 2012 15:08
convert tab-delimited files to xls and vice-versa
"""
convert tab-delimited to xls (shrug).
takes the converts input.txt to input.xls
or vice-versa
"""
import sys
import os.path as op
from toolshed import reader
from itertools import imap as map
@brentp
brentp / qreader.py
Created November 3, 2012 23:35
send a job the Q and iterate over the results.
import tempfile
import sys
import os
import os.path as op
import time
from toolshed import reader
def qreader(cmd, outfile=None, name=None, header=True, sep="\t", **kwargs):
"""
@brentp
brentp / exp.md
Last active August 20, 2018 13:09
sketch power calculation
python extract-peaks.py peaks/* > peaks.split.hg19.bed
lift peaks.split.hg19.bed hg19 hg18
python rejoin.py peaks.split.hg19.bed.hg18 > peaks.hg18.merge.bed