Skip to content

Instantly share code, notes, and snippets.

View brentp's full-sized avatar

Brent Pedersen brentp

View GitHub Profile
@brentp
brentp / README.md
Last active August 29, 2015 14:10
automatic running bedtools shuffle

usage:

shuf.sh "bedtools cmd" "metric" $genome

where metric must accept the output from bedtools cmd and return a single numeric value.

@brentp
brentp / betareg.py
Last active September 17, 2024 14:48
beta regression in statsmodels
# -*- coding: utf-8 -*-
u"""
Beta regression for modeling rates and proportions.
References
----------
Grün, Bettina, Ioannis Kosmidis, and Achim Zeileis. Extended beta regression
in R: Shaken, stirred, mixed, and partitioned. No. 2011-22. Working Papers in
Economics and Statistics, 2011.
@brentp
brentp / compare-wrappers.py
Last active May 2, 2024 08:11
compare call overhead of cffi and cython
"""
Compare speed of a cython wrapper vs a cffi wrapper to the same underlying
C-code with a fast function and a longer-running function.
This should run anywhere that has cffi and cython installed.
Ouput on my machine with python2.7:
brentp@liefless:/tmp$ python compare-wrappers.py
library(SKAT)
set.seed(1)
hap.skat.null = function(formula, covs, ...){
covs = read.delim(covs)
assign("y", covs[[as.character(formula[[2]])]], envir = .GlobalEnv)
m = SKAT_Null_Model(formula, data=covs, out_type="D")
return(m)
}
@brentp
brentp / README.md
Last active January 2, 2016 01:39
add a read-group to a bam file. usage:

add a readgroup to a bam doesn't need the entire "@RG\t..." just a single id.

addrg some.bam my-id > some.rg.bam

or

generate-bam | addrg - > some.rg.bam
/*
* example for me to learn khash.h library from lh3
*/
#include "khash.h"
#include <stdio.h>
// int keys and char * values
KHASH_MAP_INIT_INT(ihash, char *)
@brentp
brentp / kin.R
Created November 4, 2013 20:04
calculate probability of single rare snps using a binomial test and adjusting observed counts such that 2 members from the same family will contribute a number less that 2, depending on their relatedness. This allows us to use family information, but not to over-count rare-alleles due to their presence in families.
library(SNPRelate)
library(FDb.UCSC.snp137common.hg19)
library(hash)
args = commandArgs(TRUE)
vcf.fn = args[1]
gds.fn = paste0(args[1], ".gds")
@brentp
brentp / bwa-fb.py
Last active December 27, 2015 09:49
"""
align reads with bwa mem
remove duplicates with samblaster
call variants with freebayes
"""
import sys
import os.path as op
import os
def base(fq):
@brentp
brentp / anno-450k.py
Last active December 26, 2015 21:29
use cruzdb to annotate the illumina 450K manifest file with position relative to gene and to CpG
"""
use cruzdb to annotate the illumina 450K manifest file with
position relative to gene and to CpG
"""
import sys
import os.path as op
from cruzdb import Genome
from toolshed import reader
if not op.exists('hg19.db'):
@brentp
brentp / annotate-genes.py
Last active May 2, 2018 13:30
convert files values from TCGA, which generally contain values for a single sample per file into a matrix with rows of probes and columns of samples
"""
create a BED file with position and strand given the genes from a
TCGA level 3 expression matrix
"""
from cruzdb import Genome
from toolshed import reader
import sys
import os.path as op
matrix = sys.argv[1] # has a column named "probe" that is a refGen name