Skip to content

Instantly share code, notes, and snippets.

View dwinter's full-sized avatar
🐢
I may be slow to respond.

David Winter dwinter

🐢
I may be slow to respond.
View GitHub Profile
@dwinter
dwinter / pw_dist.rmd
Last active February 19, 2016 04:12
among-group distances
```r
inter_grp_dist <- function(combo, M, group_map){
vals <- M[group_map == combo[1], group_map == combo[2]]
list( combo=paste(combo, collapse="-"), mean=mean(vals), var=var(as.vector(vals)) )
}
within_grp_dist <- function(grp, M , group_map){
vals <- M[group_map == grp, group_map==grp]
vals <- vals[lower.tri(vals)]
list( combo=grp, mean=mean(vals), var=var(vals) )
@dwinter
dwinter / parse_fq.md
Last active June 6, 2024 18:22
Parse fastqc outputs

Fastqc is a program to perform some basic quality checks on fastq files. It makes nice html reports for a given file, but (as far as I can tell) doesn't provide a straightfowrard way to compare the results across files (which might represent different library preps, sequencing lanes or samples).

Here is the (really pretty hacky) solution to aggregating these stats that I came up with. This all assumes that you have a directory where reports for each fastq file are in a subdirectories containing the reports with names ./library_name.L001.R1.fastqc/fastqc_data.txt. We will then use regular expressions to match just those parts of the file we care about.

import os
import re

#percent sequences left after de_dup
re_dup = re.compile('Total Deduplicated Percentage\t(\d\d\.\d)')
@dwinter
dwinter / ape_eg.Rmd
Last active March 10, 2016 21:15
ape_nodes
```r
tr <- rtree(10)
tr$node.label <- paste0("Node_lab_", Ntip(tr):length(tr$edge.length)+1)
write.tree(tr)
```
```sh
[1] "((t1:0.9895939711,t7:0.7082487224)Node_lab_12:0.5267008557,(((t5:0.6487850151,t6:0.1765917828)Node_lab_15:0.6776126698,(t3:0.2011363888,t2:0.6406570079)Node_lab_16:0.4713614753)Node_lab_14:0.9677321229,(t8:0.1978681912,((t4:0.6891832552,t9:0.3251808353)Node_lab_19:0.4984181079,t10:0.7551706994)Node_lab_18:0.3928663374)Node_lab_17:0.4166005277)Node_lab_13:0.8827202818)Node_lab_11;"
```
Confirm the mapping is right:
```r
library(rotl)
tnrs_match_names("Laminaria saccharina")
```
```
search_string unique_name approximate_match ott_id is_synonym
1 laminaria saccharina Saccharina latissima FALSE 690474 TRUE
is_deprecated number_matches
1 FALSE 1
@dwinter
dwinter / dnds_one.sh
Created June 13, 2016 21:35
Parrelizable codeml?
#for seq in alignments/ENSG*.best.fas
#do
# There is some evidence that forcing trees to fit the spp tree will inflate
# dN/dS estimates. Better to estimate the tree every time. At the same time,
# label the human branch and delet branch lengths which aren't useful
gene=`basename ${seq%%.*}`
raxml -s ${seq} -T 2 -p 1231 -m GTRGAMMA -n ${gene}_delete_me
sed -e "s/:0.[0-9]*//g" -e "s/sapiens/sapiens#1/g" RAxML_bestTree.${gene}_delete_me >${gene}.tr
rm *.${gene}_delete_me
@dwinter
dwinter / Parse_codeml.ipynb
Created June 14, 2016 16:35
Parse codeml results with Biopython
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
```
---
name: rotl-paper-published
layout: post
title: "rotl paper published"
date: 2016-18-02
authors:
- name: Francois Michonneau
url: http://francoismichonneau.net/
- name: Joseph Brown
obs unit magnitude
1 1 -0.61
2 1 -1.43
3 1 -1.09
4 1 -1.13
5 1 -0.66
6 1 -0.36
7 1 -0.73
8 1 -0.89
9 1 -0.69
@dwinter
dwinter / ideograms.r
Last active September 8, 2022 11:35
.one_chrom_outline <- function(chrom_name, len, offset, width, centro_s, centro_e, notch_prop=0.6){
one_side <- c(0, centro_s-offset, (centro_s + centro_e)/2, centro_e+offset, len)
wd = width/2
data.frame(
chrom = chrom_name,
x = c(one_side, rev(one_side), 0),
y = c(wd, wd, notch_prop * wd, wd, wd, -wd, -wd, -notch_prop * wd, -wd, -wd, wd)
)
@dwinter
dwinter / sample_disk.r
Last active January 9, 2019 21:34
uniform sampling on a (half) disk
sample_half_disk <- function(n,radius) {
prop_radius <- runif(n)
angles <- runif(n,0,360)
data.frame( x= sqrt(prop_radius) * cos(angles) * radius,
y = -abs(sqrt(prop_radius) * sin(angles) * radius))
}
plot(sample_half_disk(1e4,300))