Skip to content

Instantly share code, notes, and snippets.

View dwinter's full-sized avatar
🐢
I may be slow to respond.

David Winter dwinter

🐢
I may be slow to respond.
View GitHub Profile
@dwinter
dwinter / Parse_codeml.ipynb
Created June 14, 2016 16:35
Parse codeml results with Biopython
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@dwinter
dwinter / dnds_one.sh
Created June 13, 2016 21:35
Parrelizable codeml?
#for seq in alignments/ENSG*.best.fas
#do
# There is some evidence that forcing trees to fit the spp tree will inflate
# dN/dS estimates. Better to estimate the tree every time. At the same time,
# label the human branch and delet branch lengths which aren't useful
gene=`basename ${seq%%.*}`
raxml -s ${seq} -T 2 -p 1231 -m GTRGAMMA -n ${gene}_delete_me
sed -e "s/:0.[0-9]*//g" -e "s/sapiens/sapiens#1/g" RAxML_bestTree.${gene}_delete_me >${gene}.tr
rm *.${gene}_delete_me
```r
library(rotl)
tnrs_match_names("Laminaria saccharina")
```
```
search_string unique_name approximate_match ott_id is_synonym
1 laminaria saccharina Saccharina latissima FALSE 690474 TRUE
is_deprecated number_matches
1 FALSE 1
@dwinter
dwinter / ape_eg.Rmd
Last active March 10, 2016 21:15
ape_nodes
```r
tr <- rtree(10)
tr$node.label <- paste0("Node_lab_", Ntip(tr):length(tr$edge.length)+1)
write.tree(tr)
```
```sh
[1] "((t1:0.9895939711,t7:0.7082487224)Node_lab_12:0.5267008557,(((t5:0.6487850151,t6:0.1765917828)Node_lab_15:0.6776126698,(t3:0.2011363888,t2:0.6406570079)Node_lab_16:0.4713614753)Node_lab_14:0.9677321229,(t8:0.1978681912,((t4:0.6891832552,t9:0.3251808353)Node_lab_19:0.4984181079,t10:0.7551706994)Node_lab_18:0.3928663374)Node_lab_17:0.4166005277)Node_lab_13:0.8827202818)Node_lab_11;"
```
Confirm the mapping is right:
@dwinter
dwinter / parse_fq.md
Last active June 6, 2024 18:22
Parse fastqc outputs

Fastqc is a program to perform some basic quality checks on fastq files. It makes nice html reports for a given file, but (as far as I can tell) doesn't provide a straightfowrard way to compare the results across files (which might represent different library preps, sequencing lanes or samples).

Here is the (really pretty hacky) solution to aggregating these stats that I came up with. This all assumes that you have a directory where reports for each fastq file are in a subdirectories containing the reports with names ./library_name.L001.R1.fastqc/fastqc_data.txt. We will then use regular expressions to match just those parts of the file we care about.

import os
import re

#percent sequences left after de_dup
re_dup = re.compile('Total Deduplicated Percentage\t(\d\d\.\d)')
@dwinter
dwinter / pw_dist.rmd
Last active February 19, 2016 04:12
among-group distances
```r
inter_grp_dist <- function(combo, M, group_map){
vals <- M[group_map == combo[1], group_map == combo[2]]
list( combo=paste(combo, collapse="-"), mean=mean(vals), var=var(as.vector(vals)) )
}
within_grp_dist <- function(grp, M , group_map){
vals <- M[group_map == grp, group_map==grp]
vals <- vals[lower.tri(vals)]
list( combo=grp, mean=mean(vals), var=var(vals) )
@dwinter
dwinter / eg.Rmd
Last active February 6, 2016 00:16
TAXIDs and NCBI genome?
#Won't work by cross linking
```r
tax_search <- entrez_search(db="taxonomy", term="Acetobacter[ORGN] AND genus[RANK]")
linked_recs <- entrez_link(dbfrom="taxonomy", db="genome", id=tax_search$ids)
linked_recs$links$taxonomy_genome
```
```
#[1] "18005"
```
def square_docs(f):
f.__doc__ = "Square it"
return(f)
@square_docs
def square(x):
return(x**2)
help(square)
#Type: function
@dwinter
dwinter / rentrez_1.Rmd
Created September 23, 2015 02:26
rentrez release blog post
#Rentrez 1.0 released
A new version of `rentrez`, our package for the NCBI's EUtils API, is making
it's way around the CRAN mirrors. This release represents a substantial
mprovement to `rentrez`, including a [new vignette](https://cran.r-project.org/web/packages/rentrez/vignettes/rentrez_tutorial.html)
that documents the whole package.
This posts describes some of the new things in `rentrez`, and gives us a chance
to thank some of the people that have contriuted to this package's development.
@dwinter
dwinter / plot_tree.r
Created August 10, 2015 16:13
Flag phylogeny
traits <- read.csv("flag_traits.csv")
dist_mat <- dist(traits, "manhattan")
tr <- ape::nj(dist_mat)
plot(tr)
##Rooting the tree (arbitrarily) will make a nicer plot
#rooted <- ape::root(tr, node=50)
#plot(rooted)