David Winter dwinter

@dwinter
dwinter / na.r
Created March 18, 2013 04:48
'NA' is a gene symbol, which can cause problems
chars <- c("ABC", NA, "DEF")
(urls <- paste("my-fav-web-api/id=", chars, sep=""))
## [1] "my-fav-web-api/id=ABC" "my-fav-web-api/id=NA" "my-fav-web-api/id=DEF"
library(ggplot2)
df0 <- data.frame(x=rnorm(100), y=rnorm(100), grp=factor(rep(letters[1:2], each=50)))
p <- ggplot(df0, aes(x,y, colour=grp, alpha=y))
p + geom_point()
library(RColorBrewer)
dwinter / gsi.R
Created July 24, 2012 23:36
calculate gsi in R
#calculate Cummings et al (2008) _gsi_ - a measure of the exclusivity of a
#predefined group of leaves in a phylogenetic tree
#example
#
# tr <- rtree(10)
# grp <- paste("t", 1:5, sep="")
# gsi(tr, grp)
dwinter / Parse_multitree_paml.py
Created June 1, 2012 10:37
Parse results for a multi-tree PAML run
#
# Code to parse the interesting bits of a PAML results file that contains
# statistics for several trees. Note: this approach contains some hacks that
# are probably specific to multiple-tree CODEML files. Biopython has a module
# for handling other PAML outputs, and it is probably a better starting point for
# 'normal' (single tree) input files.
#
# TODO - should clean up handling of residues - use @property decorator and
# ._functions to simplify writing/reading.
#
dwinter / SummariseBeast.py
Created May 30, 2012 05:27
Summarise Beast output from the command line
"""
Summarise a variable in a BEAST logfile
arguments:
-h, --help show this help message and exit
-f or --file string what's the name of the BEAST file
-v or --variable string value you want to summarise
-b or --burnin int number of states to ignore
-g export graphs to summarise the sample?
-l export a logfile for just this variable
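The options listed above could be wired up roughly as follows. This is a minimal sketch, not the gist's own code: the function names `build_parser`, `read_variable` and `summarise` are hypothetical, the log is assumed to be tab-delimited with `#` comment lines and the state number in the first column, and burn-in is assumed to mean "drop rows whose state number is below the given value". The `-g` and `-l` export options are left out.

```python
import argparse
import csv
import io
import statistics


def build_parser():
    """Command-line options mirroring the docstring (argparse adds -h itself)."""
    parser = argparse.ArgumentParser(
        description="Summarise a variable in a BEAST logfile")
    parser.add_argument("-f", "--file", required=True,
                        help="name of the BEAST log file")
    parser.add_argument("-v", "--variable", required=True,
                        help="column you want to summarise")
    parser.add_argument("-b", "--burnin", type=int, default=0,
                        help="number of states to ignore")
    return parser


def read_variable(handle, variable, burnin=0):
    """Return one column as floats, skipping comments and burn-in rows.

    Assumption: tab-delimited log, state number in the first column.
    """
    rows = (line for line in handle
            if line.strip() and not line.startswith("#"))
    reader = csv.reader(rows, delimiter="\t")
    header = next(reader)
    column = header.index(variable)
    return [float(row[column]) for row in reader if int(row[0]) >= burnin]


def summarise(values):
    """Count, mean, median and sample standard deviation of the values."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


if __name__ == "__main__":
    args = build_parser().parse_args()
    with open(args.file) as handle:
        print(summarise(read_variable(handle, args.variable, args.burnin)))
```

Keeping the parsing and the summary in separate functions means the summary can be checked against an in-memory log (for example via `io.StringIO`) without touching the command line.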
dwinter / rentrez.r
Created February 11, 2012 01:19
Dealing with entrez in R
entrez_search <- function(dbase, term, retmax=6,...){
base_url <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=%s&term=%s&retmax=%i"
search <- sprintf(base_url, dbase, term, retmax)
raw_result <- getURL(search)
ids <- unlist(getNodeSet(xmlParse(raw_result), "//Id", fun=xmlValue))
return(as.integer(ids))
}
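For comparison, the same two steps (build the esearch URL, then pull the `Id` elements out of the XML reply) can be sketched in Python with only the standard library. The function names `build_esearch_url` and `parse_ids` are hypothetical, and the sample XML in the usage note is a hand-written stand-in for a real reply. Unlike the `sprintf` call above, this version URL-encodes the search term, which matters for terms containing spaces or brackets. No network request is made here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(dbase, term, retmax=6):
    """Build an esearch URL, percent-encoding the query parameters."""
    query = urlencode({"db": dbase, "term": term, "retmax": retmax})
    return ESEARCH + "?" + query


def parse_ids(raw_result):
    """Extract every <Id> element from an esearch XML reply as integers."""
    root = ET.fromstring(raw_result)
    return [int(node.text) for node in root.iter("Id")]
```

A quick check against a hand-written reply:

```python
sample = "<eSearchResult><IdList><Id>101</Id><Id>202</Id></IdList></eSearchResult>"
print(build_esearch_url("pubmed", "Winter[AUTH]", retmax=2))
print(parse_ids(sample))
```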
entrez_fetch <- function(dbase, ids, format, ...){
dwinter / fungi_step1.py
Created September 3, 2011 04:24
An online fungal foray
from Bio import Entrez
#Let NCBI know who you are in case you do something stupid :)
Entrez.email = '[email protected]'
search_s ='"ectomycorrhizal root tip" AND "New Zealand"'
handle = Entrez.esearch(db='nucleotide', term=search_s, retmax=100)
ids = Entrez.read(handle)['IdList']
ids[:5]
import numpy as np
from scipy import cluster
from matplotlib import pyplot
#fake some data
tests = np.reshape( np.random.uniform(0,100,60), (30,2) )
#plot remaining variance for each value of 'k' from 1 to 9
initial = [cluster.vq.kmeans(tests,i) for i in range(1,10)]
pyplot.plot([var for (cent,var) in initial])
"""
Code written to answer stackoverflow question on substitution matrices:
http://stackoverflow.com/questions/5686211/
"""
from Bio.SubsMat import MatrixInfo
def score_match(pair, matrix):
"""
Return score for a given pair of residues in a given matrix.
"""
Code written to answer this challenge at Biostar:
http://biostar.stackexchange.com/questions/5902/
(This code includes improvements from Brad Chapman)
"""
class ORFFinder:
"""Find the longest ORF in a given sequence
"seq" is a string, if "start" is not provided any codon can be the start of