David Winter dwinter

Computational evolutionary biology using R, Python and (sometimes) C++

dwinter / na.r

Created March 18, 2013 04:48

'NA' is a gene symbol, which can cause problems

	chars <- c("ABC", NA, "DEF")
	(urls <- paste("my-fav-web-api/id=", chars, sep=""))
	## [1] "my-fav-web-api/id=ABC" "my-fav-web-api/id=NA" "my-fav-web-api/id=DEF"

dwinter / ggplot_ex.R

Created February 16, 2013 21:27

	library(ggplot2)

	df0 <- data.frame(x=rnorm(100), y=rnorm(100), grp=factor(rep(letters[1:2], each=50)))
	p <- ggplot(df0, aes(x,y, colour=grp, alpha=y))
	p + geom_point()



	library(RColorBrewer)

dwinter / gsi.R

Created July 24, 2012 23:36

calculate gsi in R

	#calculate Cummings et al (2008) _gsi_ - a measure of the exclusivity of a
	#predefined group of leaves in a phylogenetic tree


	#example
	#
	# tr <- rtree(10)
	# grp <- paste("t", 1:5, sep="")
	# gsi(tr, grp)

dwinter / Parse_multitree_paml.py

Created June 1, 2012 10:37

Parse results for a multi-tree PAML run

	#
	# Code to parse the interesting bits of a PAML results file that contains
	# statistics for several trees. Note, this approach contains some hacks that
	# are probably specific to multiple-tree CODEML files. Biopython has a module
	# for handling other PAML outputs, and it is probably a better starting point
	# for 'normal' (single tree) input files.
	#
	# TODO - should clean up handling of residues - use @property decorator and
	# ._functions to simplify writing/reading.
	#

dwinter / SummariseBeast.py

Created May 30, 2012 05:27

Summarise Beast output from the command line

	"""
	Summarise a variable in a BEAST logfile

	arguments:
	-h, --help show this help message and exit
	-f or --file string name of the BEAST file
	-v or --variable string value you want to summarise
	-b or --burnin int number of states to ignore
	-g export graphs to summarise the sample?
	-l export a logfile for just this variable
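
The options listed in the docstring above could be wired up roughly as follows. This is a minimal sketch, not the gist's actual code. It assumes a tab-delimited BEAST log in which comment lines start with `#` and the first column holds the state number, and it treats the burn-in as a state number below which samples are dropped. The function names `read_trace` and `summarise` are hypothetical, and the `-g` and `-l` export options are left out.

```python
import argparse
import csv
import statistics


def read_trace(lines, variable, burnin=0):
    """Return the values of `variable`, skipping states below `burnin`."""
    # Drop blank lines and the '#' comment lines that can precede the header.
    rows = (line for line in lines if line.strip() and not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    state_col = reader.fieldnames[0]  # assumption: first column is the state number
    if variable not in reader.fieldnames:
        raise KeyError(f"{variable!r} is not a column in the log")
    return [float(row[variable]) for row in reader
            if int(row[state_col]) >= burnin]


def summarise(values):
    """Return basic summary statistics for a list of sampled values."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Summarise a variable in a BEAST logfile")
    parser.add_argument("-f", "--file", required=True,
                        help="name of the BEAST log file")
    parser.add_argument("-v", "--variable", required=True,
                        help="variable to summarise")
    parser.add_argument("-b", "--burnin", type=int, default=0,
                        help="number of states to ignore")
    args = parser.parse_args(argv)
    with open(args.file) as handle:
        values = read_trace(handle, args.variable, args.burnin)
    for name, value in summarise(values).items():
        print(f"{name}\t{value}")
```

Calling `main()` from a script entry point would give the command-line behaviour. Keeping `read_trace` separate from the argument parsing means it can be checked against a few in-memory lines without a real log file.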

dwinter / rentrez.r

Created February 11, 2012 01:19

Dealing with entrez in R


	library(RCurl)  # getURL
	library(XML)    # xmlParse, getNodeSet, xmlValue

	entrez_search <- function(dbase, term, retmax=6, ...){
	    base_url <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=%s&term=%s&retmax=%i"
	    search <- sprintf(base_url, dbase, term, retmax)
	    raw_result <- getURL(search)
	    # collect the text of every <Id> node in the XML response
	    ids <- unlist(getNodeSet(xmlParse(raw_result), "//Id", fun=xmlValue))
	    return(as.integer(ids))
	}

	entrez_fetch <- function(dbase, ids, format, ...){
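
For comparison, the search step of `entrez_search` above can be sketched in Python with only the standard library. This is an illustration under assumptions rather than a port of the gist: the helper names `esearch_url` and `parse_ids` are hypothetical, and the search term is URL-encoded here, which the `sprintf` call in the R version does not do.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(dbase, term, retmax=6):
    """Build an esearch URL; urlencode escapes spaces and quotes in `term`."""
    query = urlencode({"db": dbase, "term": term, "retmax": retmax})
    return ESEARCH + "?" + query


def parse_ids(xml_text):
    """Extract every <Id> element from an esearch XML response as integers."""
    root = ET.fromstring(xml_text)
    return [int(node.text) for node in root.iter("Id")]


def entrez_search(dbase, term, retmax=6):
    """Run the search over the network and return the matching record IDs."""
    with urlopen(esearch_url(dbase, term, retmax)) as response:
        return parse_ids(response.read())
```

Splitting the URL construction and the XML parsing from the network call keeps both pieces checkable offline, for example by passing `parse_ids` a saved response.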

dwinter / fungi_step1.py

Created September 3, 2011 04:24

An online fungal foray

	from Bio import Entrez

	#Let NCBI know who you are in case you do something stupid :)
	Entrez.email = '[email protected]'

	search_s ='"ectomycorrhizal root tip" AND "New Zealand"'
	handle = Entrez.esearch(db='nucleotide', term=search_s, retmax=100)
	ids = Entrez.read(handle)['IdList']

	ids[:5]

dwinter / k-means.py

Created April 28, 2011 09:05

	import numpy as np
	from scipy import cluster
	from matplotlib import pyplot

	#fake some data
	tests = np.reshape( np.random.uniform(0,100,60), (30,2) )

	#plot the distortion (mean distance to the nearest centroid) for each 'k' from 1 to 9
	ks = range(1,10)
	initial = [cluster.vq.kmeans(tests,k) for k in ks]
	pyplot.plot(ks, [var for (cent,var) in initial])

dwinter / score_pairwise.py

Created April 21, 2011 05:37

	"""
	Code written to answer stackoverflow question on substitution matrices:
	http://stackoverflow.com/questions/5686211/
	"""

	from Bio.SubsMat import MatrixInfo

	def score_match(pair, matrix):
	    """
	    Return score for a given pair of residues in a given matrix.
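
The lookup described in that docstring can be sketched as follows. Substitution matrices stored as dictionaries keyed by residue pairs often hold only one ordering of each pair, so the sketch tries the reversed pair when the first lookup misses. Everything beyond the `score_match` signature is an assumption made for illustration: `TOY_MATRIX` is a stand-in with illustrative values rather than a real Biopython matrix, and `score_pairwise` with its flat `gap` penalty is a hypothetical helper.

```python
# Toy matrix standing in for a real one; only one ordering of each pair is stored.
TOY_MATRIX = {("A", "A"): 4, ("C", "A"): 0, ("C", "C"): 9}


def score_match(pair, matrix):
    """Return the score for a residue pair, trying both key orders."""
    if pair in matrix:
        return matrix[pair]
    return matrix[tuple(reversed(pair))]


def score_pairwise(seq1, seq2, matrix, gap=-1):
    """Sum the column scores of two aligned, equal-length sequences.

    Any column containing '-' costs `gap` (a flat, hypothetical penalty).
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for pair in zip(seq1, seq2):
        total += gap if "-" in pair else score_match(pair, matrix)
    return total
```

For example, `score_pairwise("AC-", "CCA", TOY_MATRIX)` scores the columns A/C, C/C and -/A in turn. A pair missing in both orders raises `KeyError`, which seems a reasonable way to flag residues the matrix does not cover.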

dwinter / ORF_finder.py

Created April 21, 2011 04:50 — forked from chapmanb/gist:862240

	"""
	Code written to answer this challenge at Biostar:
	http://biostar.stackexchange.com/questions/5902/

	(This code includes improvements from Brad Chapman)
	"""

	class ORFFinder:
	    """Find the longest ORF in a given sequence
	    "seq" is a string, if "start" is not provided any codon can be the start of