David Winter dwinter

#What's the easiest way to extract license info from PMC?

As a test, play with a 50-paper request from PMC:

library(rentrez)
search <- entrez_search(db="pmc", term="Tetrahymena", retmax=50)
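
Once the full-text XML for these records has been fetched, the license statement can be read from the article metadata. The following is a minimal sketch in Python rather than R, assuming the records are JATS-style XML in which license details sit in a `<license>` element with an optional `xlink:href` attribute. The `extract_licenses` helper and the sample record are hypothetical, written only for illustration.

```python
import xml.etree.ElementTree as ET

# Attribute name for xlink:href once the namespace prefix is expanded
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def extract_licenses(xml_text):
    """Return one dict per <license> element found in a JATS-style XML string."""
    root = ET.fromstring(xml_text)
    licenses = []
    for lic in root.iter("license"):
        licenses.append({
            "type": lic.get("license-type"),
            "url": lic.get(XLINK_HREF),
            # Collapse whitespace in the human-readable license statement
            "text": " ".join("".join(lic.itertext()).split()),
        })
    return licenses


# Hypothetical, hand-written record used only to exercise the function
sample = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front><article-meta><permissions>
    <license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
      <license-p>This is an open access article.</license-p>
    </license>
  </permissions></article-meta></front>
</article>"""

for entry in extract_licenses(sample):
    print(entry["type"], entry["url"], entry["text"])
```

Records with no `<license>` element simply yield an empty list, which makes it easy to count how many of the 50 papers carry explicit license metadata.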

#Half-time report from the tree-for-all hackathon

Apart from being a great deal of fun, the R projects at the OpenTreeOfLife hackathon have been making some good progress.

##Introducing rotl

Francois Michonneau, Jeremy Brown and I have been working on a package that wraps OpenTree's various data APIs, letting users search for trees, taxa and phylogenetic studies and pull trees down into their R sessions. Although we've started by focusing on low-level functions that each wrap a single API call, there are already a few interesting functions (check out the repo's README for a couple of examples).

We've been working with Python and Ruby developers to generate complementary libraries, and hope to use the rest of the week to finish some convenience functions that wrap up multiple calls to the OpenTree APIs to achiev

For the most part, the pipeline below follows the GATK's "best practice" advice for calling variants from short reads; the GATK documentation describes the process in more detail.

For the Tetrahymena and Plasmodium projects, these steps have been controlled by a set of shell scripts (see /home/david/malaria/scripts for the latest iteration) that automate various steps along the way. I usually run the scripts with all output redirected to a log file so anything interesting is recorded:

$ ./script.sh &> logs/script.log

##Align reads (Bowtie 2)

Prior to alignment, you need to create an index of the reference genome:

$ bowtie2-build -f [path to ref] [index_file_stem]

#rOpenSci at NESCent Open Tree of Life Hackathon

The Open Tree of Life project aims to synthesize our combined knowledge of how organisms relate to each other, and make the results available to anyone who wants to use them. At present, the project contains data from more than 4,000 published phylogenies, which are combined with other data sources to make a tree that covers 2.5 million species.

In September, the Open Tree of Life team are holding a hackathon to develop tools that use the project's web services to extract, annotate and add data. We are excited to say that Francois Michonneau and I will be attending the hackathon, where we plan to work with Joseph W. Brown on an R package that allows users to interact with the Open Tree data.

Joseph has already written a good deal of the code for this package, so a key goal for the

#An rOpenSci library for the Open Tree of Life API.

rOpenSci is a project that provides programmatic access to data repositories from the popular R programming language. rOpenSci already provides libraries to query the phylogeny databases TreeBASE and Phylomatic, as well as data resources provided by NCBI and Dryad. A library wrapping the Open Tree of Life would be an excellent addition to the rOpenSci project and hopefully increase the availability of the Open Tree of Life data.

I imagine the first step in creating such a library would be to faithfully map

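	#include <cmath>
	#include <cstdint>
	#include <vector>

	double NormalMADensity(std::vector<double> obs, double a, double Va, double Ve, double Ut, bool log){
	    // Starting values for running_prob and res are for the special case of k=0
	    size_t n = obs.size();
	    std::vector<double> res(n, 0.0);
	    double running_prob = exp(-Ut);
	    for(size_t i = 0; i < n; i++){
	        // Normal(0, Ve) density at obs[i], weighted by running_prob.
	        // Assign in place: push_back would append after the n zeros already in res.
	        res[i] = (exp(-(pow(obs[i], 2) / (2 * Ve))) / (sqrt(2 * M_PI) * sqrt(Ve))) * running_prob;
	    }
	    uint64_t kfac = 1;
	    uint16_t k = 1;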
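
The loop above fills in the k=0 term: a normal density with mean 0 and variance Ve at each observation, scaled by exp(-Ut). Reading exp(-Ut) as the Poisson probability of zero events with mean Ut is an assumption, since the rest of the function is not shown. A minimal Python sketch of just this term, using a hypothetical `zero_event_term` helper:

```python
import math


def zero_event_term(obs, Ve, Ut):
    """k = 0 contribution: N(0, Ve) density at each observation times exp(-Ut)."""
    weight = math.exp(-Ut)                 # assumed: Poisson P(k = 0) with mean Ut
    norm = math.sqrt(2 * math.pi * Ve)     # normalising constant of N(0, Ve)
    return [math.exp(-(x ** 2) / (2 * Ve)) / norm * weight for x in obs]


# Illustrative values only, not taken from any real data set
print(zero_event_term([0.0, 0.5, -1.2], Ve=1.0, Ut=0.3))
```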

	#!/usr/bin/env python

	import sys
	import subprocess
	import collections
	import random

	from io import StringIO  # Python 3 location; the old StringIO module is Python 2 only

	from Bio import SeqIO

	x <- seq(0,5,0.05)
	plot(x, dgamma(x, shape=10, scale=0.1), type='l')
	lines(x, dgamma(x, shape=2, scale=0.5), col='red')
	lines(x, dgamma(x, shape=1, scale=1), col='blue')
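
The three curves above share the same mean, since the mean of a gamma distribution is shape × scale, but they differ in variance (shape × scale²). A small Python sketch of the same densities makes this explicit. The `gamma_pdf` helper is hypothetical, written for illustration, and handles only x > 0.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density at x > 0, parameterised by shape and scale."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    # Work on the log scale to avoid overflow for large shape values
    log_density = ((shape - 1) * math.log(x) - x / scale
                   - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_density)


# The (shape, scale) pairs plotted in the R snippet above
params = [(10, 0.1), (2, 0.5), (1, 1)]

for shape, scale in params:
    mean = shape * scale
    variance = shape * scale ** 2
    print(f"shape={shape}, scale={scale}: mean={mean:.2f}, "
          f"variance={variance:.2f}, density at x=1: {gamma_pdf(1.0, shape, scale):.4f}")
```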