Last active
August 29, 2015 14:19
-
-
Save sdwfrost/f19949a351f34b2617a3 to your computer and use it in GitHub Desktop.
hellobeagle.jl as an IJulia notebook
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"I've been wanting for quite some time to try out [Julia's](http://julialang.org) ability to call C code, without employing any additional 'glue' code, like that generated by [SWIG](http://www.swig.org). As a first attempt, I thought I would to wrap some example code from [BEAGLE](https://github.com/beagle-dev/beagle-lib), a library that can evaluate the likelihood of sequence evolution on trees - very handy, if you're in the phylogenetics trade. Moreover, BEAGLE has some low-level code that allows it to run on CPUs or GPUs, while providing a relatively simple API. More Julia-specific information about the general approach is given in an excellent [blog post](http://blog.mlin.net/2013/05/a-taste-of-molecular-phylogenetics-in.html) by [Mike Lin](http://www.mlin.net). I've updated Mike's Julia code to work with Julia 0.4, available as a [gist](https://gist.github.com/sdwfrost/6a84af653b38e10aaffd). Even if you're not interested in the specific example, I hope that my learning experience helps others. Many thanks to [Isaiah](https://github.com/ihnorton) and [Marc Suchard](http://faculty.biomath.ucla.edu/msuchard/) for helping me debug my initial attempt.\n", | |
" \n", | |
"For this example, I'll try to replicate the 'Hello World' example in C++ from the BEAGLE [GitHub repo](https://github.com/beagle-dev/beagle-lib/blob/master/examples/standalone/hellobeagle/src/hello.cpp); more information on the API can be found in the [header file](https://github.com/beagle-dev/beagle-lib/blob/master/libhmsbeagle/beagle.h).\n", | |
"\n", | |
"BEAGLE only makes the (fairly standard) likelihood function faster; we can also calculate likelihoods using the ```phangorn``` library in R. Here's what the data look like, in [FASTA](http://en.wikipedia.org/wiki/FASTA) format.\n", | |
"\n", | |
"```\n", | |
">mars\n", | |
"CCGAG-AGCAGCAATGGAT-GAGGCATGGCG\n", | |
">saturn\n", | |
"GCGCGCAGCTGCTGTAGATGGAGGCATGACG\n", | |
">jupiter\n", | |
"GCGCGCAGCAGCTGTGGATGGAAGGATGACG\n", | |
"```\n", | |
"\n", | |
"We now have to assume a tree, with associated branch lengths. In [Newick format](http://en.wikipedia.org/wiki/Newick_format), the tree looks like this.\n", | |
"\n", | |
"```\n", | |
"((mars:0.1,saturn:0.1):0.2,jupiter:0.1);\n", | |
"```\n", | |
"\n", | |
"In R, fitting such models is fairly straightforward; however, as this is meant to be a blog piece on Julia, I'll use [RCall.jl](https://github.com/JuliaStats/RCall.jl)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"using RCall;" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's run the R script to read in the sequence data and plot out a tree." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAeAAAAGQCAMAAABBKENmAAAABlBMVEUAAAD///+l2Z/dAAAF5UlEQVR4nO3d4U6aCRRF0Y/3f+mpIiCJN22dK5XlXk101EQ8dw/a/vI4hXb86y8gX6vAuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuALjCowrMK7AuCcN/MGXfQzv/+Gwi2BzFjznRY7j8oT99ec43t718uLyxsvr68d+sic9wC3wa9fT3Yvrf14/9oM96f53z+DTB4GPy1M4ezc4HuX8Zf/2Gbw67nktBl77TH/0YOcfue9+zl4DH3ffon+6Jwz89pw9znkv73x7bh/v/7JV4GcO/OBHfVZPGPjuHz8F/o1nDJy/UGBcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEFxhUYV2BcgXEF/lb2j1jgb6XAnOOX66vLG68fePnz+s7zq89+/qWvs8Cfc1xfvL467t55jn1r/ukHWIEFPh7m9nAfBb7/f+ATO/7nHb7iM30Lj5tzDVvgR3rQnOuP21vo+7cK/EUeNef2l6zT5fX1rQJ/IWVOgQfKnAIPlDkFHihzCjxQ5hR4oMwp8ECZU+CBMqfAA2VOgQfKnAIPlDkFHihzCjxQ5hR4oMwp8ECZU+CBMqfAA2VOgQfKnAIPlDkFHihzCjxQ5hR4oMwp8ECZU+CBMqfAA2VOgQfKnAIPlDkFHihzCjxQ5hR4oMwp8ECZU+CBMqfAA2VOgQfKnAIPlDkFHihzCjxQ5hR4oMwp8ECZU+CBMqfAA2VOgQfKnAIPlDkFHihzCjxQ5hR4oMwp8ECZU+CBMqfAA2VOgQfKnAIPlDkFHihzCjxQ5hR4oMwp8ECZU+CBMqfAA2VOgQfKnMXAmLXD/FvKjnXKYZQd65TDKDvWKYdRdqxTDqPsWKccRtmxTjmMsmOdchhlxzrlMMqOdcphlB3rlMMoO9Yph1F2rFMOo+xYpxxG2bFOOYyyY51yGGXHOuUwyo51ymGUHeuUwyg71imHUXasUw6j7FinHEbZsU45jLJjnXIYZcc65TDKjnXKYZQd65TDKDvWKYdRdqxTDqPsWKccRtmxTjmMsmOdchhlxzrlMMqOdcphlB3rlMMoO9Yph1F2rFMOo+xYpxxG2bFOOYyyY51yGGXHOuUwyo51ymGUHeuUwyg71imHUXasUw6j7FinHEbZsU45jLJjnXIYZcc65TDKjnXKYZQd65TDKDvWKYdRdqxTDqPsWKccRtmxTjmMsmOdchhlxzrlMMqOdcphlB3rlMMoO9Y96jDH8fpQ51+md5zOv1Nv8VfrFXjwsMCn4/JYx/lR395e+gIKPHhw4OPyFD6/c+/RCzx43G85Pd49Z4/bw2/tWPo8+aRb4NszuG/RkPNP3dP5qXz7Ft0zOH+mwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzCuwLgC4wqMKzDuP3Zn6eVipJ/kAAAAAElFTkSuQmCC" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"Loading required package: ape\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"RCall.VecSxp(-536870829,Ptr{Void} @0x0000000007f2d8d0,Ptr{Void} @0x0000000009a072a0,Ptr{Ptr{Void}} @0x0000000009a072c8,19,0)" | |
] | |
}, | |
"execution_count": 2, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"\"library(phangorn)\" |> reval\n", | |
"\"hello <- read.dna('hello.fas',format='fasta')\" |> reval\n", | |
"\"tree <- read.tree(text='((mars:0.1,saturn:0.1):0.2,jupiter:0.1);')\" |> reval;\n", | |
"\"plot(tree)\" |> reval" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"You can see that there are two nodes in this rooted tree for which we don't directly observe the states - the common ancestor of saturn and mars, and the common ancestor of jupiter, and that of saturn and mars. By calculating the partial likelihoods per site for these two nodes, we can calculate the likelihood of the tree.\n", | |
"\n", | |
"We'll fit a [Jukes-Cantor (1969)](http://en.wikipedia.org/wiki/Models_of_DNA_evolution#JC69_model_.28Jukes_and_Cantor.2C_1969.29.5B1.5D) (otherwise known as a JC69 model) to this, using phangorn's [```pml```](http://www.inside-r.org/packages/cran/phangorn/docs/optim.pml) function." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"1-element Array{Float64,1}:\n", | |
" -84.8524" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"\"hello.fit <- pml(tree,as.phyDat(hello),model='JC69')\n", | |
"logLik(hello.fit)\" |> rcopy" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"phangorn is very nice, but is slow for large datasets, which is where BEAGLE comes in. Let's start with something simple; getting the BEAGLE version, which is a function that takes no parameters, and returns a ```char*```. Note how no arguments are passed, by ```()``` for the C side, and by nothing (except the comma) on the Julia side. To convert a ```char*``` to a string, we use the Julia function ```bytestring```. " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"BEAGLE version 2.1.2" | |
] | |
}, | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"@printf \"BEAGLE version %s\" bytestring(ccall((:beagleGetVersion,\"libhmsbeagle\"), Ptr{Cchar}, (),))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now the same function signature again, to get the all-important BEAGLE citation." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"BEAGLE citation Using BEAGLE library v2.1.2 for accelerated, parallel likelihood evaluation\n", | |
"2009-2013, BEAGLE Working Group - http://beagle-lib.googlecode.com/\n", | |
"Citation: Ayres et al (2012) Systematic Biology 61: 170-173 | doi:10.1093/sysbio/syr100\n" | |
] | |
} | |
], | |
"source": [ | |
"@printf \"BEAGLE citation %s\" bytestring(ccall((:beagleGetCitation,\"libhmsbeagle\"), Ptr{Cchar}, (),))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"BEAGLE defines lots of constants, but for this example, we'll need only one, ```BEAGLE_OP_NONE```. I had some issues with 32 and 64 bit integers, so it makes sense to enforce types here." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"-1" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"const BEAGLE_OP_NONE = Int32(-1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"BEAGLE thankfully (at least, for wrapping code) doesn't have much in the way of structs. We define types in Julia that mirror those in C. Most should be self explanatory, except ```char*```, which is defined as ```Ptr{Uint8}```." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"immutable BeagleResource\n", | |
" name::Ptr{Uint8}\n", | |
" description::Ptr{Uint8}\n", | |
" supportFlags::Clong\n", | |
" requiredFlags::Clong\n", | |
"end\n", | |
"\n", | |
"immutable BeagleResourceList\n", | |
" list::Ptr{BeagleResource}\n", | |
" length::Cint\n", | |
"end\n", | |
"\n", | |
"immutable BeagleInstanceDetails\n", | |
" resourceNumber::Cint\n", | |
" resourceName::Ptr{Uint8}\n", | |
" implName::Ptr{Uint8}\n", | |
" implDescription::Ptr{Uint8}\n", | |
" flags::Clong\n", | |
"end\n", | |
"\n", | |
"immutable BeagleOperation\n", | |
" destinationPartials::Cint\n", | |
" destinationScaleWrite::Cint\n", | |
" destinationScaleRead::Cint\n", | |
" child1Partials::Cint\n", | |
" child1TransitionMatrix::Cint\n", | |
" child2Partials::Cint\n", | |
" child2TransitionMatrix::Cint\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"BEAGLE can use CPU or GPU resources. The following gets and displays BEAGLE resources, by loading an instance of ```BeagleResourceList```, then iterating through the array, and printing out the name and description of each ```BeagleResource``` (which we convert using ```bytestring```)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"BEAGLE Resource " | |
] | |
}, | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0\n", | |
"Name: CPU\n", | |
"Descriptions: \n" | |
] | |
} | |
], | |
"source": [ | |
"resourceList = ccall( (:beagleGetResourceList, \"libhmsbeagle\"),\n", | |
" Ptr{BeagleResourceList},\n", | |
" (),\n", | |
" )\n", | |
"rl = unsafe_load(resourceList)\n", | |
"rl.length\n", | |
"resources = pointer_to_array(rl.list,rl.length)\n", | |
"for i in 1:rl.length\n", | |
" @printf \"BEAGLE Resource %i\\n\" i-1\n", | |
" @printf \"Name: %s\\n\" bytestring(resources[i].name)\n", | |
" @printf \"Descriptions: %s\\n\" bytestring(resources[i].description)\n", | |
"end" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Here are the toy data." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"mars = \"CCGAG-AGCAGCAATGGAT-GAGGCATGGCG\"\n", | |
"saturn = \"GCGCGCAGCTGCTGTAGATGGAGGCATGACG\"\n", | |
"jupiter = \"GCGCGCAGCAGCTGTGGATGGAAGGATGACG\";" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The likelihood depends on the number of patterns i.e. unique columns in the alignment. Here, we are lazy, and simply take the length of the string." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"31" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"nPatterns = length(mars)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We initialise the BEAGLE instance, and the return information; note that ```retInfo``` is in an array to allow passing it as a ```Ptr{BeagleInstanceDetails}```. In the C++ code, ```retInfo``` is created by ```BeagleInstanceDetails()``` but that isn't allowed in Julia by default, thanks to multiple dispatch. Look at the [original C++ source](https://github.com/beagle-dev/beagle-lib/blob/master/examples/standalone/hellobeagle/src/hello.cpp) for a description of the arguments of this function." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"0" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"retInfo = [BeagleInstanceDetails(0,0,0,0,0)]\n", | |
"instance = ccall((:beagleCreateInstance, \"libhmsbeagle\"),\n", | |
" Int,\n", | |
" (Cint,Cint,Cint,Cint,Cint,Cint,Cint,Cint,Cint,Ptr{Int},Cint,Clong,Clong,Ptr{BeagleInstanceDetails}),\n", | |
" 3,2,3,4,nPatterns,1,4,1,0,C_NULL,0,0,0,retInfo)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In order to get our data into BEAGLE, we need to map the bases in our sequences to ```Int32```; initially, I had these as ```Int64```, trusting that Julia would do the conversion, but that didn't produce the correct answer." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"Dict{Char,Int32} with 9 entries:\n", | |
" 'G' => 2\n", | |
" 'g' => 2\n", | |
" 'C' => 1\n", | |
" 'c' => 1\n", | |
" 't' => 3\n", | |
" 'A' => 0\n", | |
" 'a' => 0\n", | |
" 'T' => 3\n", | |
" '-' => 4" | |
] | |
}, | |
"execution_count": 12, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"table = Dict{Char,Int32}(\n", | |
" 'A' => 0,\n", | |
" 'C' => 1,\n", | |
" 'G' => 2,\n", | |
" 'T' => 3,\n", | |
" 'a' => 0,\n", | |
" 'c' => 1,\n", | |
" 'g' => 2,\n", | |
" 't' => 3,\n", | |
" '-' => 4,\n", | |
" )" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Julia can map states nicely with list comprehensions" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"marsStates = Int32[table[b] for b in mars]\n", | |
"saturnStates = Int32[table[b] for b in saturn]\n", | |
"jupiterStates = Int32[table[b] for b in jupiter];" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As there are no ambiguous states in the data, we now assign the mapped states to the tips of the tree; these are placed into three buffers, 0, 1, and 2. Our states are already arrays, and are mapped to ```Ptr{Cint}```." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"ccall((:beagleSetTipStates, \"libhmsbeagle\"),\n", | |
" Int,\n", | |
"(Cint,Cint,Ptr{Cint}),\n", | |
" instance,0,marsStates)\n", | |
"ccall((:beagleSetTipStates, \"libhmsbeagle\"),\n", | |
" Int,\n", | |
"(Cint,Cint,Ptr{Cint}),\n", | |
" instance,1,saturnStates)\n", | |
"ccall((:beagleSetTipStates, \"libhmsbeagle\"),\n", | |
" Int,\n", | |
"(Cint,Cint,Ptr{Cint}),\n", | |
" instance,2,jupiterStates);" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"In this case, we haven't compressed the patterns, so we simply have a vector of ones for weighting each pattern." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"0" | |
] | |
}, | |
"execution_count": 15, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"patternWeights = Array(Float64,nPatterns)\n", | |
"fill!(patternWeights,1.0)\n", | |
"ccall((:beagleSetPatternWeights, \"libhmsbeagle\"),\n", | |
" Int,\n", | |
"(Cint,Ptr{Cdouble}),\n", | |
" instance,patternWeights)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"There are lots of [models of DNA evolution](http://en.wikipedia.org/wiki/Models_of_DNA_evolution). We assume a JC69 model, with equal frequencies of nucleotides." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"0" | |
] | |
}, | |
"execution_count": 16, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"freqs=Float64[0.25,0.25,0.25,0.25]\n", | |
"ccall((:beagleSetStateFrequencies, \"libhmsbeagle\"),\n", | |
" Int,\n", | |
"(Cint,Cint,Ptr{Cdouble}),\n", | |
" instance,0,freqs)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"It is common to allow sites to evolve at different rates (under a constraint on the mean rate). In this toy example, we assume only a single category of rates." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"0" | |
] | |
}, | |
"execution_count": 17, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"weights = Float64[1.0]\n", | |
"rates = Float64[1.0]\n", | |
"ccall((:beagleSetCategoryWeights, \"libhmsbeagle\"),\n", | |
" Int,\n", | |
"(Cint,Cint,Ptr{Cdouble}),\n", | |
"instance,0,weights)\n", | |
"ccall((:beagleSetCategoryRates, \"libhmsbeagle\"),\n", | |
" Int,\n", | |
"(Cint,Ptr{Cdouble}),\n", | |
" instance,rates)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"There are two ways to set the transition matrix in BEAGLE, directly, or by setting the eigenvector decomposition of the transition matrix; the below illustrates the latter." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"0" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"evec=Float64[1.0,2.0,0.0,0.5,\n", | |
" 1.0,-2.0,0.5,0.0,\n", | |
" 1.0,2.0,0.0,-0.5,\n", | |
" 1.0,-2.0,-0.5,0.0\n", | |
"]\n", | |
"ivec=Float64[\n", | |
" 0.25,0.25,0.25,0.25,\n", | |
" 0.125,-0.125,0.125,-0.125,\n", | |
" 0.0,1.0,0.0,-1.0,\n", | |
" 1.0 ,0.0,-1.0,0.0\n", | |
"]\n", | |
"ev = Float64[0.0, -1.3333333333333333, -1.3333333333333333, -1.3333333333333333]\n", | |
"ccall((:beagleSetEigenDecomposition, \"libhmsbeagle\"),\n", | |
" Int,\n", | |
"(Cint,Cint,Ptr{Cdouble},Ptr{Cdouble},Ptr{Cdouble}),\n", | |
" instance,0,evec,ivec,ev)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A list of indices and edge lengths, which are used to tell BEAGLE which edge length goes with which node." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"nodeIndices = Int32[0,1,2,3]\n", | |
"edgeLengths = Float64[0.1,0.1,0.2,0.1];" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The edge lengths and node indices are then used to update the transition matrix." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"ccall((:beagleUpdateTransitionMatrices, \"libhmsbeagle\"),\n", | |
" Cint,\n", | |
"(Cint,Cint,Ptr{Cint},Ptr{Cint},Ptr{Cint},Ptr{Cdouble},Cint),\n", | |
" instance,0,nodeIndices,C_NULL,C_NULL,edgeLengths,4);" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The following can be used to show the resulting transition matrix. Although matrices are flattened in the BEAGLE representation, passing a matrix seemed to work just fine." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"WARNING: [a] concatenation is deprecated; use [a;] instead\n" | |
] | |
}, | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
" in depwarn at ./deprecated.jl:40\n", | |
" in oldstyle_vcat_warning at ./abstractarray.jl:26\n", | |
" in vect at abstractarray.jl:29\n", | |
" in include_string at loading.jl:98\n", | |
" in execute_request_0x535c5df2 at /home/simon/.julia/v0.4/IJulia/src/execute_request.jl:157\n", | |
" in eventloop at /home/simon/.julia/v0.4/IJulia/src/IJulia.jl:123\n", | |
" in anonymous at task.jl:361\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"4x4 Array{Float64,2}:\n", | |
" 0.90638 0.0312067 0.0312067 0.0312067\n", | |
" 0.0312067 0.90638 0.0312067 0.0312067\n", | |
" 0.0312067 0.0312067 0.90638 0.0312067\n", | |
" 0.0312067 0.0312067 0.0312067 0.90638 " | |
] | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"qmat = [Array(Float64,4,4)]\n", | |
"ccall((:beagleGetTransitionMatrix,\"libhmsbeagle\"),Cint,(Cint,Cint,Ptr{Cdouble}),instance,0,qmat)\n", | |
"qmat" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now, the important bit; the structure of the tree determines how partial likelihoods are passed between buffers; first peel node 0 and 1 to calculate the per-site partial likelihoods, and store them in buffer 3, then peel node 2 and buffer 3 and store the per-site partial likelihoods in buffer 4." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"0" | |
] | |
}, | |
"execution_count": 22, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"operations = BeagleOperation[\n", | |
" BeagleOperation(3, BEAGLE_OP_NONE, BEAGLE_OP_NONE, 0, 0, 1, 1),\n", | |
" BeagleOperation(4, BEAGLE_OP_NONE, BEAGLE_OP_NONE, 2, 2, 3, 3)\n", | |
" ]\n", | |
"ccall((:beagleUpdatePartials, \"libhmsbeagle\"),\n", | |
" Cint,\n", | |
" (Cint,Ptr{BeagleOperation},Cint,Cint),\n", | |
"instance,operations,2,BEAGLE_OP_NONE)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can now list the partial likelihoods for buffer 3 (we already set the states at the tips, buffers 0, 1 and 2)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"124-element Array{Float64,1}:\n", | |
" 0.000973856\n", | |
" 0.0282851 \n", | |
" 0.0282851 \n", | |
" 0.000973856\n", | |
" 0.000973856\n", | |
" 0.821525 \n", | |
" 0.000973856\n", | |
" 0.000973856\n", | |
" 0.000973856\n", | |
" 0.000973856\n", | |
" 0.821525 \n", | |
" 0.000973856\n", | |
" 0.0282851 \n", | |
" ⋮ \n", | |
" 0.0282851 \n", | |
" 0.000973856\n", | |
" 0.0282851 \n", | |
" 0.000973856\n", | |
" 0.000973856\n", | |
" 0.821525 \n", | |
" 0.000973856\n", | |
" 0.000973856\n", | |
" 0.000973856\n", | |
" 0.000973856\n", | |
" 0.821525 \n", | |
" 0.000973856" | |
] | |
}, | |
"execution_count": 23, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"lenpartial = 4*31*1 # stateCount * patternCount * categoryCount\n", | |
"partials3 = [Array(Float64,lenpartial)]\n", | |
"ccall((:beagleGetPartials, \"libhmsbeagle\"),\n", | |
" Cint,\n", | |
" (Cint,Cint,Cint,Ptr{Cdouble}),\n", | |
" instance,3,BEAGLE_OP_NONE,partials3\n", | |
")\n", | |
"partials3" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
"WARNING: [a] concatenation is deprecated; use [a;] instead\n" | |
] | |
}, | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"name": "stderr", | |
"output_type": "stream", | |
"text": [ | |
" in depwarn at ./deprecated.jl:40\n", | |
" in oldstyle_vcat_warning at ./abstractarray.jl:26\n", | |
" in vect at abstractarray.jl:29\n", | |
" in include_string at loading.jl:98\n", | |
" in execute_request_0x535c5df2 at /home/simon/.julia/v0.4/IJulia/src/execute_request.jl:157\n", | |
" in eventloop at /home/simon/.julia/v0.4/IJulia/src/IJulia.jl:123\n", | |
" in anonymous at task.jl:361\n" | |
] | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"124-element Array{Float64,1}:\n", | |
" 0.000156737\n", | |
" 0.00155544 \n", | |
" 0.0219142 \n", | |
" 0.000156737\n", | |
" 0.00155544 \n", | |
" 0.613969 \n", | |
" 0.00155544 \n", | |
" 0.00155544 \n", | |
" 0.00155544 \n", | |
" 0.00155544 \n", | |
" 0.613969 \n", | |
" 0.00155544 \n", | |
" 0.00155544 \n", | |
" ⋮ \n", | |
" 0.0219142 \n", | |
" 0.000156737\n", | |
" 0.00155544 \n", | |
" 0.000156737\n", | |
" 0.00155544 \n", | |
" 0.613969 \n", | |
" 0.00155544 \n", | |
" 0.00155544 \n", | |
" 0.00155544 \n", | |
" 0.00155544 \n", | |
" 0.613969 \n", | |
" 0.00155544 " | |
] | |
}, | |
"execution_count": 24, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"partials4 = [Array(Float64,lenpartial)]\n", | |
"ccall((:beagleGetPartials, \"libhmsbeagle\"),\n", | |
" Cint,\n", | |
" (Cint,Cint,Cint,Ptr{Cdouble}),\n", | |
" instance,4,BEAGLE_OP_NONE,partials4\n", | |
")\n", | |
"partials4" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The total log likelihood of the tree is that at the root; to obtain this value, we generate a reference to a ```Cdouble```, then pass it to ```beagleCalculateRootLogLikelihoods```." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"0" | |
] | |
}, | |
"execution_count": 25, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"logL = Ref{Cdouble}(0.0)\n", | |
"rootIndex = Int32[4]\n", | |
"categoryWeightIndex = Int32[0]\n", | |
"stateFrequencyIndex = Int32[0]\n", | |
"cumulativeScaleIndex = Int32[BEAGLE_OP_NONE];\n", | |
"ccall((:beagleCalculateRootLogLikelihoods, \"libhmsbeagle\"), Cint, (Cint,Ptr{Cint},Ptr{Cint},Ptr{Cint},Ptr{Cint},Cint,Ref{Cdouble}), instance,rootIndex,categoryWeightIndex,stateFrequencyIndex,cumulativeScaleIndex,1,logL)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The returned log likelihood (accessed via ```logL[]```) is the same as the C++ version." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"-84.85235823277961" | |
] | |
}, | |
"execution_count": 26, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"logL[]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We can also examine the site-by-site log likelihoods." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
}, | |
{ | |
"data": { | |
"text/plain": [ | |
"31-element Array{Float64,1}:\n", | |
" -5.12507\n", | |
" -1.86653\n", | |
" -1.86653\n", | |
" -5.12507\n", | |
" -1.86653\n", | |
" -1.75738\n", | |
" -1.86653\n", | |
" -1.86653\n", | |
" -1.86653\n", | |
" -5.12507\n", | |
" -1.86653\n", | |
" -1.86653\n", | |
" -5.12507\n", | |
" ⋮ \n", | |
" -1.75738\n", | |
" -1.86653\n", | |
" -1.86653\n", | |
" -4.0657 \n", | |
" -1.86653\n", | |
" -4.0657 \n", | |
" -1.86653\n", | |
" -1.86653\n", | |
" -1.86653\n", | |
" -5.12507\n", | |
" -1.86653\n", | |
" -1.86653" | |
] | |
}, | |
"execution_count": 27, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"sitelike=Array(Float64,nPatterns)\n", | |
"ccall((:beagleGetSiteLogLikelihoods,\"libhmsbeagle\"),Cint,(Cint,Ptr{Cdouble}),instance,sitelike)\n", | |
"sitelike" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Finally, some cleanup." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "" | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"ccall((:beagleFinalizeInstance,\"libhmsbeagle\"), Cint, (Cint,),instance)\n", | |
"ccall((:beagleFinalize,\"libhmsbeagle\"),Cint,(),);" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"So, that's it. As most of this was written during making dinner for the children, and getting them in the bath, I must say, I was impressed with how straightforward this was, especially after quite a bit of experience with tools like SWIG. Now on to the [Phylogenetic Likelihood Library](http://www.libpll.org/), which has rather more complex structs, so I'll probably work on getting [```Clang.jl```](https://github.com/ihnorton/Clang.jl) working to generate most of the boilerplate code from the header.\n", | |
"\n", | |
"All the above is available as a [gist](https://gist.github.com/sdwfrost/f19949a351f34b2617a3), or via nbviewer " | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Julia 0.4.0-dev", | |
"language": "julia", | |
"name": "julia 0.4" | |
}, | |
"language_info": { | |
"name": "julia", | |
"version": "0.4.0" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment