@bede
Last active February 11, 2016 11:04
OneCodex real-time search API
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# OneCodex 'real-time' *k*-mer search API\n",
"OneCodex has a mature asynchronous API for analysing whole files, but it also provides an undocumented API that returns the lowest common ancestor (LCA) for a single query sequence from its in-memory 31-mer database. The lookup itself takes a fraction of a millisecond, so the network round trip to the US west coast is the limiting factor for speed.\n",
"\n",
"You'll need to [register](https://app.onecodex.com/register) for an account to receive your [API key](https://app.onecodex.com/settings)."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{'elapsed_secs': '0.0001',\n",
" 'k': 31,\n",
" 'n_hits': 40,\n",
" 'n_lookups': 40,\n",
" 'tax_id': 11676}"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import json\n",
"import requests\n",
"\n",
"onecodex_api_key = 'YOUR_API_KEY'\n",
"test_sequence = 'TAGAACGATTCGCAGTTAATCCTGGCCTGTTAGAAACATCAGAAGGCTGTAGACAAATACTGGGACAGCT'\n",
"\n",
"def onecodex_lca(seq, onecodex_api_key):\n",
"    # POST a single sequence; the API key is the basic-auth username, with an empty password\n",
"    url = 'https://app.onecodex.com/api/v0/search'\n",
"    payload = {'sequence': str(seq)}\n",
"    auth = requests.auth.HTTPBasicAuth(onecodex_api_key, '')\n",
"    response = requests.post(url, data=payload, auth=auth, timeout=5)\n",
"    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body\n",
"    return json.loads(response.text)\n",
"\n",
"lca = onecodex_lca(test_sequence, onecodex_api_key)\n",
"lca"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The returned `tax_id` is an NCBI taxonomy ID, which can easily be resolved to a scientific name and lineage using an EBI API."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"('Human immunodeficiency virus 1',\n",
" ['Viruses',\n",
" 'Retro-transcribing viruses',\n",
" 'Retroviridae',\n",
" 'Orthoretrovirinae',\n",
" 'Lentivirus',\n",
" 'Primate lentivirus group'])"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
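"def ebi_taxid_to_lineage(tax_id):\n",
"    url = 'http://www.ebi.ac.uk/ena/data/taxonomy/v1/taxon/tax-id/{}'\n",
"    # Skip 0 and 1 (the NCBI taxonomy root): there is no informative lineage to report\n",
"    if tax_id in (0, 1):\n",
"        return None, None\n",
"    response = requests.get(url.format(tax_id), timeout=5)\n",
"    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body\n",
"    result = json.loads(response.text)\n",
"    sciname = result['scientificName']\n",
"    # The lineage arrives as a single '; '-separated string; drop any empty fields\n",
"    taxonomy = [x for x in result['lineage'].split('; ') if x]\n",
"    return sciname, taxonomy\n",
"\n",
"taxon = ebi_taxid_to_lineage(lca['tax_id'])\n",
"taxon"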
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each query is independent, so this can be trivially parallelised for fast sequence characterisation. Bear in mind, though, that this is an undocumented API which I'm informed runs on a single node… Be gentle."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
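
The notebook's closing note about parallelising lookups could be sketched roughly as below. This is only an illustration, not part of the original notebook: `lookup_many` and `fake_lookup` are hypothetical names, the pool size of 2 is an arbitrary "gentle" default, and the stand-in lookup simply assumes that `n_lookups` is the number of 31-mers in the query (which matches the 70-base example returning 40). The stand-in lets the sketch run offline without an API key.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_many(sequences, lookup, max_workers=2):
    """Run `lookup` over `sequences` with a small thread pool.

    `lookup` is any callable taking one sequence string, for example a
    wrapper around the notebook's onecodex_lca(). Results come back in
    the same order as the input.
    """
    # Query each distinct sequence only once to spare the server.
    unique = list(dict.fromkeys(sequences))
    # Deliberately small pool: the API reportedly runs on a single node.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(zip(unique, pool.map(lookup, unique)))
    return [results[seq] for seq in sequences]


def fake_lookup(seq):
    # Offline stand-in for onecodex_lca(); assumes one lookup per 31-mer.
    return {'k': 31, 'n_lookups': max(len(seq) - 31 + 1, 0)}


reads = ['A' * 70, 'C' * 40, 'A' * 70]
print(lookup_many(reads, fake_lookup))
```

With a real key, the same helper would presumably be called as `lookup_many(reads, lambda s: onecodex_lca(s, onecodex_api_key))`. Threads suit this job because each call spends nearly all its time waiting on the network, and keeping `max_workers` low is the simplest way to stay polite to a single-node service.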