@audy
Created July 21, 2010 20:53

from irs import *
from urllib import urlopen

gids = ['AB019734.1', 'X86636.1', 'X86650.1', 'AB009938.1']

def rettaxids(gids):
    ''' Retrieves TaxIDs given a list of GeneIDs.
    Returns a dict mapping accession.version to TaxID. '''
    # Always tell NCBI who they're messing with
    email = 'harekrishna@gmail.com'
    addr = ('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?'
            'email=%s&db=nucleotide&rettype=gb&retmode=text&' % email)
    # Get the junk
    h = urlopen('%sid=%s' % (addr, ','.join(gids)))
    records = h.read()
    acc_to_tax = {}
    # Dumb and fast!
    # GenBank records end with '//'; drop the trailing piece after the last one
    for record in records.split('//\n')[:-1]:
        # First token after the VERSION keyword, e.g. 'AB019734.1'
        accession = record.split('\nVERSION')[1].split()[0]
        # Text between 'taxon:' and the closing double quote
        taxid = record.split('taxon:')[1].split('\"')[0]
        acc_to_tax[accession] = taxid
    return acc_to_tax

def chunks(l, n):
    ''' Splits list l into pieces of at most n items '''
    return [l[i:i+n] for i in range(0, len(l), n)]

# Query in batches of 350 IDs per request
for chunk in chunks(gids, 350):
    print rettaxids(chunk)

audy commented Jul 22, 2010

If there's an error - the bad query won't be in the returned dictionary. So you can just catch a KeyError or something.
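
A minimal sketch of what that could look like, assuming the dictionary is keyed by the same accession.version strings that were queried (the script takes the key from each record's VERSION line). The helper name `split_found_missing` is hypothetical and the TaxID values are made-up placeholders, not real NCBI results; no network call is made here.

```python
def split_found_missing(queries, acc_to_tax):
    """Return (found, missing) for the queried accessions.

    acc_to_tax is a dict like the one rettaxids() returns.
    """
    found, missing = {}, []
    for acc in queries:
        try:
            found[acc] = acc_to_tax[acc]
        except KeyError:
            # Not in the returned dictionary, so this query failed
            missing.append(acc)
    return found, missing

# Hypothetical result: pretend 'X86650.1' was a bad query and is absent.
queries = ['AB019734.1', 'X86636.1', 'X86650.1']
acc_to_tax = {'AB019734.1': '1111', 'X86636.1': '2222'}  # placeholder TaxIDs
found, missing = split_found_missing(queries, acc_to_tax)
```

Collecting the misses in a list rather than stopping at the first KeyError makes it easy to retry or report all bad IDs from a chunk at once.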