@nylander
Created November 21, 2017 08:12
Get taxonomy from GenBank accessions
#!/bin/bash
## Fetch scientific name for GenBank accession numbers
## Version: Mon 20 nov 2017 13:35:49
## By: {Andreas.Kahari,Johan.Nylander}@nbis.se
## Usage: ./acc2sci.sh infile-with-accessions
## Notes: Reads from a file or STDIN, prints to STDOUT.
## Output columns (tab-separated): accession (Caption),
## TaxId, ScientificName, Division.
## No detailed error checking! If any of the accessions
## provided as queries are invalid, the script will
## exit with the message: "Failure of post to find data to load.."
## Requirements: Entrez Direct suite of software
## (https://www.ncbi.nlm.nih.gov/books/NBK179288/)
set -euo pipefail
#author_email_address='[email protected]'
#name_of_script='acc2tax'
#econtact -email ${author_email_address} -tool ${name_of_script}
### Andreas' solution
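## Start clean (tee -a below appends to __taxid-acc.out) and remove the
## temporary files on exit, even if a step fails under set -e
rm -f __taxid-acc.out __taxonomy.out
trap 'rm -f __taxid-acc.out __taxonomy.out' EXIT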
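## Group the accessions 200 at a time into comma-separated lists,
## then run one esearch/efetch/epost round per group
xargs -n 200 <"${1:-/dev/stdin}" | tr ' ' ',' |\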
xargs -n 1 sh -c '
esearch -db nuccore -query "$0" |\
efetch -format docsum |\
xtract -pattern DocumentSummary -element TaxId,Caption |\
tee -a __taxid-acc.out | cut -f1 | sort -u |\
epost -db taxonomy |\
efetch -format docsum |\
xtract -pattern DocumentSummary -element TaxId,ScientificName,Division' \
>__taxonomy.out
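## Match accessions to taxonomy records on TaxId (column 1 of both files)
## and print: accession, TaxId, ScientificName, Division.
## LC_ALL=C keeps sort and join agreeing on the key order.
LC_ALL=C join -t$'\t' \
-o 1.2,2.1,2.2,2.3 \
<( LC_ALL=C sort -u __taxid-acc.out ) <( LC_ALL=C sort -u __taxonomy.out )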
### End of Andreas' solution
rm -f __taxid-acc.out __taxonomy.out
## Slower alternative (disabled): query the accessions one at a time
#while read -r line
#do
# sleep 1
# echo -n "$line "
# esearch -db nuccore -query "${line}" < /dev/null |\
# efetch -format docsum |\
# xtract -pattern DocumentSummary -element TaxId |\
# sort -n |\
# uniq |\
# epost -db taxonomy |\
# efetch -format docsum |\
# xtract -pattern DocumentSummary -element ScientificName,Division,TaxId
#done < "${1:-/dev/stdin}"
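The batching at the start of the pipeline, `xargs -n 200 | tr ' ' ','`, turns a list of accessions (one per line or whitespace-separated) into comma-separated groups of at most 200, and each group becomes a single `esearch` query. The sketch below wraps that step in a hypothetical function, `batch_accessions` (not part of the original script), so it can be tried offline without contacting NCBI; `ACC1` to `ACC5` are placeholder strings, not real accessions.

```shell
#!/bin/bash
# Hypothetical helper (not in acc2sci.sh): read whitespace-separated
# accessions on stdin and print them as comma-separated batches of at
# most $1 items (default 200), one batch per line. This mirrors the
# `xargs -n 200 | tr ' ' ','` step of the script.
batch_accessions() {
    local size="${1:-200}"
    # xargs with no command runs echo, printing up to $size words per line;
    # tr then replaces the separating spaces with commas.
    xargs -n "$size" | tr ' ' ','
}

# Example with placeholder accessions and a batch size of 2
printf '%s\n' ACC1 ACC2 ACC3 ACC4 ACC5 | batch_accessions 2
```

This prints `ACC1,ACC2`, `ACC3,ACC4` and `ACC5` on separate lines. The batch size of 200 is the value the script uses; a smaller value only means more, shorter queries.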
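The closing `join` pairs each accession with its taxonomy record through the TaxId in column 1 of both intermediate files, and `-o 1.2,2.1,2.2,2.3` then prints accession, TaxId, scientific name and division. `join` expects both inputs sorted the same way it compares keys, so this offline sketch sorts both sides with `LC_ALL=C`, an assumption added here to keep `sort` and `join` in the same byte order. The helper name `join_taxonomy` and every row (TaxIds 111 and 222, `Genus alpha`, `DIV1`, and so on) are made-up placeholders standing in for `__taxid-acc.out` and `__taxonomy.out`, not real NCBI records.

```shell
#!/bin/bash
# Hypothetical helper mirroring the join step of acc2sci.sh.
#   $1: file with rows  TaxId<TAB>accession           (like __taxid-acc.out)
#   $2: file with rows  TaxId<TAB>name<TAB>division   (like __taxonomy.out)
# Prints: accession<TAB>TaxId<TAB>name<TAB>division
join_taxonomy() {
    # Sort (and de-duplicate) both inputs in the C locale so that the
    # whole-line order matches the key order join checks in that locale.
    LC_ALL=C join -t $'\t' -o 1.2,2.1,2.2,2.3 \
        <(LC_ALL=C sort -u "$1") <(LC_ALL=C sort -u "$2")
}

# Placeholder input files in a temporary directory, removed on exit
tmpdir="$(mktemp -d)"
trap 'rm -rf "$tmpdir"' EXIT
printf '222\tACC3\n111\tACC1\n111\tACC2\n' >"$tmpdir/taxid-acc.tsv"
printf '111\tGenus alpha\tDIV1\n222\tGenus beta\tDIV2\n' >"$tmpdir/taxonomy.tsv"

join_taxonomy "$tmpdir/taxid-acc.tsv" "$tmpdir/taxonomy.tsv"
```

Both `ACC1` and `ACC2` map to TaxId 111, so that taxonomy row appears twice in the output, once per accession, while `ACC3` gets the TaxId 222 row.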