Created
June 2, 2015 12:52
Quick hack for a Biopython conversion question http://mailman.open-bio.org/pipermail/biopython/2015-June/015648.html
from Bio import SeqIO

with open("CP008802.txt", "w") as output:
    # Tab-separated header row for the nine output columns.
    output.write("Seqname\tSource\tfeature\tStart\tEnd\tScore\tStrand\tFrame\tAttributes\n")
    for record in SeqIO.parse("CP008802.gbk", "genbank"):
        print("Converting %s" % record.name)
        for f in record.features:
            # Only gene features are exported.
            if f.type != "gene":
                continue
            locus_tag = f.qualifiers["locus_tag"][0]
            # Compound locations (e.g. joins) are reported and skipped.
            if len(f.location.parts) > 1:
                print("What should we do for %s (compound location)? %s" % (locus_tag, f.location))
                continue
            # Biopython starts are 0-based, so add 1 for a 1-based start;
            # the strand is written as Biopython stores it (1 or -1).
            output.write('%s\tGenBank\t%s\t%i\t%i\t0,000000\t%s\t.\tlocus_tag\t"%s"; transcript_id "%s"\n'
                         % (record.name, f.type,
                            f.location.start + 1, f.location.end, f.location.strand,
                            locus_tag, locus_tag))
print("Done")
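The script writes `f.location.strand` directly, so the Strand column holds Biopython's numeric value (1 or -1) rather than a "+" or "-" symbol. If a downstream tool expects the symbols, a small helper along these lines could be used. This is only a sketch: the names `strand_symbol` and `one_based_inclusive` are hypothetical and not part of the original gist or of Biopython.

```python
def strand_symbol(strand):
    """Map a Biopython-style strand value to a one-character symbol.

    1 -> "+", -1 -> "-", anything else (0 or None) -> "."
    """
    return {1: "+", -1: "-"}.get(strand, ".")


def one_based_inclusive(start, end):
    """Convert a 0-based, end-exclusive interval to 1-based, end-inclusive.

    Only the start shifts by one; the end value stays the same.
    """
    return int(start) + 1, int(end)


if __name__ == "__main__":
    # Example: a feature stored as [0, 1380) on the reverse strand.
    start, end = one_based_inclusive(0, 1380)
    print("%i\t%i\t%s" % (start, end, strand_symbol(-1)))
```

In the script above, `strand_symbol(f.location.strand)` would then replace the bare `f.location.strand` argument in the `output.write` call.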
Assuming you've saved http://www.ncbi.nlm.nih.gov/nuccore/CP008802 as a plain text GenBank format file named CP008802.gbk in the current directory, this will write CP008802.txt and print the following on screen: