@michaelbarton
Created October 13, 2010 18:20
Hard-coded script to produce a GenBank file for a genome from IMG output
#!/usr/bin/env ruby
# Produce a GenBank file for a genome from IMG output.
# Input and output paths are hard coded.
require 'bio'
require 'fastercsv'

# Map each protein ID (first word of the FASTA header) to its sequence.
proteins = Bio::FlatFile.auto('annotation/proteins.faa').inject({}) do |h, p|
  h[p.definition.split.first] = p.seq
  h
end

# Build the sequence record from the first entry in the assembly FASTA.
fasta = Bio::FlatFile.auto('assembly/build.fna')
record = Bio::Sequence.new(fasta.first.seq)
record.definition = "Pseudomonas fluorescens R124, complete genome"
record.species = "Pseudomonas fluorescens R124"
record.features = []

# Add a gene feature and a CDS feature for each row of the IMG gene list.
FasterCSV.foreach('annotation/gene_list.csv', :headers => true) do |e|
  coordinates = "#{e['Start Coord']}..#{e['End Coord']}"
  # Reverse-strand genes use the GenBank complement() location syntax.
  if e['Strand'] == '-'
    coordinates = "complement(#{coordinates})"
  end

  id = e['gene_oid']
  qualifiers = []
  qualifiers << Bio::Feature::Qualifier.new('gene', id)
  # Clone so the gene feature does not pick up the CDS-only qualifiers below.
  record.features << Bio::Feature.new('gene', coordinates, qualifiers.clone)

  if e['Description']
    qualifiers << Bio::Feature::Qualifier.new('function', e['Description'])
  end
  qualifiers << Bio::Feature::Qualifier.new('translation', proteins[id])
  record.features << Bio::Feature.new('CDS', coordinates, qualifiers)
end

# Write the annotated record in GenBank format.
File.open('R124.gb', 'w') do |out|
  out.print record.output(:genbank)
end
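
The strand handling in the loop could be pulled out into a small helper. This is a minimal sketch, assuming a hypothetical method name `genbank_location` that is not part of the original script. It uses only plain Ruby, so it runs without BioRuby or FasterCSV:

```ruby
# Hypothetical helper (not in the original script): build a GenBank
# location string from start/end coordinates and a strand marker.
def genbank_location(start_coord, end_coord, strand)
  span = "#{start_coord}..#{end_coord}"
  # Reverse-strand features are wrapped in complement(), as in the script.
  strand == '-' ? "complement(#{span})" : span
end

puts genbank_location(100, 250, '+')
puts genbank_location(100, 250, '-')
```

Inside the loop, the call would presumably look like `genbank_location(e['Start Coord'], e['End Coord'], e['Strand'])`, with the column names taken from the script above.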