@michaelbarton
Created March 19, 2010 20:17
Parse prodigal gene prediction output
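# Extract the CDS coordinates from a GenBank file (ARGV[0]) and dump them as
# YAML, keyed by each record's definition line.
require 'rubygems'
require 'bio'
require 'yaml'

# Unknown keys start out with an empty list of locations.
locations = Hash.new { |hash, key| hash[key] = [] }

Bio::FlatFile.open(Bio::GenBank, ARGV[0]).each do |sequence|
  sequence.each_cds do |cds|
    cds.locations.each do |location|
      locations[sequence.definition] << {
        'start'      => location.from,
        'end'        => location.to,
        'complement' => location.strand < 0
      }
    end
  end
end

puts YAML.dump(locations)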
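
# Cut each predicted ORF out of a FASTA file (ARGV[1]) using the coordinates
# YAML file (ARGV[0]) written by the script above.
require 'rubygems'
require 'bio'
require 'yaml'

coordinates = YAML.load(File.read(ARGV[0]))
entries = Bio::FastaFormat.open(ARGV[1])

entries.each do |entry|
  # Had to normalise white space in the name
  name = entry.definition.gsub(/\s+/, ' ')
  coordinates[name].each_with_index do |coords, i|
    seq = entry.seq.subseq(coords['start'], coords['end'])
    # Minus-strand genes: treat as a nucleotide sequence and take the complement
    seq = Bio::Sequence::NA.new(seq).complement if coords['complement']
    puts seq.to_fasta("#{name} ORF:#{i + 1}")
  end
end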
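
The two scripts communicate through a YAML file that maps each sequence definition to a list of 'start', 'end' and 'complement' entries. As a rough illustration of that hand-off, here is a minimal plain-Ruby sketch that does not need BioRuby. The contig name, the coordinates, the genome string and the helpers reverse_complement and extract_orf are all made up for this example. It assumes coordinates are 1-based and inclusive, and that a 'complement' entry should be reverse-complemented.

```ruby
require 'yaml'

# Hypothetical coordinates in the same shape the first script dumps:
# { definition => [ { 'start', 'end', 'complement' }, ... ] }
coordinates = YAML.load(<<~YAML_DOC)
  contig_1 example:
    - start: 1
      end: 6
      complement: false
    - start: 4
      end: 9
      complement: true
YAML_DOC

# Reverse-complement a DNA string (a plain-Ruby stand-in for what the
# BioRuby call does in the script above; hypothetical helper).
def reverse_complement(dna)
  dna.reverse.tr('ACGTacgt', 'TGCAtgca')
end

# Cut a 1-based, inclusive [start, end] range out of the sequence and
# reverse-complement it when the gene lies on the minus strand.
def extract_orf(dna, coords)
  orf = dna[(coords['start'] - 1)..(coords['end'] - 1)]
  coords['complement'] ? reverse_complement(orf) : orf
end

genome = 'ATGAAATAGCC' # made-up sequence
orfs = coordinates['contig_1 example'].map { |c| extract_orf(genome, c) }
puts orfs
```

The definition string is the only link between the two files, which is why the second script normalises white space in the FASTA name before the lookup.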