@ursm
Forked from tfuji/parse_st.rb
Last active March 14, 2016 12:37
INSDC structured comment parser
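INSDC structured comments are blocks inside an entry's COMMENT text. Each block opens with a ##<Tagset>-START## line and closes with a matching ##<Tagset>-END## line, holds one "Field :: value" pair per line, and may wrap long values onto following lines. The sketch below is a standalone variant of the parse_st_comment function from the script. It runs without BioRuby so you can see what the parser returns. The sample comment is made up, and the sketch assumes, as the script's regular expression does, that lines are separated by "\n".

```ruby
# Standalone variant of parse_st_comment; needs only core Ruby.
# Returns { tagset => { field_name => value } }.
def parse_st_comment(comment)
  # Each match yields [tagset, block]; the non-greedy (.*?) keeps adjacent blocks apart.
  comment.to_s.scan(/##([^#\n]+)-START##\n(.*?)\n##\1-END##/m).each_with_object({}) do |(tagset, block), memo|
    # A line containing "::" starts a new field; other lines continue the
    # previous value. Leading fragments without "::" are dropped.
    memo[tagset] = block.lines
                        .slice_before { |line| line.include?('::') }
                        .map { |lines| lines.map(&:strip).join(' ').split(/\s*::\s*/, 2) }
                        .select { |pair| pair.size == 2 }
                        .to_h
  end
end

# Made-up sample COMMENT text with two tagsets and one wrapped value.
sample = <<~COMMENT
  ##Genome-Assembly-Data-START##
  Assembly Method       :: Newbler v. 2.6
  Sequencing Technology :: 454; Illumina
                           HiSeq
  ##Genome-Assembly-Data-END##
  ##Genome-Annotation-Data-START##
  Annotation Provider :: NCBI
  ##Genome-Annotation-Data-END##
COMMENT

p parse_st_comment(sample)
```

For this sample the wrapped line is joined into the single value "454; Illumina HiSeq". The two tagsets appear as separate keys in insertion order.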
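#!/usr/bin/env ruby
# Parses INSDC structured comments (##<Tagset>-START## ... ##<Tagset>-END##)
# from the COMMENT text of each entry read by BioRuby's Bio::FlatFile and
# prints them as tab-separated rows.
require 'bio'

# Returns { tagset => { field_name => value } } for every structured comment
# block in the given COMMENT text (an empty hash if there is none).
def parse_st_comment(comment)
  # Non-greedy match so that several blocks in one comment stay separate.
  comment.to_s.scan(/##([^#\n]+)-START##\n(.*?)\n##\1-END##/m).each_with_object({}) do |(tagset, block), memo|
    # A line containing "::" starts a new field; lines without it are wrapped
    # continuations of the previous value. Split only on the first "::" so
    # values that themselves contain "::" survive.
    memo[tagset] = block.lines
                        .slice_before { |line| line.include?('::') }
                        .map { |lines| lines.map(&:strip).join(' ').split(/\s*::\s*/, 2) }
                        .select { |pair| pair.size == 2 }
                        .to_h
  end
end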
# Read entries from the files named on the command line (or from stdin) and
# print one row per field: accession.version, tagset, field name, value.
Bio::FlatFile.auto(ARGF).each do |entry|
  accession = "#{entry.entry_id}.#{entry.version}"
  parse_st_comment(entry.comment).each do |tagset, fields|
    fields.each do |key, value|
      puts [accession, tagset, key, value].join("\t")
    end
  end
end
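The printing loop at the end of the script can be factored into a pure function. That makes the four-column row format easy to check without a flat file or BioRuby. This is a minimal sketch: st_rows is a hypothetical helper name, and the accession AB000001.1 and field values are made up for illustration.

```ruby
# Hypothetical helper (not in the original script): turns the nested hash
# returned by parse_st_comment into rows of
# [accession.version, tagset, field_name, value].
def st_rows(accession_version, st)
  st.flat_map do |tagset, fields|
    fields.map { |key, value| [accession_version, tagset, key, value] }
  end
end

# Made-up parser output, for illustration only.
parsed = {
  'Genome-Assembly-Data' => {
    'Assembly Method' => 'Newbler v. 2.6',
    'Sequencing Technology' => '454'
  }
}

rows = st_rows('AB000001.1', parsed)
rows.each { |row| puts row.join("\t") }
```

The script itself reads through ARGF, so it accepts file names as arguments or data on standard input. For example: ruby parse_st.rb entries.gb > structured_comments.tsv (the input and output file names are placeholders).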