@michaelbarton
Created March 2, 2010 21:54
Find gap regions containing Ns in fasta files
#!/usr/bin/env ruby
require 'rubygems'
require 'bio'

# Scan each FASTA entry for runs of Ns flanked on both sides by `buffer`
# unambiguous bases. Returns a hash mapping sequence ID => array of matches.
def find_gaps(file, buffer = 200)
  re = Regexp.compile("[ATGC]{#{buffer}}N+[ATGC]{#{buffer}}")
  Bio::FastaFormat.open(file).inject(Hash.new([])) do |hash, entry|
    # Key on the first word of the definition line (the sequence ID)
    hash[entry.definition.split.first] = entry.seq.scan(re)
    hash
  end
end

# Print each gap region to STDOUT as a FASTA entry
find_gaps(ARGV[0]).each do |scaffold, sequences|
  sequences.each_with_index { |s, i| puts s.to_fasta("#{scaffold} gap#{i + 1}") }
end
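
The matching behaviour of the pattern is easier to see on a small in-memory string. The following is a minimal sketch, not part of the gist: `gap_regions` is a hypothetical helper that applies the same regular expression to a plain String (so it runs without BioRuby), with a tiny `buffer` chosen only for illustration.

```ruby
# Hypothetical helper (an assumption, not from the gist): the same pattern as
# find_gaps, applied to a plain String instead of a Bio::FastaFormat entry.
def gap_regions(sequence, buffer = 200)
  re = /[ATGC]{#{buffer}}N+[ATGC]{#{buffer}}/
  sequence.scan(re)
end

# One gap with enough flanking sequence, and one too close to the end.
seq = "ATGCATGC" + "NNNN" + "GGCCTTAA" + "N" + "AC"
regions = gap_regions(seq, 3)
p regions

# Two gaps separated by fewer than 2 * buffer bases: String#scan returns
# non-overlapping matches, so the shared flank is consumed by the first gap.
close = gap_regions("AAAA" + "NN" + "CCCC" + "NN" + "GGGG", 3)
p close

# The character classes are case-sensitive, so soft-masked (lowercase)
# flanks are not matched.
masked = gap_regions("atgNNNatg", 3)
p masked
```

Under these assumptions, a gap is only reported when both flanks contain at least `buffer` uppercase A, T, G or C bases, so gaps near sequence ends, gaps close to one another, and gaps in lowercase sequence can be skipped.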