@blahah
Last active August 29, 2015 14:16
Genome assembly game - basic version
#! /usr/bin/env ruby
# This is a game that helps people understand genome assembly. Given a string,
# it generates sequence reads giving perfect coverage of the string and with a
# fixed overlap. The idea is to print the generated reads, cut them out, and
# have learners assemble them by hand. Different difficulties can be
# demonstrated by using a string with repeats, low complexity, etc., to mimic
# real assembly problems, or by adjusting the parameters (overlap and number
# of fragments).
# generate n approximately equally-sized fragments with each contiguous pair
# of fragments overlapping by k such that the entire quote is covered by the
# resulting fragments
def sequence_quote_with_overlap(quote, n_fragments, k)
  quote = quote.downcase

  # the quote length may not be divisible by the number of fragments,
  # so we distribute the remainder across the fragments randomly.
  remainder = quote.length % n_fragments
  bump = ([1] * remainder + [0] * (n_fragments - remainder)).shuffle
  fraglen = quote.length / n_fragments + k

  # because the final fragment will be too short by the overlap size,
  # we recover k characters from the other fragments at random
  (0...bump.length).to_a.sample(k).each { |i| bump[i] -= 1 }

  # for each fragment, adjust the fragment length by the bump
  # and sequence the fragment from the quote
  fragments = []
  firstchar = 0
  adj_fraglen = fraglen + bump[0]
  lastchar = firstchar + adj_fraglen - 1
  (1...n_fragments).each do |i|
    adj_fraglen = fraglen + bump[i]
    fragments << quote[firstchar..lastchar]
    firstchar = lastchar - k + 1
    lastchar = firstchar + adj_fraglen - 1
  end

  # store the final fragment
  fragments << quote[firstchar..lastchar]
  fragments
end
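
# The overlap scheme above implies that fragments kept in their original
# order can be stitched back together by dropping the first k characters of
# every fragment after the first. A minimal sketch of that check follows. The
# helper name reassemble_in_order is hypothetical (it is not part of the
# original script), and it assumes each adjacent pair overlaps by exactly k.

```ruby
# Hypothetical helper, not part of the original gist: rebuild the text from
# fragments that are still in their original order, assuming each adjacent
# pair overlaps by exactly k characters.
def reassemble_in_order(fragments, k)
  return "" if fragments.empty?

  fragments.drop(1).reduce(fragments.first) do |assembled, frag|
    # frag[k..-1] is nil when frag is shorter than k, hence the fallback
    assembled + (frag[k..-1] || "")
  end
end

# hand-made fragments with an overlap of 2 characters
ordered = ["genome", "me ass", "ssembly"]
puts reassemble_in_order(ordered, 2)
```

# Called on the unshuffled return value of sequence_quote_with_overlap with
# the same k, this should give back quote.downcase, which makes it a quick
# sanity check before printing the fragments.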
# demos
# simple assembly
quote = "Try a thing you haven’t done three times. Once, to get over the fear of doing it. Twice, to learn how to do it. And a third time, to figure out whether you like it or not."
n_fragments = 18
k = 6
frags = sequence_quote_with_overlap quote, n_fragments, k
frags.shuffle.each { |f| puts "\"#{f}\"" }
# with repeats
quote = "Happiness resides not in possessions, and not in gold, happiness dwells in the soul."
k = 3
frags = sequence_quote_with_overlap quote, n_fragments, k
frags.shuffle.each { |f| puts "\"#{f}\"" }
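
# Repeats make the game harder because the k-character end of one fragment
# can match the start of more than one other fragment. The sketch below lists
# such ambiguous joins. The helper name ambiguous_overlaps is hypothetical
# (not part of the original script), and it assumes k >= 1 and exact,
# error-free overlaps.

```ruby
# Hypothetical helper, not part of the original gist: for each fragment, find
# the other fragments that start with its last k characters, and report the
# fragments that have more than one such candidate successor.
def ambiguous_overlaps(fragments, k)
  ambiguous = {}
  fragments.each do |frag|
    next if frag.length < k

    suffix = frag[-k..-1]
    candidates = fragments.select do |other|
      !other.equal?(frag) && other.start_with?(suffix)
    end
    ambiguous[frag] = candidates if candidates.length > 1
  end
  ambiguous
end

# hand-made fragments in which " in" occurs twice, with k = 3
frags = ["resides not in", " in possessions, and not in", " in gold"]
ambiguous_overlaps(frags, 3).each do |frag, candidates|
  puts "\"#{frag}\" could be followed by: #{candidates.inspect}"
end
```

# An empty result means every fragment has at most one possible successor, so
# the puzzle can be solved by matching overlaps alone.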