@boxmein
Last active January 3, 2016 09:59
A Ruby class to generate order-2 Markov text.
#!/usr/bin/env ruby
#
# Markov.rb
# =========
#
# A weighted Markov chain [text] generator implementation in Ruby
# Actually works!
#
#
# Usage
# =====
#
# Markov.new creates an empty chain; it takes no arguments.
#
# Markov#add (from, to) records that the word `to` can follow the word `from`.
#   The return value is not meaningful.
#
# Markov#get (seed, n) returns a sentence of n words generated from the
#   starting word seed. Both arguments may be omitted and will be
#   randomly generated as necessary. By default, n is anywhere
#   between 2 and 22, and seed is one of the words that already has
#   at least one recorded successor.
#
# Markov.fromfile (f) builds a chain from a text file, pairing each word
#   with the word that follows it on the same line.
#
# weighted_rand (limit) returns an integer from 0 up to (but not including)
#   limit that is more likely to be near 0 than near limit. The
#   successor lists keep recently repeated words at the front, so
#   a low index favours the most commonly used words.
#
# Well, that's about it.
#
# by boxmein 2014 - free to use with attribution.
#
# something random-like
# however quite determinate
# more-so as weighted
#
def weighted_rand(limit = 1)
  r = rand
  ((r ** 2) * limit).to_i
end

class Markov
  #
  # Key value pairs used
  # Providing state and weight as
  # Markov chain order
  #
  attr_accessor :data

  def initialize
    @data = Hash.new
  end
  #
  # The machine learns now.
  # First key saves second value
  # Let it do its thing.
  #
  def add(fst, snd)
    # good move?
    fst = fst.to_sym
    if @data[fst]
      if @data[fst].index(snd)
        # puts "sent existing #{snd} to front of array"
        @data[fst].delete snd
        @data[fst].unshift snd
      else
        # puts "added #{snd} to end of array"
        @data[fst].push snd
      end
    else
      # puts "new #{fst}: added new array with #{snd}"
      @data[fst] = [snd]
    end
    # for scale
    # p @data
  end
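  #
  # there comes a time when
  # even our mightiest database
  # must out a value
  #
  # Returns a sentence of n words that starts with seed. Without arguments,
  # seed is a random known word and n is a random length between 2 and 22.
  def get(seed = nil, n = nil)
    # an empty chain has nothing to say
    return '' if @data.empty?
    # woo, default symbol!
    seed = (seed || @data.keys.sample).to_sym
    n ||= rand(2..22)
    words = [seed.to_s]
    (n - 1).times do
      word = one_word(words.last)
      break if word.nil? # dead end: this word has no recorded successor
      words << word
    end
    words.join(' ').squeeze(' ').capitalize + '.'
  end

  #
  # for multiple words
  # a single instance method for
  # proper behaviour
  #
  # Picks one successor of seed, favouring the front of its list.
  # Returns nil when seed has no recorded successors.
  def one_word(seed)
    # let's just turn it into a symbol even if it is one
    successors = @data[seed.to_sym]
    return nil if successors.nil?
    successors[weighted_rand(successors.length)]
  end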
  # given a file name, create a Markov structure for it
  def self.fromfile(f)
    raise "#{f} does not exist" unless File.exist?(f)
    raise "#{f} is not a file" unless File.file?(f)
    raise "#{f} is unavailable to this script" unless File.readable?(f)
    m = Markov.new
    # File.foreach closes the file once it has been read
    File.foreach(f) do |line|
      # pair every word with the word that follows it on the same line
      line.split.each_cons(2) { |fst, snd| m.add(fst, snd) }
    end
    m
  end
end
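
The weighting described in the usage notes relies on two pieces working together: successor lists that move a repeated word to the front, and a random index that is skewed toward 0. The sketch below isolates that idea so it can be tried on its own. It does not reuse the Markov class; `remember` and `biased_index` are hypothetical helper names introduced only for this illustration, and the fixed seed is an assumption made to keep the run repeatable.

```ruby
# Hypothetical helpers for illustration; not part of Markov.rb.

# Move-to-front update: a word seen again jumps to index 0,
# a word seen for the first time is appended at the end.
def remember(list, word)
  if list.include?(word)
    list.delete(word)
    list.unshift(word)
  else
    list.push(word)
  end
  list
end

# Squaring a uniform value in [0, 1) pulls it toward 0, so low
# indices are returned more often than high ones.
def biased_index(limit, rng = Random.new)
  r = rng.rand
  (r * r * limit).to_i
end

successors = []
%w[cat dog bird dog].each { |word| remember(successors, word) }
# "dog" was seen twice, so it now sits at the front of the list.

rng = Random.new(42) # fixed seed (an assumption) for a repeatable run
counts = Hash.new(0)
10_000.times do
  counts[successors[biased_index(successors.length, rng)]] += 1
end

# Print how often each word was picked, most frequent first.
counts.sort_by { |_, n| -n }.each { |word, n| puts "#{word}: #{n}" }
```

With three entries, index 0 is chosen whenever the uniform value is below roughly 0.58, so the word at the front of the list should account for well over half of the picks.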