Last active
January 3, 2016 09:59
A Ruby class that generates Markov text from word pairs (each word maps to a list of its likely successors).
#!/usr/bin/env ruby
#
# Markov.rb
# =========
#
# A weighted Markov chain [text] generator implementation in Ruby
# Actually works!
#
#
# Usage
# =====
#
# Markov.new does absolutely nothing of interest.
#
# Markov#add (from, to) adds an association between a word and its likely successor,
#   returning nothing of interest.
#
# Markov#get (seed, n) returns a sentence of up to n words generated from the
#   starting word seed. Both arguments may be omitted and will be randomly
#   generated as necessary. By default, n is anywhere between 2 and 22.
#   By default, seed is one of the words that have been used for
#   associations.
#
# Markov.fromfile (f) builds a chain from the adjacent word pairs on each line
#   of the file f.
#
# weighted_rand (limit) returns an integer from 0 up to (but not including) limit
#   that's more likely to be near 0 than the limit. I use this in my weighting
#   as the word lists use order to rank commonly used words: a successor that
#   is seen again moves to the front of its list.
#
# Well, that's about it.
#
# by boxmein 2014 - free to use with attribution.
#
# something random-like
# however quite determinate
# more-so as weighted
#
def weighted_rand (limit=1)
  r = rand
  ((r ** 2) * limit).to_i
end
class Markov
  #
  # Key value pairs used
  # Providing state and weight as
  # Markov chain order
  #
  attr_accessor :data

  def initialize
    @data = Hash.new
  end
  #
  # The machine learns now.
  # First key saves second value
  # Let it do its thing.
  #
  def add (fst, snd)
    # good move?
    fst = fst.to_sym
    if @data[fst]
      if @data[fst].index(snd)
        # puts "sent existing #{snd} to front of array"
        @data[fst].delete snd
        @data[fst].unshift snd
      else
        # puts "added #{snd} to end of array"
        @data[fst].push snd
      end
    else
      # puts "new #{fst}: added new array with #{snd}"
      @data[fst] = [snd]
    end
    # for scale
    # p @data
  end
  #
  # there comes a time when
  # even our mightiest database
  # must out a value
  #
  def get (seed=nil, n=nil)
    sentence = ""
    # woo, default symbol!
    seed ||= @data.keys.sample
    # nothing has been learned yet, so there is nothing to say
    return sentence if seed.nil?
    # we did symbol stuff!
    seed = seed.to_sym
    # default length: anywhere between 2 and 22 words, as the usage notes say
    n ||= rand(2..22)
    n.times do
      # stop early when the current word has no known successor
      break unless @data[seed.to_sym]
      # note: the seed itself is not emitted, only the words that follow it
      seed = one_word seed
      sentence += " #{seed}"
    end
    return sentence.strip.squeeze(' ').capitalize + '.'
  end
  #
  # for multiple words
  # a single instance method for
  # proper behaviour
  #
  def one_word (seed)
    # no word in, no word out
    return nil if seed.nil?
    # let's just turn it into a symbol even if it is one
    seed = seed.to_sym
    # a word that was never followed by anything has no successors
    successors = @data[seed]
    return nil if successors.nil?
    successors[weighted_rand successors.length]
  end
  # given a file name, create a Markov structure for it
  def self.fromfile (f)
    raise "#{f} does not exist" if not File.exist? f
    raise "#{f} is not a file" if not File.file? f
    raise "#{f} is unavailable to this script" if not File.readable? f
    m = Markov.new
    # File.foreach closes the file once every line has been read
    File.foreach(f) do |line|
      words = line.chomp.split
      # pair each word with the one that follows it; short lines yield nothing
      words.each_cons(2) do |fst, snd|
        # print fst, ' -> ', snd, "\n"
        m.add(fst, snd)
      end
    end
    return m
  end
end
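
The weighting described in the usage notes can be tried in isolation. What follows is a minimal sketch rather than the gist's own code path: `remember` is a hypothetical stand-in for the list handling inside `Markov#add`, and the word list and the fixed seed are made up for illustration.

```ruby
# Sketch of the weighting idea: a successor that is seen again moves to the
# front of its list, and a squared random index favours the front.

def weighted_rand(limit = 1)
  # squaring a uniform value in [0, 1) skews it toward 0
  r = rand
  ((r ** 2) * limit).to_i
end

# Hypothetical helper mirroring the list handling in Markov#add.
def remember(successors, word)
  if successors.include?(word)
    # seen before: move it to the front so it is picked more often
    successors.delete(word)
    successors.unshift(word)
  else
    # new word: append it to the end
    successors.push(word)
  end
  successors
end

successors = []
%w[cat dog bird dog].each { |w| remember(successors, w) }
# "dog" was seen twice, so the list is now ["dog", "cat", "bird"]

srand(42) # fixed seed (an arbitrary choice) so repeated runs tally the same
counts = Hash.new(0)
10_000.times { counts[successors[weighted_rand(successors.length)]] += 1 }
puts counts.inspect
```

With three successors, the squared index should land on the first entry a little over half of the time, so the repeated word dominates the tally while the later entries still appear.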