@capotej
Created December 18, 2013 02:19
presidential speech markov model
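require 'open-uri'
require 'nokogiri'
require 'marky_markov'

# Persistent dictionary; save_dictionary! writes it to disk.
markov = MarkyMarkov::Dictionary.new('dictionary')

# Fetch the speech index. URI.open is required on Ruby 3.0+, where
# Kernel#open no longer accepts URLs.
page = Nokogiri::HTML(URI.open("http://millercenter.org/president/speeches"))

# Collect the unique links to individual speech pages.
speech_urls = []
page.css('a').each do |link|
  href = link['href'].to_s
  if href.include?("speeches/detail") && !speech_urls.include?(href)
    speech_urls << href
  end
end

# Feed each transcript to the model, saving the dictionary after every
# speech so that progress survives a failed request.
speech_urls.each do |href|
  url = "http://millercenter.org/president/#{href}"
  puts "opening #{url}"
  speech_page = Nokogiri::HTML(URI.open(url))
  markov.parse_string speech_page.css('#transcript').text
  markov.save_dictionary!
end

puts markov.generate_20_words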
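
For readers curious what a Markov text model does under the hood, here is a minimal sketch in plain Ruby. It is not marky_markov's implementation: it assumes a first-order, word-level chain for brevity, and the helper names `build_chain` and `generate` are hypothetical, chosen only for this illustration.

```ruby
# Build a first-order, word-level Markov chain: each word maps to the list
# of words that followed it in the training text (duplicates are kept, so
# frequent successors are proportionally more likely to be picked).
def build_chain(text)
  chain = Hash.new { |hash, key| hash[key] = [] }
  text.split.each_cons(2) { |current, following| chain[current] << following }
  chain
end

# Walk the chain from a start word, picking a random successor at each
# step. Stops early when the current word has no recorded successor.
def generate(chain, start, max_words, rng: Random.new)
  words = [start]
  while words.length < max_words
    successors = chain.fetch(words.last, [])
    break if successors.empty?
    words << successors.sample(random: rng)
  end
  words.join(' ')
end

chain = build_chain('we the people of the united states')
puts generate(chain, 'we', 5)
```

Passing a seeded generator, for example `rng: Random.new(42)`, makes the output reproducible, which is handy when comparing runs.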