@vargheseraphy
Created March 5, 2015 11:48
Ruby web scraping using mechanize and nokogiri gems
# http://www.icicletech.com/blog/web-scraping-with-ruby-using-mechanize-and-nokogiri-gems
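require "mechanize" # also loads Nokogiri, which Mechanize depends on

# The page to scrape is passed as the first command-line argument.
url = ARGV[0] or abort("usage: ruby scraper.rb URL")

agent = Mechanize.new { |a| a.user_agent_alias = "Mac Safari" }
html = agent.get(url).body
html_doc = Nokogiri::HTML(html)

# The block form of File.open closes the file even if an error is raised.
File.open("wikilinks.txt", "w") do |fp|
  fp.write("References\n\n")
  # Each <ol class="references"> holds one footnote list.
  html_doc.xpath("//ol[@class='references']").each { |node| fp.write(node.text + "\n") }

  fp.write("Further Reading\n\n")
  # Matches only spans whose class attribute is exactly "citation".
  html_doc.xpath("//span[@class='citation']").each { |node| fp.write(node.text + "\n") }
end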
# Usage:
#   $ ruby scraper.rb "http://en.wikipedia.org/wiki/Ruby_(programming_language)"