@brycemcd
Created January 8, 2014 05:04
quick Nokogiri script to pull headlines out of news articles
require 'rubygems'
require 'nokogiri'
require 'open-uri'
# Takes the URL of a news article (e.g. on HuffPo or Yahoo) as input and prints its headline
url = ARGV[0]
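# The module is spelled Nokogiri, and news pages are HTML, so use the HTML parser.
# URI.open (Ruby 2.5+) replaces the bare open(url), which no longer fetches URLs in Ruby 3.
resource = Nokogiri::HTML(URI.open(url))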
# any news site should have only one h1 and the h1 should be their headline,
# but who knows?
resource.search("h1").each do |h1|
  puts h1.text # will discard all elements that may be in an H1 and only output text
end
# NOTE: This was written completely from memory and is untested; there may be bugs
# USAGE:
# From a console where headlines.rb is in the directory:
# ruby headlines.rb http://news.yahoo.com/record-freeze-extends-eastern-united-states-least-nine-004335490--sector.html
# (quote URLs that contain & so the shell passes them through as a single argument)
# ruby headlines.rb "http://www.huffingtonpost.com/azeem-khan/heres-why-the-nyc-bitcoin_b_4551792.html?utm_hp_ref=technology&ir=Technology"
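
The "but who knows?" comment above hints that a page may carry several h1 elements, or an h1 padded with whitespace from nested markup. One way to cope is sketched below. `pick_headline` is a hypothetical helper, not part of the original script, and it assumes that the first non-empty h1 is the headline, which may not hold for every site.

```ruby
# Hypothetical helper (not part of the original script): given the text of
# every <h1> found on a page, return the most likely headline, or nil.
def pick_headline(h1_texts)
  # Collapse runs of whitespace (newlines, tabs, indentation left over from
  # nested markup) into single spaces and trim both ends.
  cleaned = h1_texts.map { |text| text.gsub(/\s+/, " ").strip }
  # Skip empty headings, e.g. an <h1> that only wraps a logo image.
  # ASSUMPTION: the first non-empty <h1> is the headline.
  cleaned.reject(&:empty?).first
end

# Stand-alone demonstration with made-up heading texts:
puts pick_headline(["  ", "\n  Record freeze\n  extends east  "])
```

In the script itself this could replace the `each` loop, for example `headline = pick_headline(resource.search("h1").map(&:text))` followed by `puts headline if headline`.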