@jsuwo
Last active December 27, 2015 00:39
# This script can be run using any of the following:
#
# ruby XmlParsingExampleExplained < file1.xml
# cat file1.xml | ruby XmlParsingExampleExplained
# ruby XmlParsingExampleExplained file1.xml
#
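# Note that multiple XML files should NOT be specified at the same time, as was possible in the regex matching example.
# ARGF.read concatenates all of its inputs, and several XML documents joined together do not form one well-formed document.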
# Require the Nokogiri library -- this library must be installed with 'gem install nokogiri'
require 'nokogiri'
# Read everything from standard input or from files specified as command line arguments
content = ARGF.read
# Parse the content as an XML document
doc = Nokogiri::XML(content)
# Run an XPath query to find all <article-title> tags
matches = doc.xpath('//article-title')
# Extract the text content of each tag and collapse every run of whitespace (spaces, tabs, newlines) into a single space
matches = matches.map { |t| t.content.gsub(/\s+/, ' ') }
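# The normalization step above can be tried in isolation. This is a minimal sketch using a
# made-up title string (raw_title is an assumption, not data from any real file). It shows that
# \s+ collapses newlines and tabs as well as spaces, how that differs from String#squeeze, and
# that a single space can remain at each end unless strip is also applied.

```ruby
# Hypothetical sample title containing a newline, a tab and doubled spaces
raw_title = "  Parsing\n\tXML   with  Ruby "

# \s+ matches any run of whitespace, so the newline and tab are collapsed too
collapsed = raw_title.gsub(/\s+/, ' ')

# squeeze(' ') only merges repeated space characters; the newline and tab stay
squeezed = raw_title.squeeze(' ')

# Adding strip also removes the single space left at each end
trimmed = collapsed.strip

p collapsed # prints " Parsing XML with Ruby "
p squeezed  # prints " Parsing\n\tXML with Ruby "
p trimmed   # prints "Parsing XML with Ruby"
```

# Whether to add strip depends on how the titles are used afterwards; the script above leaves
# leading and trailing spaces in place.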
# Print out the matches
puts matches