Last active
December 27, 2015 00:39
# This script can be run using any of the following:
#
#   ruby XmlParsingExampleExplained < file1.xml
#   cat file1.xml | ruby XmlParsingExampleExplained
#   ruby XmlParsingExampleExplained file1.xml
#
# Note that multiple XML files should NOT be specified at the same time, as was
# possible in the regex matching example: ARGF.read concatenates all of its inputs
# into one string, and that string would not be a single well-formed XML document.

# Require the Nokogiri library -- this library must be installed with 'gem install nokogiri'
require 'nokogiri'

# Read everything from standard input or from files specified as command line arguments
content = ARGF.read

# Parse the content as an XML document
doc = Nokogiri::XML(content)

# Run an XPath query to find all <article-title> elements
matches = doc.xpath('//article-title')

# Extract the text content of each element and replace every run of whitespace
# (spaces, tabs, newlines) in the titles with a single space
matches = matches.map { |t| t.content.gsub(/\s+/, ' ') }

# Print out the matches, one title per line
puts matches
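
The whitespace clean-up step can be tried on its own, without Nokogiri or an XML file. The sketch below applies the same `gsub(/\s+/, ' ')` call to a made-up title string. The helper name `normalize_title` and the sample text are assumptions for illustration, not part of the original script, and the extra `strip` is an optional addition that the script itself does not perform.

```ruby
# Hypothetical helper (not in the original script): collapse every run of
# whitespace into one space, then trim the leading and trailing space.
def normalize_title(text)
  text.gsub(/\s+/, ' ').strip
end

# Made-up sample resembling a title that was wrapped and indented inside an XML file.
raw = "  Parsing XML\n      with\tRuby  "

# The script's gsub alone: newlines and tabs are collapsed too, because \s
# matches any whitespace character, but one space remains at each end.
puts raw.gsub(/\s+/, ' ').inspect   # prints " Parsing XML with Ruby "

# With the optional strip added.
puts normalize_title(raw).inspect   # prints "Parsing XML with Ruby"
```

Whether to add `strip` depends on how the titles are used afterwards. For plain printing, the single leading or trailing space left by `gsub` is usually harmless.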