Last active
December 27, 2015 00:39
# This script can be run using any of the following:
#
#   ruby RegexMatchingExampleExplained.rb < file1.xml
#   cat file1.xml file2.xml file3.xml | ruby RegexMatchingExampleExplained.rb
#   ruby RegexMatchingExampleExplained.rb file1.xml file2.xml file3.xml
#
# Read everything from standard input or from files specified as command line arguments
content = ARGF.read
# Find all <article-title></article-title> pairs using a non-greedy match
matches = content.scan(/<article-title>(.*?)<\/article-title>/m)
# String#scan will return the results in nested arrays. Flatten them into one array
matches.flatten!
# Replace any run of whitespace (spaces, tabs, newlines) in the titles with a single space
matches.map! { |title| title.gsub(/\s+/, ' ') }
# Print out the matches
puts matches
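
To try the steps above without an XML file, here is a minimal sketch that runs the same scan, flatten, and gsub sequence on an inline string instead of ARGF. The helper name `extract_titles` and the sample XML are assumptions made up for illustration; they are not part of the original script.

```ruby
# Hypothetical helper wrapping the same three steps as the script above.
def extract_titles(content)
  content
    .scan(/<article-title>(.*?)<\/article-title>/m) # one single-element array per match
    .flatten                                         # flat array of title strings
    .map { |title| title.gsub(/\s+/, ' ') }          # collapse whitespace runs, newlines included
end

# Made-up sample input: the first title contains extra spaces and a line break.
sample = <<~XML
  <article><article-title>Regex   matching
  explained</article-title></article>
  <article><article-title>Second title</article-title></article>
XML

titles = extract_titles(sample)
puts titles
```

The /m flag lets `.` match newlines, which is what allows the first title to be found even though it spans two lines. The non-greedy `.*?` stops at the nearest closing tag; a greedy `.*` with /m would run from the first opening tag to the last closing tag and return a single oversized match. Whitespace at the very start or end of a title is reduced to one space rather than removed, so chaining `strip` after `gsub` may be worth considering if that matters.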