@jsuwo
Last active December 27, 2015 00:39
# This script can be run using any of the following:
#
# ruby RegexMatchingExampleExplained.rb < file1.xml
# cat file1.xml file2.xml file3.xml | ruby RegexMatchingExampleExplained.rb
# ruby RegexMatchingExampleExplained.rb file1.xml file2.xml file3.xml
#
# Read everything from standard input or from files specified as command line arguments
content = ARGF.read
# Find all <article-title>...</article-title> pairs using a non-greedy match.
# The /m flag lets . match newlines, so a title may span several lines
matches = content.scan(/<article-title>(.*?)<\/article-title>/m)
# Because the pattern has a capture group, String#scan returns an array of
# one-element arrays. Flatten them into a single array of titles
matches.flatten!
# Replace any run of whitespace (spaces, tabs, newlines) in the titles with a single space
matches.map! { |title| title.gsub(/\s+/, ' ') }
# Print out the matches
puts matches
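
# A minimal sketch, not part of the original script: the same three steps wrapped
# in a helper so they can be tried on an in-memory string instead of ARGF.
# The method name extract_titles is hypothetical, chosen only for illustration.
def extract_titles(xml)
  # Non-greedy (.*?) stops at the first closing tag, so adjacent titles are
  # captured separately; a greedy (.*) would run on to the last closing tag.
  titles = xml.scan(/<article-title>(.*?)<\/article-title>/m).flatten
  # Collapse every run of whitespace, including newlines, to one space
  titles.map { |title| title.gsub(/\s+/, ' ') }
end
# Example call (the input string is made up):
#   extract_titles("<article-title>A\n  B</article-title>")  # => ["A B"]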