@matpalm
Created April 16, 2011 20:01
one_article_per_line.rb
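#!/usr/bin/env ruby
# Joins the lines of each article into one output line, keyed by "0".
# Articles on STDIN are separated by a '---END.OF.DOCUMENT---' line.
# "reporter:counter:<group>,<counter>,<amount>" lines on STDERR update
# Hadoop streaming counters.
article = ''
STDIN.each do |line|
  begin
    line.chomp!
    if line == '---END.OF.DOCUMENT---'
      # End of article: emit it as "key<TAB>value" and start a new one.
      puts "0\t#{article}"
      article = ''
      warn "reporter:counter:article_per_line,num_articles,1"
    else
      # Blank out placeholder tokens and stray angle brackets.
      article += line.
        gsub(%r{<URL>},' ').
        gsub(%r{<EMAILADDRESS>},' ').
        gsub(%r{<NEWSURL>},' ').
        gsub(%r{[<>]},' ') + ' '
    end
  rescue
    # Count lines that fail to parse (e.g. invalid encoding) and carry on.
    warn "reporter:counter:article_per_line,parse_errors,1"
  end
end
# Flush a trailing article that has no end marker, skipping an empty one.
unless article.strip.empty?
  puts "0\t#{article}"
  warn "reporter:counter:article_per_line,num_articles,1"
end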
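The script reads only from STDIN, so its logic can be tried locally without a cluster. Below is a minimal sketch of the same joining and cleaning steps wrapped in functions and fed from an in-memory sample. The names `clean`, `one_article_per_line`, and `END_MARKER` are hypothetical and not part of the gist, the four substitutions are folded into one regex for brevity, and the sketch drops an empty trailing article, which is an assumption about the desired behaviour.

```ruby
require 'stringio'

# Assumed name for the delimiter line used by the script.
END_MARKER = '---END.OF.DOCUMENT---'

# Hypothetical helper: replaces the placeholder tokens and any stray
# angle brackets with a single space each.
def clean(line)
  line.gsub(/<URL>|<EMAILADDRESS>|<NEWSURL>|[<>]/, ' ')
end

# Hypothetical helper: returns one cleaned string per article read from io.
def one_article_per_line(io)
  articles = []
  article = +''
  io.each_line do |line|
    line = line.chomp
    if line == END_MARKER
      articles << article
      article = +''
    else
      article << clean(line) << ' '
    end
  end
  # Keep a trailing article only if it has content (assumption).
  articles << article unless article.strip.empty?
  articles
end

sample = StringIO.new(<<~TEXT)
  First line <URL> here
  second line
  ---END.OF.DOCUMENT---
  Mail <EMAILADDRESS> now
TEXT

# Same "key<TAB>value" shape the script writes to STDOUT.
one_article_per_line(sample).each { |a| puts "0\t#{a}" }
```

The original script can be checked the same way from a shell by redirecting a small sample file into it, for example `ruby one_article_per_line.rb < sample.txt`, where `sample.txt` is any file of your own in the delimited format.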