@vargheseraphy
Created January 30, 2015 07:14
Run parse.rb to see how the HTML files are parsed.
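require 'nokogiri'

# Directory holding the HTML files to parse.
dir = '/Users/geordee/Projects/Data'

Dir.foreach(dir) do |filename|
  # Dir.foreach yields bare names, so build the full path before opening.
  path = File.join(dir, filename)
  next unless File.file?(path) # skips '.', '..' and subdirectories

  puts "Processing #{filename}"
  html = File.open(path) { |fin| Nokogiri::HTML(fin) }

  # Write the contents of the first <textarea>, if any, to <filename>.txt
  # in the current working directory.
  textarea = html.at_xpath('//textarea')
  File.open("#{filename}.txt", 'w') do |fout|
    fout.write(textarea.children.to_s) if textarea
  end
end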
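A note on the directory walk: `Dir.foreach` yields bare entry names rather than full paths, so a script run from another working directory needs to join each name with the directory before opening it. A minimal sketch of that step, using only the standard library; `file_paths` is a hypothetical helper name, not part of parse.rb, and the temporary directory stands in for the real data folder:

```ruby
require 'tmpdir'

# Hypothetical helper: return the regular files in +dir+ as full paths.
def file_paths(dir)
  Dir.children(dir)                        # entry names, without '.' and '..'
     .map { |name| File.join(dir, name) }  # bare name -> full path
     .select { |path| File.file?(path) }   # drop subdirectories
     .sort
end

# Demonstration against a throwaway directory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'a.html'), '<textarea>hello</textarea>')
  Dir.mkdir(File.join(dir, 'nested'))
  file_paths(dir).each { |path| puts path }
end
```

`Dir.children` requires Ruby 2.5 or later; on older versions, `Dir.entries(dir) - %w[. ..]` gives the same list of names.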