@cabo
Created November 29, 2011 11:10
Clean up an awful web page
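#!/opt/local/bin/ruby1.9
# Fetch the Studentenwerk Bremen canteen menu, strip its legacy markup,
# and open the cleaned-up table in a browser.
require 'rubygems'
require 'nokogiri'
require 'open-uri'

# open-uri lets Kernel#open fetch URLs on Ruby 1.9; Ruby 3.0+ needs URI.open instead.
txt = open("http://www.studentenwerk.bremen.de/files/main_info/essen/plaene/uniessen.php").read

# The page uses raw << and >> as guillemets; escape them before parsing.
# ">>>" is handled before ">>" so that a tag's closing ">" survives.
txt.gsub!(/<</, "&laquo;")
txt.gsub!(/>>>/, ">&raquo;")
txt.gsub!(/>>/, "&raquo;")

doc = Nokogiri::HTML(txt)
table = (doc/"//table")

# Replace each .gif icon with a bracketed tag built from the first letter
# of its filename; spacer images are dropped entirely.
table.search('//img').each { |e|
  s = e.attributes['src'].value
  if md = s.match(/(\w+)\.gif$/)
    e.swap(if md[1] == 'spacer'; '' else "[#{md[1][0..0].upcase}]" end)
  end
}

# Unwrap <font size=2>: black text stays plain, anything else
# (another color, or no color attribute) is wrapped in a green span.
table.search('//font[@size=2]').each { |e|
  e.swap(if (a = e.attributes['color']) && a.value == "#000000"
           e.inner_html
         else
           %Q{<span style="color: green">#{e.inner_html}</span>}
         end)
}

# Write the result (~/tmp must already exist) and view it.
# The backtick `open` call is the macOS command, not Ruby's Kernel#open.
File.open("#{ENV['HOME']}/tmp/mensa.html", "w") { |f|
  f.write table.to_html
}
`open ~/tmp/mensa.html`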
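
The two string-level steps of the script (escaping the raw guillemets and turning an icon filename into a bracketed letter) can be tried without network access or Nokogiri. The following is only a sketch: the helper names `escape_guillemets` and `icon_label` are hypothetical and do not appear in the script, and the sample inputs (including the filename `vegan.gif`) are made up for illustration.

```ruby
# Hypothetical helpers; names and sample inputs are assumptions,
# not part of the original script.

# Escape raw << and >> so an HTML parser does not mistake them for tags.
# ">>>" is replaced before ">>" so that the ">" closing a tag is kept.
def escape_guillemets(html)
  html.gsub(/<</, "&laquo;")
      .gsub(/>>>/, ">&raquo;")
      .gsub(/>>/, "&raquo;")
end

# Map an image src to its replacement text:
#   spacer.gif       -> ""    (image is dropped)
#   <name>.gif       -> "[N]" (first letter of the name, upcased)
#   anything else    -> nil   (image is left alone)
def icon_label(src)
  md = src.match(/(\w+)\.gif$/)
  return nil unless md
  md[1] == "spacer" ? "" : "[#{md[1][0].upcase}]"
end

puts escape_guillemets("<b>>>Menu<<</b>")
puts icon_label("images/vegan.gif")
```

Keeping these as pure functions makes the ordering of the three substitutions easy to check in isolation; the script itself performs the same replacements in place with `gsub!`.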