@kirel
Created July 29, 2008 10:44
Scraping ugly tables...
#!/usr/bin/ruby
require 'rubygems'
gem 'RubyInline', '=3.6.3'
require 'scrubyt'
bus_data = Scrubyt::Extractor.define do
  fetch 'http://rvm-online.de'

  # Form fields describing the route (origin and destination stop)
  route = { :place_origin      => "everswinkel",
            :name_origin       => "mitte",
            :place_destination => "münster (westf)",
            :name_destination  => "hbf", }

  # Form fields describing the departure time (the current time)
  now = Time.now
  time = { :itdTimeHour   => now.hour,
           :itdTimeMinute => now.min,
           :itdDateDay    => now.day,
           :itdDateMonth  => now.mon,
           :itdDateYear   => now.year, }

  # Fill in the search form and submit it
  route.each { |field, value| fill_textfield field, value }
  time.each  { |field, value| fill_textfield field, value }
  submit
  # Scrape the result table.
  # fahrten = trips, fahrt = one trip, ab = departure, an = arrival
  fahrten "//table//table[2]//tr[15]//table" do
    # The three trips sit 8 rows apart: rows 3/4, 11/12 and 19/20
    fahrt "//" do
      ab  "//tr[3]//td[2]//span"
      an  "//tr[4]//td[2]//span"
      bus "//tr[3]//td[8]//span"
    end
    fahrt "//" do
      ab  "//tr[11]//td[2]//span"
      an  "//tr[12]//td[2]//span"
      bus "//tr[11]//td[8]//span"
    end
    fahrt "//" do
      ab  "//tr[19]//td[2]//span"
      an  "//tr[20]//td[2]//span"
      bus "//tr[19]//td[8]//span"
    end
  end
end
puts bus_data.to_xml
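
The three trip patterns in the script differ only in their row numbers (3/4, 11/12, 19/20), which suggests one trip every 8 table rows. As a rough sketch of that regularity, the plain-Ruby snippet below builds the same XPath strings from a trip index. `FIRST_ROW`, `ROW_STRIDE`, `trip_xpaths` and `TRIP_XPATHS` are hypothetical names, not part of the gist or of scRUBYt, and the 8-row stride is an assumption read off the hard-coded values. Whether such generated strings can be fed back into the extractor block from a loop is not verified here.

```ruby
# Hypothetical helper, not part of the original gist or of scRUBYt.
# Assumption: each trip starts 8 rows after the previous one, beginning at
# row 3, as the hard-coded rows 3/4, 11/12 and 19/20 suggest.
FIRST_ROW  = 3
ROW_STRIDE = 8

# Returns the relative XPath expressions for the trip at the given
# zero-based index, using the same field names as the script.
def trip_xpaths(index)
  departure_row = FIRST_ROW + ROW_STRIDE * index
  arrival_row   = departure_row + 1
  {
    :ab  => "//tr[#{departure_row}]//td[2]//span",  # departure
    :an  => "//tr[#{arrival_row}]//td[2]//span",    # arrival
    :bus => "//tr[#{departure_row}]//td[8]//span"   # bus column
  }
end

# The three trips the script extracts.
TRIP_XPATHS = (0...3).map { |i| trip_xpaths(i) }

TRIP_XPATHS.each { |paths| puts paths.inspect }
```

If the site changes the number of rows per trip, only `ROW_STRIDE` would need adjusting under this assumption.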