@wtnabe
Created October 26, 2009 09:32
RTM `printplanner' scraper
#! /usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
#
# RTM `printplanner' scraper
#
#TARGET = 'http://www.rememberthemilk.com/printplanner/USER/'
TARGET = 'rtm-weekly.html'
#
# Returns an array of hashes, one per <h1> heading that has tasks:
#
#   [ { h1.inner_text => [ li, li, li, ... ] },
#     { h1.inner_text => [ li, li, li, ... ] },
#     ...
#   ]
#
# Each li is a hash:
#
#   { 'name'  => the li's own text nodes,
#     'list'  => list name,
#     'limit' => due date
#   }
#
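# TARGET is a local file; a URL would need open-uri's URI.open instead
Nokogiri( File.read( TARGET )
          ).search( '//h1[following-sibling::ul]' ).map { |h1|
  li = h1.search( './following-sibling::ul[1]/li' )
  if ( li.size > 0 )
    {
      h1.inner_text =>
      li.map { |e|
        # .tasklist text is expected to look like "(list, due)":
        # strip the parentheses, then split on the comma
        list, limit = e.search( '.tasklist'
                                ).inner_text.sub( /\A\(/, ''
                                ).sub( /\)\z/, ''
                                ).split( /,/ )
        {
          # keep only the li's direct text nodes, skipping child elements
          'name'  => e.children.map { |n|
            if ( n.node_name == 'text' )
              n
            else
              nil
            end
          }.join.strip,
          # to_s guards against nil when the list or due date is missing
          'list'  => list.to_s.strip,
          'limit' => limit.to_s.strip
        }
      }
    }
  else
    nil
  end
}.compact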
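
The script above only builds the nested structure and does nothing with it. As a rough sketch of how that result might be consumed, the snippet below flattens it and groups task names by list. `SAMPLE_PLAN`, `flatten_plan`, and `names_by_list` are hypothetical names, and the headings, tasks, and dates are made-up sample data in the shape described in the header comment, not real scraper output.

```ruby
# Hypothetical sample data in the shape the scraper returns:
# an array of { heading => [ task, task, ... ] } hashes.
SAMPLE_PLAN = [
  { 'Monday' => [
      { 'name' => 'Buy milk',    'list' => 'Personal', 'limit' => 'Oct 26' },
      { 'name' => 'Send report', 'list' => 'Work',     'limit' => 'Oct 26' }
  ] },
  { 'Tuesday' => [
      { 'name' => 'Call dentist', 'list' => 'Personal', 'limit' => 'Oct 27' }
  ] }
].freeze

# Flatten the per-heading hashes into one array of tasks,
# recording the heading each task came from under 'heading'.
def flatten_plan( plan )
  plan.flat_map { |section|
    section.flat_map { |heading, tasks|
      tasks.map { |task| task.merge( 'heading' => heading ) }
    }
  }
end

# Group task names by the list they belong to.
def names_by_list( plan )
  flatten_plan( plan ).group_by { |task|
    task['list']
  }.transform_values { |tasks|
    tasks.map { |task| task['name'] }
  }
end

names_by_list( SAMPLE_PLAN ).each { |list, names|
  puts "#{list}: #{names.join( ', ' )}"
}
```

To use this with the real scraper, assign the result of the `Nokogiri( ... ).compact` expression to a variable and pass it in place of `SAMPLE_PLAN`.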