Created January 4, 2012 21:03
Archive your Jaiku on jaikuarchive.com!
require 'rubygems'
require 'mechanize'

agent = Mechanize.new
page = agent.get("http://plindberg.jaiku.com/")
while Mechanize::Page === page
  puts "checking #{page.uri}"
  delay = 0
  # Visit every post link on this page, skipping hrefs that start with /presence/last.
  page.links_with(href: /\.jaiku\.com\/presence\/(?!last)/).each do |link|
    # Rewrite the host from *.jaiku.com to *.jaikuarchive.com and drop the fragment.
    uri = link.uri.tap do |u|
      u.host = u.host.gsub(/(?<=\.jaiku)(?=\.com$)/, 'archive')
      u.fragment = nil
    end.to_s
    unless agent.visited?(uri)
      puts uri
      begin
        agent.get(uri)
      rescue Mechanize::ResponseCodeError => e
        $stderr.puts "Error #{e.response_code} on fetch #{uri}"
      end
      sleep 0.5
      delay += 0.5
    end
  end
  sleep 10 - delay unless delay > 10 # 10 s per page is what jaiku.com/robots.txt says
  # Follow the "Older" link; page becomes nil when there is none, which ends the loop.
  older = page.link_with(text: /Older/, href: /offset/)
  page = older && older.click
end
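The host rewrite in the loop is the least obvious step, so here is a minimal standalone sketch of it using only Ruby's standard `uri` library. The helper name `archive_uri` and the example post URL are made up for illustration and are not part of the gist. The regex matches the empty position between `.jaiku` and a trailing `.com`, and `gsub` inserts `archive` there.

```ruby
require 'uri'

# Hypothetical helper (not in the gist): map a *.jaiku.com URL to its
# *.jaikuarchive.com counterpart and drop any #fragment.
def archive_uri(url)
  uri = URI.parse(url)
  # Lookbehind/lookahead match a zero-width position, so nothing is removed;
  # "archive" is simply inserted between ".jaiku" and the final ".com".
  uri.host = uri.host.gsub(/(?<=\.jaiku)(?=\.com$)/, 'archive')
  uri.fragment = nil
  uri.to_s
end

# Illustrative URL; the post ID and fragment are invented.
puts archive_uri('http://plindberg.jaiku.com/presence/12345#c-678')
# prints "http://plindberg.jaikuarchive.com/presence/12345"
```

Because the lookbehind requires a literal dot before `jaiku`, a bare `jaiku.com` host with no user subdomain is left unchanged.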
For me it dies on http://lemonad.jaikuarchive.com/presence/11012465
I suspect it's the world-famous ur-nockert.
I changed line 16 to:
    begin
      agent.get(uri)
    rescue Mechanize::ResponseCodeError
      $stderr.print "Error on fetch #{uri}"
    end
and replaced line 22 with:
    page = page.link_with(text: /Older/, href: /offset/).click
Thanks! Updated the gist.
Where's teh like button :)
Where's teh flattr button?
(y) Worked great on Linux (Arch Linux). I'd also like a flattr button ...
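Error checking is minimal: only HTTP response-code errors on individual posts are caught. Should you get any other exception, note the most recent Jaiku URL with an ?offset= parameter that the script printed, then change the start URL in the initial agent.get call so the script resumes from there.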
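To resume without editing the script each time, one option is to read the start URL from the command line. This is only a sketch: `start_url` and `DEFAULT_START` are hypothetical names, and the `?offset=20` value in the usage note is an invented example of the kind of URL the script prints as "checking ...".

```ruby
# Default start page, taken from the gist.
DEFAULT_START = 'http://plindberg.jaiku.com/'

# Hypothetical helper (not in the gist): use the first command-line
# argument as the start URL, falling back to the default when absent.
def start_url(args, default = DEFAULT_START)
  candidate = args.first
  return default if candidate.nil? || candidate.empty?
  candidate
end

# In the script, the initial fetch would then become:
#   page = agent.get(start_url(ARGV))
puts start_url(ARGV)
```

Assuming the script is saved as `archive.rb`, running `ruby archive.rb 'http://plindberg.jaiku.com/?offset=20'` would pick up from that page, and running it with no argument starts from the beginning.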