@plindberg
Created January 4, 2012 21:03
Archive your Jaiku on jaikuarchive.com!
require 'rubygems'
require 'mechanize'

agent = Mechanize.new
page = agent.get("http://plindberg.jaiku.com/")

while Mechanize::Page === page
  puts "checking #{page.uri}"
  delay = 0

  # Visit the jaikuarchive.com counterpart of every presence link on the page.
  page.links_with(href: /\.jaiku\.com\/presence\/(?!last)/).each do |link|
    uri = link.uri.tap do |u|
      # Turn "<user>.jaiku.com" into "<user>.jaikuarchive.com".
      u.host = u.host.gsub(/(?<=\.jaiku)(?=\.com$)/, 'archive')
      u.fragment = nil
    end.to_s

    unless agent.visited?(uri)
      puts uri
      begin
        agent.get(uri)
      rescue Mechanize::ResponseCodeError => e
        $stderr.puts "Error #{e.response_code} on fetch #{uri}"
      end
      sleep 0.5
      delay += 0.5
    end
  end

  sleep 10 - delay unless delay > 10 # 10 seconds is what jaiku.com/robots.txt says

  # link_with returns nil when there is no "Older" link; stop there.
  older = page.link_with(text: /Older/, href: /offset/)
  page = older ? older.click : nil
end
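
The host rewrite inside the loop can be tried on its own. This is a minimal sketch using only the standard `uri` library, not code from the gist: `archive_uri` is a hypothetical helper name, and the `#c-1` fragment in the sample URL is made up for illustration.

```ruby
require 'uri'

# Hypothetical helper (not in the gist): map a jaiku.com presence URL
# to its jaikuarchive.com counterpart and drop any fragment.
def archive_uri(url)
  uri = URI.parse(url)
  # Zero-width match between ".jaiku" and a trailing ".com":
  # the replacement text is inserted at that position.
  uri.host = uri.host.sub(/(?<=\.jaiku)(?=\.com$)/, 'archive')
  uri.fragment = nil
  uri.to_s
end

puts archive_uri("http://lemonad.jaiku.com/presence/11012465#c-1")
# prints http://lemonad.jaikuarchive.com/presence/11012465
```

Because the lookbehind requires a dot before `jaiku`, a bare `jaiku.com` host is left unchanged; only user subdomains are rewritten.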
@plindberg
Author

No error checking whatsoever. If you get an exception, note the most recent Jaiku URL with an ?offset= parameter, then change the start URL in the initial `agent.get(...)` call to that URL and run the script again.
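
As a sketch of that resume step (not part of the gist), the start URL could be taken from the command line instead of being edited in the source. `start_url` and `DEFAULT_START` are hypothetical names, and the offset value in the usage note is made up.

```ruby
# Hypothetical helper, not part of the gist: decide where the crawl starts.
DEFAULT_START = "http://plindberg.jaiku.com/"

def start_url(argv, default = DEFAULT_START)
  candidate = argv.first
  # Fall back to the default when no URL was given on the command line.
  return default if candidate.nil? || candidate.empty?
  candidate
end

# In the script, the initial fetch would then read:
#   page = agent.get(start_url(ARGV))
```

Assuming the script is saved as `archive.rb` (a hypothetical file name), a run that died could then be resumed with something like `ruby archive.rb "http://plindberg.jaiku.com/?offset=40"`.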

@moonhouse

For me it dies at http://lemonad.jaikuarchive.com/presence/11012465

I suspect it's the world-famous ur-nockert.

@moonhouse

I changed line 16 to:

begin
  agent.get(uri)
rescue Mechanize::ResponseCodeError
  $stderr.print "Error on fetch #{uri}"
end

@moonhouse

and replaced line 22 with:

page = page.link_with(text: /Older/, href: /offset/).click

@plindberg
Author

Thanks! Updated the gist.

@arnklint

arnklint commented Jan 6, 2012

Where's teh like button :)

@jardenberg

Where's teh flattr button?

@madr

madr commented Jan 6, 2012

(y) Worked great on Linux (Arch Linux). I'd like a Flattr button too ...
