Last active
June 4, 2022 05:02
Simple script to save your entire Pocket list (unread + archived) in PDF form.
=begin
Pocket Export.rb
My first 'real' Ruby script (hello world)!
More info here: http://blog.ctis.me/2015/12/archiving-your-pocket-articles-with-ruby.html

DEPENDENCIES:
pocket_export requires the following gems:
  curb
  nokogiri
pocket_export requires the following package (install it with your system's package manager):
  wkhtmltopdf - wkhtmltopdf.org

USAGE:
**Make sure that the dependencies are installed first.**
Go to https://getpocket.com/export/ and download the HTML file with your Pocket data. Then run this script with
the full path to the HTML file supplied as an argument (e.g. ~/Downloads/ril_export.html). The script begins downloading
items immediately and saves the PDFs in ./pocket_export_data. Errors, if encountered, are logged in
./pocket_export_data/pocket_export_errors.log.

NOTE:
This process can be fairly CPU-intensive, as every page is downloaded and rendered as a PDF. If you have many items
in your list, it is going to take a while.
=end
require 'curb'
require 'open-uri'
require 'nokogiri'

if ARGV.length < 1
  abort("Usage: pocket_export.rb /path/to/ril_export.html")
else
  pocket_data = ARGV[0]
  # Create the output directory on first run (File.exists? was removed in Ruby 3.2).
  Dir.mkdir("./pocket_export_data/") unless Dir.exist?("./pocket_export_data/")
end
# The export is a local file, so read it with File.open rather than Kernel#open.
Nokogiri::HTML(File.open(pocket_data)).css('a').each { |anchor|
  # Take the URL from the href attribute of the <a> tag and drop stray newlines.
  link = anchor['href'].to_s.gsub("\n", '')
  begin
    # Follow any redirects until the final destination is found (URL shorteners etc.).
    curl = Curl::Easy.perform(link) do |easy|
      easy.head = true
      easy.follow_location = true
    end
    # Fetch the webpage title for use in the filename; strip quotes and path separators.
    # URI.open is required on Ruby 3+, where Kernel#open no longer accepts URLs.
    title = Nokogiri::HTML(URI.open(curl.last_effective_url)).at('title').text.strip.gsub(/['"\/]/, '')
    puts "\n\n\n***Downloading #{title} (#{link})..."
    # Run wkhtmltopdf. Passing the arguments separately bypasses the shell,
    # so quotes or spaces in the URL or title cannot break the command.
    system('wkhtmltopdf', link, "./pocket_export_data/#{title}.pdf")
  rescue => e
    # Catch and log any exceptions, including the URL that failed.
    puts "\n\n\n!!!Downloading #{link} FAILED!!\n\n\n"
    File.open('./pocket_export_data/pocket_export_errors.log', 'a') { |errorlog|
      errorlog.write("Error: #{link}: #{e.message}\n")
    }
  end
}
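
The script names each PDF after the page title and hands that name to wkhtmltopdf, so titles containing quotes, slashes, or other awkward characters are the most likely source of failed downloads. Below is a minimal sketch of how that step could be isolated and checked on its own. The helper names `safe_filename` and `pdf_command`, the 100-character cap, and the "untitled" fallback are assumptions made for illustration and are not part of the original script.

```ruby
# Hypothetical helpers (illustrative names, not part of the original script).

# Turn a page title into a file name that is safe on common filesystems.
def safe_filename(title, fallback = "untitled")
  cleaned = title.to_s.gsub(/[\/\\:*?"'<>|\x00-\x1f]/, " ") # path separators, quotes, control chars
  cleaned = cleaned.gsub(/\s+/, " ").strip                   # collapse runs of whitespace
  cleaned = cleaned[0, 100].strip                            # assumed length cap
  cleaned.empty? ? fallback : cleaned
end

# Build the wkhtmltopdf argument list: input URL first, output path last.
def pdf_command(url, title, out_dir = "./pocket_export_data")
  ["wkhtmltopdf", url.strip, File.join(out_dir, "#{safe_filename(title)}.pdf")]
end

cmd = pdf_command("https://example.com/post\n", %q{Bob's "Guide": Tips/Tricks})
puts cmd.inspect
# To actually render the page, pass the list to system without going through a shell:
# system(*cmd)
```

Returning the command as an array keeps the sketch free of side effects and lets `system(*cmd)` pass each argument to wkhtmltopdf verbatim, with no shell quoting involved.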