@freakhill
Last active December 19, 2015 02:19
Rename PDF files I downloaded from around the internet
#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
# easy to break, use at your own risk
# this script renames pdf files in a folder (and its subfolders) (on unixy machines)
# with a naive heuristic to find the title (which unfortunately is never in the *title* metadata)
require 'pdf-reader'
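# true when str only contains characters we accept in a title
# (the hyphen comes last in the class so it is a literal, not a range)
def is_title? str
  str =~ /\A[\w\s:,().|?#-]+\z/
end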
# returns the guessed title for the pdf at `file`, or nil when none is found
def find_title file
  ln = "\n".codepoints.first
  # look for "\n\n" pattern: the header is everything before the first blank line
  reader = PDF::Reader.new file
  last_codepoint = nil
  header = reader.page(1).text.each_codepoint.slice_before do |cp|
    lc = last_codepoint
    last_codepoint = cp
    [cp, lc] == [ln, ln]
  end.first.pack("U*").squeeze(" ").tr("/’\"", "_''")
  # keep the leading header lines up to the first one that fails our criteria
  title = header.split("\n").slice_before { |line| !is_title?(line) }.first.map(&:strip).join(" | ")
  if is_title? title
    return title
  else
    STDERR.puts "couldn't find a valid title for file '#{File.basename file}' -- best match is '#{title}' /// #{file}"
    return nil
  end
end
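# walk the folder recursively and rename every pdf for which a title is found
Dir.glob("/Users/freakhill/Dropbox/learn/**/*.pdf").each do |file|
  begin
    (title = find_title file) or next
    newfile = File.join(File.dirname(file), "#{title}.pdf")
    # never overwrite a file that already has that name
    next if File.exist?(newfile)
    # File.rename keeps odd characters in file names away from the shell
    File.rename(file, newfile)
    puts "#{File.basename file} => '#{title}' PROCESSED!"
  rescue StandardError => e
    puts "failed on #{file} => #{e}"
  end
end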
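
The title heuristic can be tried on plain strings, without pdf-reader or any real PDF. The sketch below is a simplified variant rather than the script's exact code: `guess_title` and `title_like?` are hypothetical names, the header is split off with a regular expression instead of the codepoint scan, and leading blank lines are skipped (an assumption the script above does not make).

```ruby
# Characters accepted in a title; the trailing hyphen is a literal.
TITLE_PATTERN = /\A[\w\s:,().|?#-]+\z/

# Hypothetical helper: does this line look like part of a title?
def title_like?(line)
  TITLE_PATTERN.match?(line)
end

# Hypothetical helper: guess a title from the text of a first page.
# Returns nil when no leading line looks like a title.
def guess_title(page_text)
  # The header is everything before the first blank line
  # (leading blank lines are skipped, which is an assumption of this sketch).
  header = page_text.lstrip.split(/\n{2,}/).first
  return nil if header.nil?

  # Collapse repeated spaces and replace characters that are awkward in file names.
  header = header.squeeze(" ").tr("/\u2019\"", "_''")

  # Keep the leading lines for as long as they look like a title.
  lines = header.split("\n").map(&:strip).reject(&:empty?)
  lines = lines.take_while { |line| title_like?(line) }
  lines.empty? ? nil : lines.join(" | ")
end

sample = "Deep Learning: A Survey\nPart 2 (Draft)\n\nJane Doe, Example University\n"
puts guess_title(sample)
# prints: Deep Learning: A Survey | Part 2 (Draft)
```

Running the heuristic on a handful of strings like this is a cheap way to check which headers it accepts before letting the script rename anything.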