Last active
December 19, 2015 02:19
Rename PDF files I downloaded around the internet
#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
# Easy to break; use at your own risk.
# This script renames PDF files in a folder (and its subfolders) (on unixy machines)
# using a naive heuristic to find the title (which unfortunately is never in the *title* metadata).
require 'pdf-reader'
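
# True when str contains only characters we tolerate in a title.
# The hyphen sits last in the class so it is a literal "-" and not an accidental ")".."." range;
# \A and \z anchor the whole string instead of a single line.
def is_title?(str)
  str.match?(/\A[\w\s:,().|?#-]+\z/)
end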

# Returns the guessed title for the PDF at `file`, or nil (after logging to stderr) if none looks valid.
def find_title(file)
  reader = PDF::Reader.new(file)
  # the header is everything on the first page before the first blank line ("\n\n")
  header = reader.page(1).text.split("\n\n", 2).first.to_s
                 .squeeze(" ").tr("/’\"", "_''")
  # keep the leading lines, stopping before the first line that does not look like a title
  lines = header.split("\n").slice_before { |line| !is_title?(line) }.first || []
  title = lines.map(&:strip).join(" | ")
  return title if is_title?(title)

  STDERR.puts "couldn't find a valid title for file '#{File.basename file}' -- best match is '#{title}' /// #{file}"
  nil
end
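
# NOTE: hard-coded root folder; Dir.glob replaces the shell `find` call
# (unlike `find`, it skips hidden files and folders by default)
Dir.glob("/Users/freakhill/Dropbox/learn/**/*.pdf").each do |file|
  begin
    (title = find_title(file)) or next
    newfile = File.join(File.dirname(file), "#{title}.pdf")
    next if File.exist?(newfile) # never overwrite an existing file
    # File.rename bypasses the shell, so quotes or `$` in a name cannot break the command
    File.rename(file, newfile)
    puts "#{File.basename file} => '#{title}' PROCESSED!"
  rescue StandardError => e
    puts "failed on #{file} => #{e}"
  end
end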
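
The title heuristic can be tried without any PDF by feeding it plain text. The following is a minimal sketch, not the script's exact code: `guess_title` and `TITLE_LINE` are hypothetical names introduced here, the sample strings are made up, and it uses `take_while`, so it returns nil when the very first line does not look like a title instead of reporting a "best match".

```ruby
# Hypothetical pattern: the characters tolerated in a title line
# (word characters, whitespace, and a little punctuation).
TITLE_LINE = /\A[\w\s:,().|?#-]+\z/

# Guess a title from the raw text of a first page.
# Returns nil when no leading line looks like a title.
def guess_title(page_text)
  # the header is whatever precedes the first blank line
  header = page_text.split("\n\n", 2).first.to_s.squeeze(" ")
  # keep leading lines while they look like title lines
  lines = header.split("\n").map(&:strip).take_while { |line| line.match?(TITLE_LINE) }
  lines.empty? ? nil : lines.join(" | ")
end

# Made-up page text: two title lines, a blank line, then the body.
p guess_title("A Study of  Widgets\nSecond Edition\n\nJane Doe\nAbstract")
# => "A Study of Widgets | Second Edition"

# Page text that starts with a blank line has no header at all.
p guess_title("\n\nno header here")
# => nil
```

Running the heuristic on extracted text first makes it easy to see which titles would be produced before any file is renamed.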