@bokutin
Created May 11, 2012 12:27
Check whether ScanSnap PDF files have already been OCR'd
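#!/usr/bin/env perl
# Report whether each ScanSnap PDF already contains a text layer (i.e. has been OCR'd).
use strict;
use warnings;
use feature ":5.10";
use CAM::PDF;
use Encode;
use Encode::UTF8Mac;
use IO::All;
use List::Util qw(min);

$| = 1;    # unbuffered output, so results appear as each file is checked
binmode STDOUT, ":utf8";

my $dir = io("/Users/bokutin/Pictures/ScanSnap/");
my @pdf = grep { $_->filename =~ m/\.pdf$/i } $dir->all_files;

for my $file (@pdf) {
    my $fn  = $file->pathname;
    my $pdf = CAM::PDF->new($fn);
    unless ($pdf) {
        # CAM::PDF->new returns undef on failure; the reason is in $CAM::PDF::errstr
        warn "Cannot open $fn: $CAM::PDF::errstr\n";
        next;
    }

    # Inspect at most the first 10 pages, and never go past the last page.
    my $has_text = 0;
    for my $page_num ( 1 .. min(10, $pdf->numPages) ) {
        my $text = $pdf->getPageText($page_num);
        # Treat a page as OCR'd only if it yields non-whitespace text.
        if ( defined $text && $text =~ /\S/ ) {
            $has_text = 1;
            last;
        }
    }

    # macOS stores file names as decomposed UTF-8; utf-8-mac decodes them to composed form.
    say sprintf("%s, %s", decode('utf-8-mac', $file->filename), $has_text ? "YES" : "NO");
}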