Created March 31, 2017 21:30
OCRing PDFs using Ghostscript and Google Cloud Vision
$ for x in 100 200 225 250 300 ; do echo $x; gs -sDEVICE=jpeg -DBATCH -dNOPAUSE -r$x -sOutputFile=warren.jpg -dLastPage=1 -dFirstPage=1 warren.pdf 1>/dev/null ; jpeginfo warren.jpg; done
100
warren.jpg 856 x 1400 24bit JFIF N 80332
200
warren.jpg 1712 x 2800 24bit JFIF N 240411
225
warren.jpg 1926 x 3150 24bit JFIF N 284315
250
warren.jpg 2140 x 3500 24bit JFIF N 337588
300
warren.jpg 2568 x 4200 24bit JFIF N 454876
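
In the transcript above, the pixel dimensions scale linearly with the -r value, so the size of a render can be estimated before running gs. The following is a minimal sketch of that arithmetic, assuming the 100 dpi row (856 x 1400) as the baseline; scaled_dims and the base_* variables are hypothetical names introduced here for illustration and are not part of Ghostscript or jpeginfo.

```shell
#!/bin/sh
# Baseline taken from the transcript: 856 x 1400 pixels at -r100.
base_dpi=100
base_w=856
base_h=1400

# scaled_dims DPI
# Prints the estimated "WIDTH x HEIGHT" in pixels for the given resolution,
# scaling the baseline linearly (integer arithmetic, truncating).
scaled_dims() {
    echo "$(( base_w * $1 / base_dpi )) x $(( base_h * $1 / base_dpi ))"
}

# Estimate the dimensions for the same resolutions tried in the transcript.
for dpi in 100 200 225 250 300; do
    printf '%s dpi: %s\n' "$dpi" "$(scaled_dims "$dpi")"
done
```

For these five resolutions the estimates match the dimensions jpeginfo reported. The file sizes in the last column grow more slowly than the pixel count (about 5.7x the bytes for 9x the pixels between 100 and 300 dpi), so they cannot be estimated the same way.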