lefth / ocrpdf.sh
Last active November 3, 2021 15:04 — forked from wcaleb/ocrpdf.sh
Take a PDF, OCR it, and add OCR Text as background layer to original PDF to make it searchable
#!/bin/bash
# NOTE: I recommend pdfsandwich instead of this script, partly because ImageMagick (and pdftoppm) fail on large, detailed images.
# pdfsandwich does not preserve the original graphics exactly, but it can come close.
# To preserve color:
# pdfsandwich -rgb input.pdf
# To preserve grey tones:
# pdfsandwich -gray input.pdf
# To disable all preprocessing:
# pdfsandwich -nopreproc input.pdf
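#
# A minimal sketch of the pdfsandwich invocations above, wrapped in a helper.
# Assumptions: the function name ocr_with_pdfsandwich and its "mode" argument
# are hypothetical and not part of the original script. It only uses the
# -rgb, -gray, and -nopreproc flags listed above. Defining the function runs
# nothing; it has to be called explicitly.
ocr_with_pdfsandwich() {
    local mode="$1" input="$2"
    if [ -z "$input" ]; then
        echo "usage: ocr_with_pdfsandwich {rgb|gray|nopreproc|default} input.pdf" >&2
        return 2
    fi
    # Bail out early if pdfsandwich is not installed.
    if ! command -v pdfsandwich >/dev/null 2>&1; then
        echo "pdfsandwich not found in PATH" >&2
        return 127
    fi
    case "$mode" in
        rgb|gray|nopreproc) pdfsandwich "-$mode" "$input" ;;  # pass the flag through
        default)            pdfsandwich "$input" ;;           # pdfsandwich defaults
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}
# Example (not executed here): ocr_with_pdfsandwich gray input.pdf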