@enachb
Created April 15, 2018 00:05
#!/bin/bash
OUT_DIR=/mnt/scans
TMP_DIR=$(mktemp -d)
FILE_NAME=scan_$(date +%Y-%m-%d-%H%M%S)
LANGUAGE="eng" # Tesseract language code; the matching language data must be installed
echo 'scanning...'
scanimage --resolution 300 \
--batch="$TMP_DIR/scan_%03d.pnm" \
--format=pnm \
--mode Color \
--source 'ADF Duplex'
echo "Output saved in $TMP_DIR/scan*.pnm"
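# stop here if the temp directory is unavailable, so later steps never run in the wrong place
cd "$TMP_DIR" || exit 1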
for i in scan_*.pnm; do
    echo "${i}"
    convert "${i}" "${i}.tif"
done
# do OCR
echo 'doing OCR...'
for i in scan_*.tif; do
    echo "${i}"
    # writes "$i.hocr", which hocr2pdf then reads on stdin
    tesseract "$i" "$i" -l "$LANGUAGE" hocr
    hocr2pdf -i "$i" -s -o "$i.pdf" < "$i.hocr"
done
# create PDF
echo 'creating PDF...'
pdftk *.tif.pdf cat output "compiled.pdf"
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$FILE_NAME.pdf" compiled.pdf
cp "$FILE_NAME.pdf" "$OUT_DIR/"
rm -rf "$TMP_DIR"
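
The script above assumes that every tool it calls is installed and that it runs to completion, so the temporary directory is only removed on the last line. A minimal sketch of two optional guards follows. The helper names `missing_tools` and `make_scratch_dir` and the variable `SCRATCH_DIR` are hypothetical and not part of the original script; only `command -v`, `mktemp -d`, and `trap ... EXIT` are standard shell features.

```shell
# Hypothetical helper: print the name of each given command that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical helper: create a scratch directory and register its removal
# for when the shell exits, including after a failed step.
make_scratch_dir() {
    SCRATCH_DIR=$(mktemp -d) || return 1
    trap 'rm -rf "$SCRATCH_DIR"' EXIT
}

# Possible use near the top of the scan script (left commented out here):
# missing=$(missing_tools scanimage convert tesseract hocr2pdf pdftk gs)
# [ -z "$missing" ] || { echo "missing tools: $missing" >&2; exit 1; }
# make_scratch_dir || exit 1
```

With a trap in place the final `rm -rf` becomes redundant, though keeping it does no harm.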