Leptonica from: http://www.leptonica.org/download.html
Tesseract from: https://code.google.com/p/tesseract-ocr/
Extract the archive with tar -xzf and run the build from its root folder:
tar xzf leptonica-1.71.tar.gz
cd leptonica-1.71
CFLAGS="-D__SOLARIS__" ./configure --prefix=/opt/local
make
make install
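Before moving on to Tesseract, it may be worth confirming that the Leptonica headers ended up where the Tesseract configure step expects them. A minimal sketch, assuming the install used --prefix=/opt/local as above; check_leptonica is a hypothetical helper name, and allheaders.h is Leptonica's main header:

```shell
# Hypothetical helper: succeed if Leptonica's main header exists under the
# given install prefix (e.g. /opt/local), fail otherwise.
check_leptonica() {
    prefix="$1"
    if [ -f "$prefix/include/leptonica/allheaders.h" ]; then
        echo "Leptonica headers found in $prefix/include/leptonica"
    else
        echo "Leptonica headers missing under $prefix" >&2
        return 1
    fi
}
```

Usage would be something like check_leptonica /opt/local, run right after make install.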
Install Tools
pkgin in autoconf automake libtool
From the root folder of the extracted Tesseract source:
./autogen.sh
LIBLEPT_HEADERSDIR="/opt/local/include/leptonica" CFLAGS="-D__SOLARIS__" ./configure --prefix="/opt/local"
Update the Makefiles manually (see https://code.google.com/p/tesseract-ocr/issues/detail?id=915 and https://code.google.com/p/tesseract-ocr/issues/detail?id=582).
In ./Makefile, ./api/Makefile and ./training/Makefile, find
LIBS = -llept
and change it to
LIBS = -llept -lsocket -lnsl -lrt -lxnet
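The manual edit could also be scripted. The following is only a sketch: it assumes the line in each generated Makefile reads exactly LIBS = -llept (the Leptonica link flag), which may not hold for every Tesseract version, and patch_libs is a hypothetical helper name. It writes through a temporary file rather than relying on sed -i, which not every sed supports:

```shell
# Hypothetical helper: append the Solaris networking libraries to the LIBS
# line of one Makefile. Lines that do not match exactly are left untouched.
patch_libs() {
    makefile="$1"
    sed 's/^LIBS = -llept[ ]*$/LIBS = -llept -lsocket -lnsl -lrt -lxnet/' \
        "$makefile" > "$makefile.tmp" &&
        mv "$makefile.tmp" "$makefile"
}

# Apply to the three Makefiles named above, skipping any that are missing.
for f in ./Makefile ./api/Makefile ./training/Makefile; do
    if [ -f "$f" ]; then
        patch_libs "$f"
    fi
done
```

Checking the result with grep '^LIBS' on each file before running make is a cheap way to see whether the pattern actually matched.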
make
make install
Tesseract should now be in your PATH.
Set the environment variable Tesseract uses to find the training data:
export TESSDATA_PREFIX=/opt/local/share/tessdata
Download and extract the training data for the languages you need, e.g.:
wget https://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.eng.tar.gz
wget https://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.deu.tar.gz
wget https://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.ara.tar.gz
tar xzfv tesseract-ocr-3.02.eng.tar.gz
tar xzfv tesseract-ocr-3.02.deu.tar.gz
tar xzfv tesseract-ocr-3.02.ara.tar.gz
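The download and extract steps repeat the same pattern per language, so they could be wrapped in a loop. A sketch only, assuming the archive naming and URL shown above; tessdata_archive and fetch_tessdata are hypothetical helper names:

```shell
# Hypothetical helper: build the archive name for a language code,
# following the naming pattern of the downloads above.
tessdata_archive() {
    echo "tesseract-ocr-3.02.$1.tar.gz"
}

# Hypothetical helper: download and extract the archive for each language
# code given as an argument, stopping at the first failure.
fetch_tessdata() {
    for lang in "$@"; do
        archive=$(tessdata_archive "$lang")
        wget "https://tesseract-ocr.googlecode.com/files/$archive" &&
            tar xzf "$archive" || return 1
    done
}
```

Calling fetch_tessdata eng deu ara would then correspond to the six commands above.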
Install Tessdata
cp tesseract-ocr/tessdata/eng* /opt/local/share/tessdata
cp tesseract-ocr/tessdata/fra* /opt/local/share/tessdata
cp tesseract-ocr/tessdata/deu* /opt/local/share/tessdata
cp tesseract-ocr/tessdata/ara* /opt/local/share/tessdata
Note: fra and some others are included in the English training data.
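To see which languages actually ended up in the tessdata folder after copying, the file names can be inspected. A small sketch, assuming the language files follow the usual <lang>.traineddata naming; list_traineddata is a hypothetical helper name:

```shell
# Hypothetical helper: print the language codes found in a tessdata
# directory, derived from the *.traineddata file names.
list_traineddata() {
    dir="$1"
    for f in "$dir"/*.traineddata; do
        # Skip the unexpanded pattern when the directory has no matches.
        [ -f "$f" ] || continue
        basename "$f" .traineddata
    done
}
```

For example, list_traineddata /opt/local/share/tessdata prints one language code per line, and each of those codes can be passed to tesseract with -l.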
Use Tesseract
Example on a JPG (PNG does not seem to work, possibly because of a missing feature upstream):
tesseract test6-000005.jpg test6-000005 -l deu
Optionally, add makebox or hocr to the end of the command to create a box or hOCR file containing the coordinates of each recognized character.
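As an illustration of where the optional config name goes, here is a small wrapper around the command above. It is a sketch under assumptions: ocr_image is a hypothetical helper name, and the output base name is simply the image name without its extension, as in the example.

```shell
# Hypothetical wrapper: run tesseract on an image, optionally appending a
# config name (makebox or hocr) after the language option.
ocr_image() {
    image="$1"
    lang="$2"
    config="$3"
    base="${image%.*}"    # strip the extension to get the output base name
    if [ -n "$config" ]; then
        tesseract "$image" "$base" -l "$lang" "$config"
    else
        tesseract "$image" "$base" -l "$lang"
    fi
}
```

With this, ocr_image test6-000005.jpg deu hocr runs tesseract test6-000005.jpg test6-000005 -l deu hocr, and leaving out the third argument gives the plain text run shown earlier.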