The goal of this gist is to show how to use a CentOS7 system (with root access) to build a statically compiled tesseract binary that can be copied to, and used on, a CentOS7 system with no root access.
Question: Why would we want to do this?
Answer: In some cases you might want to use tesseract on a machine hosted by a cloud provider. For security reasons, some machines on a cloud provider's infrastructure do not allow root access to the remote guest (you).
Solution: Log into a CentOS7 machine where we do have root access (this can be any machine, e.g. a VM on your local machine). Install all of the dependencies and build a self-sufficient executable on that machine. Then copy just the tesseract executable over to the machine with no root access and make sure that it is on that machine's path. Voilà!
For the build steps below you can use any CentOS7 machine on which you have root access.
sudo yum install zlib
sudo yum install zlib-devel
sudo yum install libjpeg
sudo yum install libjpeg-devel
sudo yum install libwebp
sudo yum install libwebp-devel
sudo yum install libtiff
sudo yum install libtiff-devel
sudo yum install libpng
sudo yum install libpng-devel
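The build steps that follow call `git`, `wget`, `make` and `./autogen.sh`, and autogen scripts usually depend on autoconf, automake and libtool. The gist does not list these tools, so the following is only a minimal sketch for checking them up front. The function name `missing_tools` is hypothetical and the tool list in the usage note is an assumption.

```shell
# Hypothetical helper: print every tool from the argument list
# that cannot be found on the current PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, `missing_tools git wget make gcc g++ autoconf automake libtool pkg-config` prints nothing when every tool is present. Anything it does print can be installed with `sudo yum install` before continuing.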
cd /home/azureuser
git clone https://github.com/DanBloomberg/leptonica.git --depth 1
cd /home/azureuser/leptonica
./autogen.sh
./configure --prefix=/usr/local --disable-shared --enable-static --with-zlib --with-jpeg --with-libwebp --with-libtiff --with-libpng
make
sudo make install
sudo ldconfig
wget https://github.com/tesseract-ocr/tesseract/archive/4.0.0.tar.gz -O tesseract-4.0.0.tar.gz
tar xvvfz tesseract-4.0.0.tar.gz
cd tesseract-4.0.0
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
./autogen.sh
./configure --prefix=/usr/local --disable-shared --enable-static --with-extra-libraries=/usr/local/lib/
make
sudo make install
sudo ldconfig
First, copy the tesseract binary (from the machine with root access) to the non-root CentOS7 machine. Run the following on the non-root machine.
mkdir -p /home/azureuser/tess
cd /home/azureuser/tess
scp -i ~/.ssh/key.pem -rp [email protected]:/usr/local/bin/tesseract .
Then ensure that the tesseract binary is on the PATH of the non-root machine by adding the following export statement to the ~/.bash_profile file.
export PATH="$PATH:/home/azureuser/tess"
Create a new location to store the traineddata files, then export that location as TESSDATA_PREFIX.
mkdir -p /home/azureuser/tess/traineddata
export TESSDATA_PREFIX=/home/azureuser/tess/traineddata
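An `export` typed at the prompt only lasts for the current shell session, so both the PATH and TESSDATA_PREFIX lines need to be in ~/.bash_profile to survive a new login. Below is a minimal sketch of adding them without creating duplicates on repeated runs. The helper name `append_once` is hypothetical.

```shell
# Hypothetical helper: append a line to a file only if that exact
# line is not already present (creates the file if it is missing).
append_once() {
  file=$1
  line=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

It could be used as `append_once ~/.bash_profile 'export PATH="$PATH:/home/azureuser/tess"'` and `append_once ~/.bash_profile 'export TESSDATA_PREFIX=/home/azureuser/tess/traineddata'`, followed by `source ~/.bash_profile`.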
Copy whichever .traineddata files you need from the tesseract-ocr/tessdata repository on GitHub to the TESSDATA_PREFIX location above.
cd $TESSDATA_PREFIX
wget https://github.com/tesseract-ocr/tessdata/raw/master/fra.traineddata
wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata
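Tesseract needs a matching .traineddata file for every language passed with `-l`. The following sketch checks that the files are in place before running any OCR. The function name `missing_traineddata` is hypothetical; it only looks for files named `<lang>.traineddata` in the given directory.

```shell
# Hypothetical helper: given a tessdata directory and a list of
# language codes, print each code whose .traineddata file is absent.
missing_traineddata() {
  dir=$1
  shift
  for lang in "$@"; do
    [ -f "$dir/$lang.traineddata" ] || printf '%s\n' "$lang"
  done
}
```

Running `missing_traineddata "$TESSDATA_PREFIX" fra eng` should print nothing once both downloads above have completed. `tesseract --list-langs` gives a similar view from Tesseract's side.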
Grab an image such as [this French lunch menu example](https://second-state.github.io/wasm-learning/faas/ocr/html/a_french_lunch_menu.png).
cd /home/azureuser/tess
wget https://second-state.github.io/wasm-learning/faas/ocr/html/a_french_lunch_menu.png
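Then run the tesseract command, passing the image as the first parameter. The `stdout` argument prints the recognized text to the terminal, and `-l fra` selects the French traineddata.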
cd /home/azureuser/tess
./tesseract a_french_lunch_menu.png stdout --dpi 70 -l fra
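If the binary fails to start on the non-root machine with a shared library error, that would not be surprising: `--disable-shared --enable-static` makes the Leptonica and Tesseract libraries static, but the executable may still link dynamically against system libraries such as libjpeg, libtiff or libwebp. Here is a minimal sketch for listing what is missing, assuming `ldd` is available on the target machine. The filter name `missing_libs` is hypothetical.

```shell
# Hypothetical filter: read ldd output on stdin and print the name
# of every shared library that ldd reports as "not found".
missing_libs() {
  awk '/not found/ { print $1 }'
}
```

Usage would be `ldd /home/azureuser/tess/tesseract | missing_libs`. An empty result means every shared library the binary asks for was found on this machine.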