@Turupawn
Created September 17, 2019 14:56
#!/bin/bash
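FILE=./input.doc
# Convert the Word document to PDF (written as ./input.pdf) only if it exists;
# otherwise an already-present input.pdf is used by the next step.
if test -f "$FILE"; then
  lowriter --convert-to pdf:writer_pdf_Export "$FILE"
fi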
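# Render each PDF page to a PNG in ./image; -p avoids an error when the directory already exists.
mkdir -p image
pdftoppm input.pdf image/page -png
mkdir -p text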
page=1
# OCR each page image in Spanish (-l spa); tesseract appends .txt to the output base name.
for image_file in ./image/*.png; do
  # Zero-pad the page number to three digits so the text files sort in page order.
  padded_page=$(printf '%03d' "$page")
  tesseract "$image_file" "./text/$padded_page" -l spa
  page=$((page+1))
done
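# Concatenate the per-page text files, in page order, into output.txt.
# Note: >> appends, so remove any old output.txt before re-running.
for text_file in ./text/*.txt; do
  cat "$text_file" >> output.txt
done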