@kristianrl
kristianrl / pdf-ocr-to-txt.sh
Last active July 10, 2024 09:38
# Extract OCR contents in PDF-documents as plain text (.txt)
# Kristian Risager Larsen, 2024-07
#
# Setup:
# You need to install Ghostscript and Tesseract, for example with Homebrew:
# brew install tesseract tesseract-lang ghostscript
#
# Notes:
# The "-l dan" parameter tells Tesseract to expect Danish text.
# Replace "dan" with another language code (e.g. "eng") for other languages.
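#
# Sketch:
# A minimal illustration of how the two tools above could be combined, under
# stated assumptions; it is not necessarily how this script does it. The
# function name pdf_ocr_to_txt, its arguments, the 300 dpi resolution, and the
# page-NNNN.png naming are all hypothetical choices made for the example.
# Ghostscript renders each PDF page to a PNG, and Tesseract reads each PNG
# with the language passed to "-l".

```shell
# Hypothetical helper: pdf_ocr_to_txt <file.pdf> [language-code]
# Writes the recognised text to <file>.txt next to the input.
pdf_ocr_to_txt() {
  local pdf="$1" lang="${2:-dan}" out tmpdir page

  # Fail early if the input is missing, before calling any external tool.
  if [ ! -f "$pdf" ]; then
    echo "No such file: $pdf" >&2
    return 1
  fi

  out="${pdf%.pdf}.txt"
  tmpdir="$(mktemp -d)" || return 1

  # Render every page to a 300 dpi PNG (resolution is an assumed default).
  if ! gs -q -dNOPAUSE -dBATCH -sDEVICE=png16m -r300 \
        -sOutputFile="$tmpdir/page-%04d.png" "$pdf"; then
    rm -rf "$tmpdir"
    return 1
  fi

  # OCR the pages in order and append the text to the output file.
  : > "$out"
  for page in "$tmpdir"/page-*.png; do
    [ -e "$page" ] || continue
    tesseract "$page" stdout -l "$lang" >> "$out"
  done

  rm -rf "$tmpdir"
}
```

# Usage (assuming a file named scan.pdf in the current directory):
#   pdf_ocr_to_txt scan.pdf dan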