@henrik
Created March 3, 2012 17:07
OCR on OS X with tesseract

Install ImageMagick for image conversion:

brew install imagemagick

Install tesseract for OCR:

brew install tesseract --all-languages

Or install without --all-languages and install them manually as needed.
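To see whether a particular language is already installed, the output of tesseract --list-langs can be searched for its code. A minimal sketch; the function name has_lang is made up for illustration and is not part of tesseract:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the language code given as $1 appears
# as a whole line in the language list read from standard input.
has_lang() {
  # -x matches whole lines only, -F treats the code as a literal string.
  grep -qxF -- "$1"
}

# Usage (requires tesseract); 2>&1 is a precaution in case a version
# prints the list to stderr instead of stdout:
#   if tesseract --list-langs 2>&1 | has_lang deu; then
#     echo "deu is installed"
#   fi
```

Matching whole lines avoids false positives, such as "en" matching "eng".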

Make sure the input image is a grayscale .tif and fairly large. An image of about 500x150 pixels was too small, while about 2000x500 pixels worked very well.

convert input.png -resize 400% -type Grayscale input.tif
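The 400% above fits an input roughly 500 pixels wide. For other sizes, the percentage can be derived from the current width. A rough sketch, assuming a target width of about 2000 pixels (the size that worked well here, not a documented requirement); resize_percent is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: percentage needed to scale a given width up to a target width.
# The 2000px default is an assumption based on the size that worked well above.
resize_percent() {
  width=$1
  target=${2:-2000}
  if [ "$width" -ge "$target" ]; then
    # Never shrink: images already at or above the target stay at 100%.
    echo 100
  else
    # Integer arithmetic, rounded up so the scaled width reaches the target.
    echo $(( (target * 100 + width - 1) / width ))
  fi
}

# Usage (requires ImageMagick):
#   width=$(identify -format '%w' input.png)
#   convert input.png -resize "$(resize_percent "$width")%" -type Grayscale input.tif
```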

OCR it. The default language is English. Language codes are three letters long; see man tesseract.

tesseract -l eng input.tif output

This creates output.txt.
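The two steps can be wrapped in a small function for repeated use. This is only a sketch of the commands above: ocr_image and the DRY_RUN variable are hypothetical names, and the fixed 400% resize is carried over unchanged.

```shell
#!/bin/sh
# Sketch: run the convert and tesseract steps above for one image.
# ocr_image and DRY_RUN are hypothetical names, not part of either tool.
# Assumes the input file name has an extension (e.g. scan.png).
ocr_image() {
  input=$1
  lang=${2:-eng}            # default language is English
  base=${input%.*}          # strip the extension: scan.png -> scan
  if [ -n "$DRY_RUN" ]; then
    run=echo                # print the commands instead of running them
  else
    run=
  fi
  $run convert "$input" -resize 400% -type Grayscale "$base.tif" &&
  $run tesseract -l "$lang" "$base.tif" "$base"
}

# Usage:
#   ocr_image scan.png            # writes scan.tif and scan.txt
#   DRY_RUN=1 ocr_image scan.png  # only prints the two commands
```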

@varenc

varenc commented Feb 6, 2019

@gabedot

Homebrew recently decided to remove all options from the homebrew-core formulae. As of right now, though, tesseract includes all languages by default, so just remove the option and you should get all languages. This makes tesseract 680MB by default, so I think this should change in the future.

In the short to medium term, you can install tesseract with support for all languages with this:
brew install https://github.com/Homebrew/homebrew-core/raw/10708da5492fa4da6fbf2618210681953219409f/Formula/tesseract.rb
That is just a reference to a particular version of the formula, though, so it won't receive future updates.

@Yokileforever

Why does running brew install tesseract --all-languages give "Error: invalid option: --all-languages"?

@Anima-t3d

Why does running brew install tesseract --all-languages give "Error: invalid option: --all-languages"?

Homebrew recently decided to remove all options from the homebrew-core formulae, as explained by @varenc.

@bejvisek

bejvisek commented May 22, 2019

Hi, I installed tesseract 4.0.0 smoothly with
brew install tesseract
but only these languages are available:
$ tesseract --list-langs
List of available languages (3):
eng
osd
snum
The options mentioned above, like --with-all-languages, do not work anymore :-/

Is there a way to install selected language(s)? Thanks!

@bejvisek

Answering my own question:
brew install tesseract-lang
It installs most of the languages, but not all listed here: link
I need to install the language "equ" and still don't know how.

@abdennour

thanks @bejvisek

@JamesAsuraA93

macOS users must use "brew install tesseract-lang" instead of "brew install tesseract --all-languages".
