malev/text_extractor.md

Created September 2, 2014 21:37


Text Extraction


TextExtractor

Requirements

- Works with DOC, ODT and PDF files
- Works through an API
- Can handle multiple files at the same time
- Uses queues (possibly distributed)
- Is feasible to build
- Works fast!
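The requirements above suggest a dispatch-by-format design with a job queue feeding several workers. What follows is a minimal sketch, not a definitive implementation. It rests on a few assumptions. An in-process `queue.Queue` with threads stands in for a possibly distributed queue; a real deployment might swap in a message broker. Only ODT is handled, using the standard library, because an ODT file is a ZIP archive whose text lives in `content.xml`. DOC and PDF would need third-party parsers, so they are left unregistered here. The API layer is not shown. The names `extract_odt`, `EXTRACTORS` and `extract_many` are hypothetical.

```python
import queue
import threading
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# OpenDocument text namespace (used for <text:p> and <text:h> elements).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def extract_odt(path):
    """Return the paragraphs and headings of an ODT file, one per line."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    # itertext() also collects text from inline children such as <text:span>.
    return "\n".join("".join(el.itertext()) for el in root.iter() if el.tag in BLOCK_TAGS)


# Hypothetical registry mapping file extension -> extractor function.
# ".doc" and ".pdf" would need third-party libraries (assumption: not wired up here).
EXTRACTORS = {".odt": extract_odt}


def extract_many(paths, workers=4):
    """Extract text from many files concurrently via a shared job queue.

    Returns {path: text}, or {path: exception} when a file fails.
    """
    jobs = queue.Queue()
    for p in paths:
        jobs.put(p)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                path = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            ext = Path(path).suffix.lower()
            extractor = EXTRACTORS.get(ext)
            try:
                if extractor is None:
                    raise ValueError(f"unsupported format: {ext}")
                outcome = extractor(path)
            except Exception as exc:  # record per-file failures instead of crashing
                outcome = exc
            with lock:
                results[path] = outcome

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each worker pulls jobs until the queue is empty, so adding files or workers needs no other changes. Storing exceptions per file keeps one bad document from stopping the batch.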

What else?
