TextExtractor

Requirements

- Works with doc, odt and pdf
- Works through an API
- Can handle multiple files at the same time
- Uses queues (maybe distributed)
- It's doable
- Works fast!

What else?
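To make the "multiple files at the same time" and "uses queues" requirements more concrete, here is a minimal sketch of how a queue-fed worker pool might look. Everything in it is an assumption for illustration: `extract_text`, `worker` and `extract_many` are hypothetical names, the extractor is a placeholder that does not parse anything, and an in-process `queue.Queue` stands in for whatever (possibly distributed) queue is eventually chosen.

```python
import queue
import threading
from pathlib import Path

# Formats named in the requirements list.
SUPPORTED = {".doc", ".odt", ".pdf"}


def extract_text(path: str) -> str:
    """Hypothetical placeholder: a real extractor would parse the file here."""
    return f"<text of {Path(path).name}>"


def worker(jobs: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """Pull paths off the queue until a None sentinel arrives."""
    while True:
        path = jobs.get()
        if path is None:  # sentinel: no more work for this worker
            break
        try:
            if Path(path).suffix.lower() not in SUPPORTED:
                outcome = ("error", "unsupported format")
            else:
                outcome = ("ok", extract_text(path))
        except Exception as exc:  # keep one bad file from killing the worker
            outcome = ("error", str(exc))
        with lock:
            results[path] = outcome


def extract_many(paths, num_workers: int = 4) -> dict:
    """Process several files concurrently; returns {path: (status, payload)}."""
    jobs: queue.Queue = queue.Queue()
    results: dict = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for p in paths:
        jobs.put(p)
    for _ in threads:  # one sentinel per worker
        jobs.put(None)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for path, outcome in extract_many(["a.pdf", "b.odt", "c.txt"]).items():
        print(path, outcome)
```

An API layer could call something like `extract_many` per request, or push paths onto a shared queue and return a job id. Which of the two fits better depends on how "fast" and "distributed" end up being defined.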