To execute, simply run `python challenge.py`. It should output a word and its occurrence count per line to stdout. Make sure the data file is in the same folder and it's called `data.txt`. (Tested on Ubuntu 18.04 with Python 3.6.6 and 2.7.)
If "what I would do next to put it into production" means how I would make it more robust, then I would:
- Parametrize the data file, for example by passing its path as a command-line argument or by reading it from some sort of document storage.
- Use a much more robust way of tokenizing the input, for example the NLTK (Natural Language Toolkit) library.
- Also use a better way of counting the words. NumPy has some nice histogram functions.
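
As a rough illustration of the first point, here is a minimal sketch of what a parametrized version could look like. It is not the code in challenge.py: the names `count_words`, `main`, and `data_file` are made up for this example, the tokenizer is a simple regular expression rather than NLTK, and the counting uses `collections.Counter` from the standard library rather than NumPy. It also assumes Python 3 and a UTF-8 encoded input file.

```python
import argparse
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its occurrence count."""
    # Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
    # A real tokenizer (NLTK, for example) would handle far more cases.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Count word occurrences in a text file.")
    # Defaults to data.txt, so running without arguments reads the same file as before.
    parser.add_argument("data_file", nargs="?", default="data.txt",
                        help="path to the input text file")
    args = parser.parse_args(argv)

    with open(args.data_file, encoding="utf-8") as handle:
        counts = count_words(handle.read())

    # One word and its count per line, most frequent first.
    for word, count in counts.most_common():
        print(word, count)
```

Calling `main()` reads `data.txt` from the current folder, and `main(["other.txt"])` reads a different file. Adding the usual `if __name__ == "__main__": main()` guard at the bottom would turn this into a script that accepts the path on the command line.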