@rabipelais
Created October 18, 2018 21:19
Sebastian Mendez' answer to Fineway's coding task.

Sebastian Mendez' solution to the Fineway short challenge

To execute, run python challenge.py. It prints one word per line to stdout, followed by its occurrence count in parentheses. The data file must be in the same folder and be named data.txt. (Tested on Ubuntu 18.04 with Python 3.6.6 and 2.7.)

If "what I would do next to put it into production" means how I would make it more robust, then:

  • Parametrize the data file, for example by passing its path as an argument or by reading it from some sort of document storage.
  • Use a much more robust way of tokenizing the input, for example the NLTK (Natural Language Toolkit) library.
  • Use a better way of counting the words. NumPy has some nice histogram functions.
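
The suggestions above could be sketched as follows. This is only an illustration under stated assumptions, not the submitted solution: it stays within the standard library, so a slightly stricter regular expression stands in for NLTK and `collections.Counter` stands in for NumPy. The names `TOKEN_PATTERN`, `tokenize` and `main` are hypothetical.

```python
import argparse
import re
from collections import Counter

# Hypothetical tokenizer pattern: runs of word characters, optionally
# joined by single apostrophes, so that "don't" stays one token.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*")


def tokenize(text):
    """Lower-case the text and return its tokens in order of appearance."""
    return TOKEN_PATTERN.findall(text.lower())


def count_words(text):
    """Return a Counter mapping each token to its number of occurrences."""
    return Counter(tokenize(text))


def main(argv=None):
    parser = argparse.ArgumentParser(description="Count word occurrences in a text file.")
    # The path is optional and falls back to the original hard-coded file name.
    parser.add_argument("path", nargs="?", default="data.txt", help="text file to read")
    args = parser.parse_args(argv)
    with open(args.path) as f:
        counts = count_words(f.read())
    # most_common() yields (word, count) pairs, highest count first.
    for word, count in counts.most_common():
        print("%s (%d)" % (word, count))
```

Calling `main()` under the usual `if __name__ == '__main__':` guard would let the script be run as `python challenge.py some_file.txt`, while still defaulting to data.txt when no argument is given.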

import re
from collections import defaultdict


def split_into_words(whole_thing):
    # \w matches a letter, digit or underscore; replace everything else
    # with a space, then split on whitespace
    return re.sub(r"[^\w]", " ", whole_thing).split()


def count_words(words):
    # Dictionary where the default value is 0
    words_count = defaultdict(int)
    for w in words:
        # Count case-insensitively
        words_count[w.lower()] += 1
    return words_count


if __name__ == '__main__':
    with open('data.txt') as f:
        words = split_into_words(f.read())
    words_count = count_words(words)
    # Sort the dict by value (occurrences), from larger to smaller
    for k, v in sorted(words_count.items(), key=lambda kv: kv[1], reverse=True):
        print("%s (%d)" % (k, v))
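
One detail of the sort above: words with equal counts come out in whatever order the dict yields them, and in Python 2.7 that order is arbitrary, so the output can differ between the two tested interpreters. A minimal sketch of a deterministic variant, assuming alphabetical order is an acceptable tie-break (the helper name `sort_counts` is hypothetical):

```python
def sort_counts(words_count):
    """Return (word, count) pairs, highest count first, ties in alphabetical order."""
    # Negating the count sorts it descending; the word itself sorts ascending.
    return sorted(words_count.items(), key=lambda kv: (-kv[1], kv[0]))


for word, count in sort_counts({"pear": 2, "apple": 2, "fig": 5}):
    print("%s (%d)" % (word, count))
```

With this sample input, fig is printed first, followed by apple and then pear, regardless of the order in which the dict stores them.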