@luisenriquecorona
Created December 23, 2019 18:08
In line with the fail-fast philosophy, dict access with d[k] raises KeyError when k is not an existing key. Every Pythonista knows that d.get(k, default) is an alternative to d[k] whenever a default value is more convenient than handling KeyError.
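To make the contrast concrete, here is a minimal sketch of both access styles. The `stock` dict and the two helper functions are hypothetical names invented for illustration; they are not part of the original script.

```python
# Hypothetical data, for illustration only.
stock = {'apples': 3}


def lookup_strict(d, key):
    """Fail fast: d[key] raises KeyError when key is missing."""
    return d[key]


def lookup_lenient(d, key, default=0):
    """Fall back to a default instead of raising."""
    return d.get(key, default)


try:
    lookup_strict(stock, 'pears')
except KeyError as exc:
    print('missing key:', exc)

print(lookup_lenient(stock, 'apples'))
print(lookup_lenient(stock, 'pears'))
```

Note that d.get never modifies the dict: the default is returned but not stored, which matters when the value is a mutable object you intend to update.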
import sys
import re

# raw string, so that \w is not treated as an invalid string escape
WORD_RE = re.compile(r'\w+')

# maps each word to a list of (line_no, column_no) locations
index = {}
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start()+1
            location = (line_no, column_no)
            # this is ugly; coded like this to make a point
            occurrences = index.get(word, [])
            occurrences.append(location)
            index[word] = occurrences

# print in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])
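The three lines marked as ugly above (get, append, assign back) can usually be collapsed into one call to dict.setdefault, which returns the existing value or stores and returns the default in a single lookup. The sketch below is one possible rewrite, not necessarily what the author had in mind; `build_index` and the `sample` lines are hypothetical, introduced so the example runs without an input file.

```python
import re

WORD_RE = re.compile(r'\w+')


def build_index(lines):
    """Map each word to a list of (line_no, column_no) locations."""
    index = {}
    for line_no, line in enumerate(lines, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            location = (line_no, match.start() + 1)
            # Fetch the list for word, creating it if missing, then append.
            index.setdefault(word, []).append(location)
    return index


# Hypothetical input standing in for the file named in sys.argv[1].
sample = ['a b', 'b a']
index = build_index(sample)

# print in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])
```

Taking any iterable of lines keeps the function usable with an open file object as well, for example build_index(fp) inside the with block of the original script.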