Created December 23, 2019 18:08
In line with the fail-fast philosophy, dict access with d[k] raises KeyError when k is not an existing key. Every Pythonista knows that d.get(k, default) is an alternative to d[k] whenever a default value is more convenient than handling KeyError.
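The difference between the two access styles can be seen in a minimal sketch. The dict `d` and its contents are made-up sample data, not part of the original script.

```python
# Hypothetical sample data, used only for illustration.
d = {'a': 1}

try:
    d['b']                   # missing key: fails fast with KeyError
except KeyError as exc:
    missing = exc.args[0]    # the key that was not found

fallback = d.get('b', 0)     # returns the default instead of raising
present = d.get('a', 0)      # an existing key returns its stored value
```

Note that d.get does not insert the missing key; the dict is left unchanged.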
import sys
import re

# raw string, so that \w is not treated as an invalid string escape
WORD_RE = re.compile(r'\w+')

index = {}
with open(sys.argv[1], encoding='utf-8') as fp:
    for line_no, line in enumerate(fp, 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            column_no = match.start() + 1
            location = (line_no, column_no)
            # this is ugly; coded like this to make a point
            occurrences = index.get(word, [])
            occurrences.append(location)
            index[word] = occurrences

# print in alphabetical order
for word in sorted(index, key=str.upper):
    print(word, index[word])
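The three-line get/append/assign sequence can probably be collapsed with dict.setdefault, which returns the existing list for a key or inserts and returns the supplied default. The sketch below is one possible alternative, not the gist's own code: the function name `build_index` and the in-memory sample lines are assumptions made so the example runs without a file argument.

```python
import re

WORD_RE = re.compile(r'\w+')

def build_index(lines):
    """Map each word to a list of (line_no, column_no) locations.

    Hypothetical helper, introduced here only for illustration.
    """
    index = {}
    for line_no, line in enumerate(lines, 1):
        for match in WORD_RE.finditer(line):
            location = (line_no, match.start() + 1)
            # fetch the list for this word, creating it on first sight,
            # then append to it, all with a single key lookup
            index.setdefault(match.group(), []).append(location)
    return index

# Made-up sample input standing in for the lines of a text file.
index = build_index(['a b', 'b c'])
for word in sorted(index, key=str.upper):
    print(word, index[word])
```

The get-based version looks the key up at least twice per word, once in index.get and once in the assignment; setdefault does the same work in one call.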