Last active
March 24, 2016 13:58
-
-
Save dumpmycode/904c8f4f2056c11b1da3 to your computer and use it in GitHub Desktop.
Google Python Class - mimic.py
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/python -tt | |
# Copyright 2010 Google Inc. | |
# Licensed under the Apache License, Version 2.0 | |
# http://www.apache.org/licenses/LICENSE-2.0 | |
# Google's Python Class | |
# http://code.google.com/edu/languages/google-python-class/ | |
"""Mimic pyquick exercise -- optional extra exercise. | |
Google's Python Class | |
Read in the file specified on the command line. | |
Do a simple split() on whitespace to obtain all the words in the file. | |
Rather than read the file line by line, it's easier to read | |
it into one giant string and split it once. | |
Build a "mimic" dict that maps each word that appears in the file | |
to a list of all the words that immediately follow that word in the file. | |
The list of words can be be in any order and should include | |
duplicates. So for example the key "and" might have the list | |
["then", "best", "then", "after", ...] listing | |
all the words which came after "and" in the text. | |
We'll say that the empty string is what comes before | |
the first word in the file. | |
With the mimic dict, it's fairly easy to emit random | |
text that mimics the original. Print a word, then look | |
up what words might come next and pick one at random as | |
the next work. | |
Use the empty string as the first word to prime things. | |
If we ever get stuck with a word that is not in the dict, | |
go back to the empty string to keep things moving. | |
Note: the standard python module 'random' includes a | |
random.choice(list) method which picks a random element | |
from a non-empty list. | |
For fun, feed your program to itself as input. | |
Could work on getting it to put in linebreaks around 70 | |
columns, so the output looks better. | |
""" | |
import random | |
import sys | |
def mimic_dict(filename): | |
"""Returns mimic dict mapping each word to list of words which follow it.""" | |
dic ={} | |
key = '' | |
with open(filename) as fobj: | |
wordlist = fobj.read().lower().split() | |
for word in wordlist: | |
if key not in dic: | |
dic[key] = [word] | |
else: | |
dic[key] += [word] | |
key = word | |
return dic | |
def print_mimic(mimic_dict, word): | |
""" | |
Given mimic dict and start word, prints 200 random words. | |
paragraphed roughly every 70 characters. | |
to do: feed function output to itself and still maintain | |
70 paragraph format. | |
""" | |
strout = [] | |
start = 0 | |
end = 70 | |
for i in xrange(200): | |
strout.append(word) | |
word = random.choice(mimic_dict[word]) | |
strout = ' '.join(strout) | |
while True: | |
try: | |
if strout[end] == ' ': | |
print(strout[start:end]) | |
start = end | |
end += 70 | |
else: | |
end += 1 | |
except: | |
print(strout[start:]) | |
break | |
# Provided main(), calls mimic_dict() and mimic() | |
def main(): | |
if len(sys.argv) != 2: | |
print 'usage: ./mimic.py file-to-read' | |
sys.exit(1) | |
dict = mimic_dict(sys.argv[1]) | |
print_mimic(dict, '') | |
if __name__ == '__main__': | |
main() |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment