Created August 24, 2014 12:09
Gist: amake/5aa1c1774a9df4b49d91
Journey to the West (西游记) character count
'''
Count the number of distinct characters in Journey to the West
'''

import urllib2

URL = 'http://www.sdmz.net/xy/%03d.htm'
CHAPTERS = 100


def do_count():
    # The set holds every distinct character seen on each page,
    # HTML markup included.
    chars = set()
    for n in xrange(1, CHAPTERS + 1):
        print 'Chapter', n
        chars = chars.union(urllib2.urlopen(URL % n).read().decode('gb2312', 'ignore'))
    print len(chars)


if __name__ == '__main__':
    do_count()
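
The script above counts every distinct character in the downloaded pages, so HTML markup, punctuation, and whitespace all end up in the total. The following is a minimal sketch of one way to narrow that down, assuming Python 3 and assuming only Han characters should count. `is_han` and `count_han` are hypothetical helpers that are not part of the original gist, and the check covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so rarer characters in the extension blocks would be missed. It works on already-decoded strings rather than fetching anything.

```python
def is_han(ch):
    """Return True if ch is in the basic CJK Unified Ideographs block.

    Assumption: U+4E00..U+9FFF is enough; extension blocks are ignored.
    """
    return '\u4e00' <= ch <= '\u9fff'


def count_han(texts):
    """Count distinct Han characters across an iterable of decoded strings."""
    chars = set()
    for text in texts:
        # Keep only Han characters, dropping markup, ASCII, and whitespace.
        chars.update(ch for ch in text if is_han(ch))
    return len(chars)


if __name__ == '__main__':
    # Hypothetical sample pages standing in for the decoded chapter HTML.
    sample = ['<p>西游记</p>', '<p>西天取经</p>']
    print(count_han(sample))
```

In the original loop, each decoded chapter page could be passed through a filter like this instead of being added to the set wholesale.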