| (192, 'the') | |
| (105, 'i') | |
| (74, 'to') | |
| (71, 'was') | |
| (67, 'of') | |
| (62, 'in') | |
| (53, 'a') | |
| (52, 'and') | |
| (50, 'you') | |
| (50, 'he') |
| stopwords = ['a', 'about', 'above', 'across', 'after', 'afterwards'] | |
| stopwords += ['again', 'against', 'all', 'almost', 'alone', 'along'] | |
| stopwords += ['already', 'also', 'although', 'always', 'am', 'among'] | |
| stopwords += ['amongst', 'amoungst', 'amount', 'an', 'and', 'another'] | |
| stopwords += ['any', 'anyhow', 'anyone', 'anything', 'anyway', 'anywhere'] | |
| stopwords += ['are', 'around', 'as', 'at', 'back', 'be', 'became'] | |
| stopwords += ['because', 'become', 'becomes', 'becoming', 'been'] | |
| stopwords += ['before', 'beforehand', 'behind', 'being', 'below'] | |
| stopwords += ['beside', 'besides', 'between', 'beyond', 'bill', 'both'] | |
| stopwords += ['bottom', 'but', 'by', 'call', 'can', 'cannot', 'cant'] |
| # Given a list of words, remove any that are | |
| # in a list of stop words. | |
| def removeStopwords(wordlist, stopwords): | |
|     return [w for w in wordlist if w not in stopwords] |
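A short usage sketch for `removeStopwords`, written in Python 3 syntax. The sample sentence and the shortened stop word list are made up for illustration. Converting the stop words to a set is an optional variation, not part of the original listing, that keeps membership checks fast on long texts.

```python
# Same function as above: keep only the words that are not stop words.
def removeStopwords(wordlist, stopwords):
    return [w for w in wordlist if w not in stopwords]

# Hypothetical sample data, much shorter than the real stop word list.
sample_stopwords = ['a', 'about', 'and', 'the', 'was']
sample_words = 'the prisoner was seen about the house and a mob'.split()

# Works with a plain list of stop words...
filtered = removeStopwords(sample_words, sample_stopwords)

# ...and gives the same result with a set, which is faster to search.
filtered_fast = removeStopwords(sample_words, set(sample_stopwords))

print(filtered)
```

Because the function only relies on the `in` operator, any container that supports membership tests can be passed as `stopwords`.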
| # html-to-freq-2.py | |
| import urllib2 | |
| import obo | |
| url = 'http://www.oldbaileyonline.org/print.jsp?div=t17800628-33' | |
| response = urllib2.urlopen(url) | |
| html = response.read() | |
| text = obo.stripTags(html).lower() |
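The listing above stops once the page text has been lowercased. As a rough sketch of the steps that typically follow (split into words, drop stop words, count, sort), the block below uses only the standard library in Python 3 syntax. The sample text and the short stop word set are invented for the demo, and none of this is claimed to match the helper functions inside the `obo` module.

```python
import re
from collections import Counter

# Hypothetical stand-in for the page text; the real script obtains it
# from obo.stripTags(html).lower().
text = 'the mob came to the house. the prisoner was in the house that night.'

# Assumed short stop word set, for this demo only.
stopwords = {'the', 'to', 'was', 'in', 'that'}

# Split on runs of non-word characters and drop empty strings.
wordlist = [w for w in re.split(r'\W+', text) if w]

# Remove stop words, then count what remains.
counts = Counter(w for w in wordlist if w not in stopwords)

# Build (count, word) pairs sorted from most to least frequent.
pairs = sorted(((n, w) for w, n in counts.items()), reverse=True)

for pair in pairs:
    print(pair)
```

Sorting tuples compares the count first, so putting the count before the word is what makes the most frequent words come out on top.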
| (25, 'house') | |
| (20, 'yes') | |
| (20, 'prisoner') | |
| (19, 'mr') | |
| (17, 'man') | |
| (15, 'akerman') | |
| (14, 'mob') | |
| (13, 'black') | |
| (12, 'night') | |
| (11, 'saw') |
| # write-html.py | |
| f = open('helloworld.html','w') | |
| message = """<html> | |
| <head></head> | |
| <body><p>Hello World!</p></body> | |
| </html>""" | |
| f.write(message) | |
| f.close() |
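A variation on `write-html.py` that may be worth considering: opening the file in a `with` block, so it is closed automatically once the write finishes. This sketch uses Python 3 syntax and, as an assumption made only to keep the demo tidy, writes into a temporary directory instead of the working directory.

```python
import os
import tempfile

message = """<html>
<head></head>
<body><p>Hello World!</p></body>
</html>"""

# Assumption: use a temporary directory so the demo leaves no stray files
# in the working directory.
path = os.path.join(tempfile.mkdtemp(), 'helloworld.html')

# The with block closes the file automatically, even if write() raises.
with open(path, 'w') as f:
    f.write(message)

# Read the file back to confirm the contents reached the disk.
with open(path) as f:
    written = f.read()

print(written == message)
```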
| # write-html-2.py | |
| import webbrowser | |
| f = open('helloworld.html','w') | |
| message = """<html> | |
| <head></head> | |
| <body><p>Hello World!</p></body> | |
| </html>""" |
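The `write-html-2.py` listing is cut off before it uses the `webbrowser` module it imports. As a hedged sketch of how a local file can be opened in a browser tab, the block below builds a `file://` URL with `pathlib` and passes it to `webbrowser.open_new_tab`. The helper names `file_url` and `show_in_browser` and the `launch` flag are hypothetical, introduced here so the URL can be inspected without opening a browser; this is Python 3 syntax and not necessarily how the original script ends.

```python
import webbrowser
from pathlib import Path

def file_url(filename):
    """Return a file:// URL for a local file name (hypothetical helper)."""
    return Path(filename).resolve().as_uri()

def show_in_browser(filename, launch=True):
    """Open the file in a new browser tab unless launch is False."""
    url = file_url(filename)
    if launch:
        webbrowser.open_new_tab(url)
    return url

# launch=False only builds the URL, which is handy for checking the path.
url = show_in_browser('helloworld.html', launch=False)
print(url)
```

Resolving the path first avoids hard-coding a machine-specific location in the script.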
| frame = 'This smells like a %s' | |
| print frame | |
| -> This smells like a %s | |
| print frame % 'banana' | |
| -> This smells like a banana | |
| print frame % 'pear' | |
| -> This smells like a pear |
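For comparison, the same substitution can be written in a few ways in Python 3. The sketch below keeps the `%s` form from the listing above and sets it beside `str.format` and an f-string; the variable names are illustrative only.

```python
frame = 'This smells like a %s'

# The % operator substitutes the value for the %s placeholder.
with_percent = frame % 'banana'

# str.format uses {} as the placeholder instead of %s.
with_format = 'This smells like a {}'.format('banana')

# An f-string reads the value directly from a variable in scope.
fruit = 'banana'
with_fstring = f'This smells like a {fruit}'

print(with_percent)
```

All three produce the same string; the `%` form has the advantage that the template can be stored in a variable and filled in later, as `frame` is here.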
| frame2 = 'These are %s, those are %s' | |
| print frame2 | |
| -> These are %s, those are %s | |
| print frame2 % ('bananas', 'pears') | |
| -> These are bananas, those are pears |
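One way the tuple form might be put to work, offered as an assumption about where this is heading and not as the original method: filling an HTML table row for each `(count, word)` pair. The three sample pairs are copied from the frequency output listed earlier; the `row` template is hypothetical. Python 3 syntax.

```python
# Template with two placeholders, filled from a two-item tuple.
# %s takes a string, %d takes an integer.
row = '<tr><td>%s</td><td>%d</td></tr>'

# Sample (count, word) pairs copied from the frequency output listed earlier.
pairs = [(25, 'house'), (20, 'yes'), (20, 'prisoner')]

# The tuple must supply exactly as many values as there are placeholders.
rows = [row % (word, count) for count, word in pairs]

# Join the rows and wrap them in a table element.
table = '<table>\n%s\n</table>' % '\n'.join(rows)
print(table)
```

If the tuple has too few or too many items for the template, Python raises a `TypeError`, which is a useful early warning when templates and data drift apart.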