Created July 21, 2009 17:12
@author: Michael Bommarito
@contact [email protected]
@date Jul 21, 2009
# Go to the NFL website and find the page that lists all teams:
# Pick your favorite team and select the team roster.
# Now, pick a few of your favorite players and check out their profile page.
# Do you notice any patterns in the data or structure on each player's page?
# Pay special attention to the URL for each player's profile page. Do you notice any patterns at the end of the URL?
# Describe the URL pattern in words. Are there a certain number of letters or numbers in any particular order?
These strings act as identifiers for each player. Some rule takes the player's real name and creates the "digital name."
Regular expressions are a simple way to extract patterns from text when the pattern can be described in words like this.
Regular Expression: id=([A-Z0-9]+)
id= the literal text that precedes the identifier
[A-Z] match one capital letter, A through Z
[0-9] match one digit, 0 through 9
[A-Z0-9] match one capital letter A through Z or one digit 0 through 9
[A-Z0-9]+ match one or more capital letters or digits in a row
(...) capture this part of the match; it is the part we want to keep
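The pieces of the pattern listed above can be tried one at a time. This is a minimal sketch, assuming a made-up URL fragment (`sample` is hypothetical and not a real NFL URL); each line applies one piece of the pattern and prints what it matches.

```python
import re

# Hypothetical URL fragment, made up for illustration only.
sample = 'profile?id=ABC123456'

# [A-Z] matches one capital letter at a time.
letters = re.findall('[A-Z]', sample)
print(letters)       # ['A', 'B', 'C']

# [0-9] matches one digit at a time.
digits = re.findall('[0-9]', sample)
print(digits)        # ['1', '2', '3', '4', '5', '6']

# [A-Z0-9]+ matches a whole run of capital letters and digits.
runs = re.findall('[A-Z0-9]+', sample)
print(runs)          # ['ABC123456']

# Adding the literal prefix id= anchors the match to the identifier.
withPrefix = re.findall('id=[A-Z0-9]+', sample)
print(withPrefix)    # ['id=ABC123456']

# The parentheses keep only the captured part, dropping the prefix.
captured = re.findall('id=([A-Z0-9]+)', sample)
print(captured)      # ['ABC123456']
```

Note that `findall` returns the whole match when the pattern has no parentheses, and only the captured group when it has one.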
# re is the module that provides support for regular expressions.
# 'import re' is the statement that makes the module available to your program.
import re
# This is an example string with three real URLs.
exampleText = ''
# This line creates the regular expression finder.
idFinder = re.compile(r'id=([A-Z0-9]+)')
# This line tells the regular expression to extract the unique identifiers for each URL.
print(idFinder.findall(exampleText))
You should see the following output:
['ZBI355964', 'ROD526034', 'LAN144473']
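To run the finder end to end, here is a minimal sketch that assumes three hypothetical URLs on example.com. The URLs are made up for illustration; only the identifiers are taken from the output above. The second pattern, `strictFinder`, is also an assumption: it encodes the "three capital letters, then six digits" shape that these three identifiers happen to share, and it may not hold for every player.

```python
import re

# Hypothetical URLs, made up for illustration. Only the identifiers
# come from the expected output shown above.
exampleText = '''
http://www.example.com/players/profile?id=ZBI355964
http://www.example.com/players/profile?id=ROD526034
http://www.example.com/players/profile?id=LAN144473
'''

# The pattern from the tutorial: any run of capital letters and digits.
idFinder = re.compile(r'id=([A-Z0-9]+)')
ids = idFinder.findall(exampleText)
print(ids)

# Stricter variant (an assumption): exactly three capital letters
# followed by exactly six digits.
strictFinder = re.compile(r'id=([A-Z]{3}[0-9]{6})')
strictIds = strictFinder.findall(exampleText)
print(strictIds)
```

On this input both finders return the same list. The stricter pattern is useful when nearby text contains other `id=` values that should not be picked up, at the cost of missing any identifier that does not follow the assumed shape.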