@mjbommar
Created July 21, 2009 17:12
'''
@author: Michael Bommarito
@contact [email protected]
@date Jul 21, 2009
'''
"""
# Go to the NFL website and find the page that lists all teams: http://www.nfl.com/teams/
# Pick your favorite team and select the team roster.
# Now, pick a few of your favorite players and check out their profile page.
# Do you notice any patterns in the data or structure on each player's page?
# Pay special attention to the URL for each player's profile page. Do you notice any patterns at the end of the URL?
# Describe the URL pattern in words. Is there a fixed number of letters or digits, and do they come in a particular order?
id=CAR356737
id=COU714650
id=GAN308500
id=JOH338168
id=AAA000000   (general form: three capital letters, then six digits)
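One way to check a written description of the pattern is to turn it into a stricter regular expression and test it against the sample IDs. The shape '[A-Z]{3}[0-9]{6}' is an assumption drawn only from these few examples, not a format the NFL documents. fullmatch() needs Python 3.4 or later.

```python
import re

# Sample IDs copied from the list above; AAA000000 is the general template.
sample_ids = ['CAR356737', 'COU714650', 'GAN308500', 'JOH338168', 'AAA000000']

# Assumed shape (based only on these examples): exactly three capital
# letters followed by exactly six digits.
id_shape = re.compile('[A-Z]{3}[0-9]{6}')

# fullmatch() succeeds only if the entire string fits the pattern.
results = {player_id: bool(id_shape.fullmatch(player_id)) for player_id in sample_ids}
for player_id, ok in results.items():
    print(player_id, ok)
```

Every sample ID passes, while something like 'ZBI35596' (one digit short) would not.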
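These IDs act as unique identifiers for each player. There seems to be a rule that turns a player's real name into this "digital name": in the example URLs used below, the three letters are the first three letters of the last name (ZBI for Zbikowski, ROD for Rodgers, LAN for Landry). What the six digits encode is not clear from these examples.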
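The name-to-ID idea can be spot-checked against the three players in the example URLs (tomzbikowski, stefanrodgers, dawanlandry). This sketch assumes the prefix rule is simply "first three letters of the last name, uppercased"; that is an observation from three examples, not a documented rule, and the helper prefix_from_last_name is hypothetical.

```python
# Last names and IDs taken from the example profile URLs in this script.
players = {
    'Zbikowski': 'ZBI355964',
    'Rodgers': 'ROD526034',
    'Landry': 'LAN144473',
}

# Hypothetical rule: the ID prefix is the first three letters of the
# last name, in capitals. The six digits are not modeled at all.
def prefix_from_last_name(last_name):
    return last_name[:3].upper()

# True where the guessed prefix agrees with the real ID.
matches = {name: player_id[:3] == prefix_from_last_name(name)
           for name, player_id in players.items()}
print(matches)
```

All three agree, which supports the guess but does not prove it; a player whose ID breaks the rule would disprove it.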
Regular expressions are a simple way to extract text that follows a pattern, as long as you can describe that pattern precisely, as above.
Regular expression: id=([A-Z0-9]+)
id=          the literal text that comes just before the identifier
[A-Z]        match one letter, A through Z
[0-9]        match one digit, 0 through 9
[A-Z0-9]     match one character that is either a letter A through Z or a digit 0 through 9
[A-Z0-9]+    match one or more such characters in a row
(...)        capture group: keep (return) only this part of the match
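Each piece of the breakdown can be tried on its own. This sketch runs the parts against one of the example URLs from this script: group(0) is the whole match, including the literal id=, while group(1) is only the part inside the parentheses.

```python
import re

url = 'http://www.nfl.com/players/tomzbikowski/profile?id=ZBI355964'

# [A-Z] on its own matches a single capital letter: the first one in the URL.
first_letter = re.search('[A-Z]', url).group(0)

# [A-Z0-9]+ matches the first run of one or more capital letters or digits.
first_run = re.search('[A-Z0-9]+', url).group(0)

# The full pattern: group(0) is everything that matched, including id=,
# while group(1) is only the text inside the parentheses.
match = re.search('id=([A-Z0-9]+)', url)

print(first_letter)     # Z
print(first_run)        # ZBI355964
print(match.group(0))   # id=ZBI355964
print(match.group(1))   # ZBI355964
```

Without the id= anchor, [A-Z0-9]+ happens to work on this URL only because nothing earlier in it is a capital letter or digit; the literal prefix makes the match safer on other text.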
"""
# re is the module that provides support for regular expressions.
# 'import re' is the command to make the module available to your program.
import re
# This is an example string with three real URLs.
exampleText = 'http://www.nfl.com/players/tomzbikowski/profile?id=ZBI355964 http://www.nfl.com/players/stefanrodgers/profile?id=ROD526034 http://www.nfl.com/players/dawanlandry/profile?id=LAN144473'
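# This line compiles the pattern once into a reusable regular expression object.
# The r'' (raw string) prefix is the usual habit for patterns, so backslashes are kept as-is.
idFinder = re.compile(r'id=([A-Z0-9]+)')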
# findall() returns a list of every captured identifier, in the order they appear in the text.
# print(...) with parentheses works in both Python 2 and Python 3.
print(idFinder.findall(exampleText))
"""
You should see the following output:
['ZBI355964', 'ROD526034', 'LAN144473']
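If you also want to know whose ID is whose, a second capture group can grab the name part of each URL. When a pattern has more than one group, findall() returns a list of tuples, one per match. The pattern below assumes every profile URL has the form players/NAME/profile?id=ID, as the three examples do; profileFinder is a hypothetical name.

```python
import re

exampleText = ('http://www.nfl.com/players/tomzbikowski/profile?id=ZBI355964 '
               'http://www.nfl.com/players/stefanrodgers/profile?id=ROD526034 '
               'http://www.nfl.com/players/dawanlandry/profile?id=LAN144473')

# Group 1: the lowercase name from the URL; group 2: the player ID.
# [?] is a character class holding a literal question mark, so no backslash is needed.
profileFinder = re.compile('players/([a-z]+)/profile[?]id=([A-Z0-9]+)')

pairs = profileFinder.findall(exampleText)
print(pairs)
# A dict makes it easy to look up an ID by name.
print(dict(pairs)['dawanlandry'])
```

This prints [('tomzbikowski', 'ZBI355964'), ('stefanrodgers', 'ROD526034'), ('dawanlandry', 'LAN144473')] and then LAN144473.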
"""