mjbommar · July 21, 2009 17:12
diff --git a/Harambee1.py b/Harambee1.py
'''
@author: Michael Bommarito
@contact: [email protected]
@date: Jul 21, 2009
'''

"""
# Go to the NFL website and find the page that lists all teams: http://www.nfl.com/teams/
# Pick your favorite team and select the team roster.
# Now, pick a few of your favorite players and check out their profile pages.
# Do you notice any patterns in the data or structure on each player's page?
# Pay special attention to the URL for each player's profile page. Do you notice any patterns at the end of the URL?
# Describe the URL pattern in words. Is there a fixed number of letters or digits, and do they appear in a particular order?

id=CAR356737
id=COU714650
id=GAN308500
id=JOH338168

id=AAA000000   (the general shape: three capital letters followed by six digits)

These work like identifiers for each player: a rule takes the player's real name and turns it into a "digital name."
The three letters appear to be the first three letters of the player's last name
(ZBI for Zbikowski, ROD for Rodgers, LAN for Landry in the example below).


Regular expressions are a simple way to extract text from a document when that text follows a pattern you can describe like this.
Regular Expression: id=([A-Z0-9]+)
    id=            literal text that must come right before the ID
    [A-Z]          match one capital letter, A through Z
    [0-9]          match one digit, 0 through 9
    [A-Z0-9]       match one capital letter or one digit
    [A-Z0-9]+      match one or more capital letters or digits in a row
    (...)          a capture group: keep (return) only this part of the match

"""
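
# A stricter variant of the pattern described above. This is a sketch that assumes
# every ID really is exactly three capital letters followed by six digits, as in
# id=AAA000000. It is not taken from any official NFL documentation.
# {3} and {6} mean "exactly this many of the preceding item," and \b stops the
# match from running on into extra letters or digits.
import re

strictFinder = re.compile(r'id=([A-Z]{3}[0-9]{6})\b')

# Four IDs in the expected shape, plus one (id=AB12) that should be rejected.
sampleIds = 'id=CAR356737 id=COU714650 id=GAN308500 id=JOH338168 id=AB12'
print(strictFinder.findall(sampleIds))

"""
You should see the following output (id=AB12 is skipped):
['CAR356737', 'COU714650', 'GAN308500', 'JOH338168']
"""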

# re is the module that provides support for regular expressions.
# 'import re' makes the module available to your program.
import re

# This is an example string with three real player-profile URLs.
exampleText = 'http://www.nfl.com/players/tomzbikowski/profile?id=ZBI355964 http://www.nfl.com/players/stefanrodgers/profile?id=ROD526034 http://www.nfl.com/players/dawanlandry/profile?id=LAN144473'

# This line compiles the regular expression into a reusable pattern object.
idFinder = re.compile(r'id=([A-Z0-9]+)')

# findall() returns every match in the text. Because the pattern has one capture
# group, it returns only the captured ID from each URL.
print(idFinder.findall(exampleText))

"""
You should see the following output:
['ZBI355964', 'ROD526034', 'LAN144473']
"""
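
# One possible extension: capture the player's URL name along with the ID.
# This is a sketch that assumes the players/<name>/profile?id=<ID> layout seen in
# the three example URLs holds for other profile pages too.
# With two capture groups, findall() returns one (name, id) tuple per match,
# and dict() turns those tuples into a name -> ID lookup table.
import re

urls = ('http://www.nfl.com/players/tomzbikowski/profile?id=ZBI355964 '
        'http://www.nfl.com/players/stefanrodgers/profile?id=ROD526034 '
        'http://www.nfl.com/players/dawanlandry/profile?id=LAN144473')

# \? escapes the question mark so it matches a literal '?' in the URL.
profileFinder = re.compile(r'players/([a-z]+)/profile\?id=([A-Z0-9]+)')
playerIds = dict(profileFinder.findall(urls))
print(playerIds['stefanrodgers'])

"""
You should see the following output:
ROD526034
"""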