Created August 16, 2016 04:19
"""The program will use urllib to read the HTML from the data files below, | |
extract the href= vaues from the anchor tags, | |
scan for a tag that is in a particular position from the top and follow that link, | |
repeat the process a number of times, and report the last name you find.""" | |
import re | |
import urllib | |
from BeautifulSoup import * | |
url = raw_input('Enter URL:') | |
pos = int (raw_input("Enter position:")) | |
count = int(raw_input("Enter count:")) | |
for i in range(count): | |
tags = BeautifulSoup(urllib.urlopen(url).read())('a') | |
url = tags[pos-1].get('href', None) | |
y = re.findall('by_([^.]*)', url) | |
print "Retrieved " + str(i+1) + ":" + url | |
print "The last name is " + y[0] |
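
The link-following loop described in the docstring can also be tried without network access. The block below is a minimal sketch, not the gist author's code: it assumes Python 3, swaps BeautifulSoup for the standard-library `html.parser`, and takes the page loader as a `fetch` argument. The names `AnchorCollector`, `follow_links`, `name_from_url`, and the `PAGES` data are hypothetical and exist only for illustration.

```python
import re
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; href may be missing.
            self.hrefs.append(dict(attrs).get("href"))


def follow_links(fetch, url, pos, count):
    """Follow the link at 1-based position `pos`, `count` times.

    `fetch` is any callable that maps a URL to its HTML text.
    Returns the list of URLs that were followed, in order.
    """
    visited = []
    for _ in range(count):
        parser = AnchorCollector()
        parser.feed(fetch(url))
        url = parser.hrefs[pos - 1]
        visited.append(url)
    return visited


def name_from_url(url):
    """Return the text between 'by_' and the next '.', or None if absent."""
    match = re.search(r"by_([^.]*)", url)
    return match.group(1) if match else None


# Hypothetical in-memory pages standing in for the real data files.
PAGES = {
    "known_by_Ann.html": '<a href="known_by_Bob.html">Bob</a> <a href="known_by_Cy.html">Cy</a>',
    "known_by_Bob.html": '<a href="known_by_Ann.html">Ann</a> <a href="known_by_Dee.html">Dee</a>',
    "known_by_Cy.html": '<a href="known_by_Ann.html">Ann</a> <a href="known_by_Dee.html">Dee</a>',
    "known_by_Dee.html": '<a href="known_by_Ann.html">Ann</a> <a href="known_by_Bob.html">Bob</a>',
}

if __name__ == "__main__":
    visited = follow_links(PAGES.__getitem__, "known_by_Ann.html", pos=2, count=2)
    for step, link in enumerate(visited, start=1):
        print("Retrieved " + str(step) + ":" + link)
    print("The last name is " + str(name_from_url(visited[-1])))
```

Passing the loader in as `fetch` keeps the parsing and position logic separate from the download step, so the same function could be pointed at a real HTTP fetcher or at test data like `PAGES`.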