@tlehman
Created December 14, 2011 00:04
imgscraper
# a simple image scraper by tlehman
# this code is too basic for me to care what you do with it, so have at it.
#
from urllib.request import urlopen
from bs4 import BeautifulSoup

# usage: getimg(url, filetype)
# returns a list of the src attributes of the img tags in the page
# referred to by url whose src ends in filetype (e.g. 'png')
def getimg(url, filetype):
    # get html source from url
    text = urlopen(url).read()
    # parse the html source using BeautifulSoup
    soup = BeautifulSoup(text, 'html.parser')
    # set of image urls to be returned (a set drops duplicates)
    imgurls = set()
    for img in soup.find_all('img'):
        # img tags without a src attribute are skipped
        s = img.get('src')
        # endswith handles extensions of any length, e.g. 'jpeg'
        if s and s.endswith(filetype):
            imgurls.add(s)
    return list(imgurls)
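
The filtering step can be tried without a network connection or a BeautifulSoup install. What follows is a minimal sketch, not the gist's own method: it swaps BeautifulSoup for the standard library's html.parser, and the names `_ImgSrcCollector`, `img_srcs_from_html`, and `base_url` are hypothetical, introduced only for illustration. It also assumes two behaviours the original does not have: relative src values are resolved against `base_url`, and the extension is compared case-insensitively against the URL path so a query string does not hide it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _ImgSrcCollector(HTMLParser):
    """Hypothetical helper: records the src of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; img tags without src are skipped
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def img_srcs_from_html(html, filetype, base_url=""):
    """Return the sorted, de-duplicated image URLs in html ending in filetype."""
    collector = _ImgSrcCollector()
    collector.feed(html)
    collector.close()
    # accept 'png' as well as '.png'
    suffix = "." + filetype.lower().lstrip(".")
    urls = set()
    for src in collector.srcs:
        # assumption: relative links are resolved against base_url
        absolute = urljoin(base_url, src)
        # look at the path only, so '?x=1' does not hide the extension
        if urlparse(absolute).path.lower().endswith(suffix):
            urls.add(absolute)
    return sorted(urls)


if __name__ == "__main__":
    sample = '<img src="a.png"><img src="/img/b.PNG?x=1"><img src="c.jpg">'
    print(img_srcs_from_html(sample, "png", "http://example.com/gallery/"))
```

Passing the page source as a string keeps the parsing and filtering testable on their own; the download step (`urlopen(url).read()`) can stay a separate call.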

tlehman commented Dec 14, 2011

This used to be a GitHub repository, but it was too small to justify one, so I made it into a gist.
