@mdobson
Created March 31, 2012 16:58
Read web page
import redis
import requests
from bs4 import BeautifulSoup

# Create the Redis connection
redis_server = redis.Redis("localhost")
# Read a link from the Redis list "index" (position 2, i.e. the third entry)
requrl = redis_server.lindex("index", 2)
# redis-py returns bytes unless decode_responses=True, so decode the URL
if isinstance(requrl, bytes):
    requrl = requrl.decode("utf-8")
# Fetch the page and parse its content
r = requests.get(requrl)
soup = BeautifulSoup(r.text, "html.parser")
# Holds the word counts that characterize the page
qualifier = {}
# For every paragraph in the page
for content in soup.find_all('p'):
    words = content.get_text().split()
    # Add each word to the dictionary: start at 1 if it is new, otherwise increment
    for word in words:
        if word not in qualifier:
            qualifier[word] = 1
        else:
            qualifier[word] += 1
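
The counting loop above can also be written with collections.Counter from the standard library. This is only a sketch of an alternative, not part of the original gist: the helper name count_words and the sample paragraphs are made up for illustration, and it assumes the paragraph text has already been extracted (for example with get_text()).

```python
from collections import Counter


def count_words(paragraphs):
    """Count whitespace-separated words across an iterable of paragraph strings.

    Hypothetical helper, not from the original gist.
    """
    counts = Counter()
    for text in paragraphs:
        # Counter.update adds 1 for every word, creating new keys as needed
        counts.update(text.split())
    return counts


# Illustrative input standing in for the text of the page's <p> elements
sample_paragraphs = ["red fish blue fish", "red  sky"]
word_counts = count_words(sample_paragraphs)
print(word_counts.most_common(2))
```

A Counter behaves like the qualifier dictionary but returns 0 for words it has not seen, so the explicit membership check is not needed.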