@fadur
Created February 28, 2011 19:37
Strips HTML tags from a given webpage
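"""
Strip HTML tags from a webpage.
Still needs a regex to remove JavaScript too.
"""
import re
import urllib.request

url = ''  # set this to the page to fetch


def strip_tags(url):
    # Fetch the page; decoding as UTF-8 is an assumption about its encoding.
    with urllib.request.urlopen(url) as link:
        data = link.read().decode('utf-8', errors='replace')
    # Match anything that looks like a tag, e.g. <p>, </p>, <br/>.
    stripped = re.compile(r'<[^<]*?/?>')
    content = stripped.sub('', data)
    return content


if url:  # skip the fetch while url is still the empty placeholder
    print(strip_tags(url))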
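The note at the top says a regex is still needed to remove JavaScript. A minimal sketch of one way to do that is below: it drops whole `<script>` and `<style>` blocks before stripping the remaining tags. The names `SCRIPT_OR_STYLE`, `TAG`, and `strip_tags_and_js` are hypothetical and not part of the original gist, and the function works on an HTML string rather than a URL so it can be tried without a network call. Regexes are not a full HTML parser, so treat this as an approximation for simple pages.

```python
import re

# Hypothetical names for illustration; not part of the original gist.
# Matches a whole <script>...</script> or <style>...</style> block, contents included.
SCRIPT_OR_STYLE = re.compile(
    r'<(script|style)\b[^>]*>.*?</\1\s*>',
    re.IGNORECASE | re.DOTALL,
)
# Same tag pattern as in strip_tags above.
TAG = re.compile(r'<[^<]*?/?>')


def strip_tags_and_js(html):
    """Remove script/style blocks first, then strip the remaining tags."""
    without_js = SCRIPT_OR_STYLE.sub('', html)
    return TAG.sub('', without_js)


sample = '<p>Hello <b>world</b></p><script type="text/javascript">var x = "<b>";</script>'
print(strip_tags_and_js(sample))  # prints: Hello world
```

Removing the script blocks first matters: stripping tags alone would leave the JavaScript source text behind in the output.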