Skip to content

Instantly share code, notes, and snippets.

@juanriaza
Created April 21, 2014 08:17
Show Gist options
  • Save juanriaza/11135970 to your computer and use it in GitHub Desktop.
Save juanriaza/11135970 to your computer and use it in GitHub Desktop.
import requests
from lxml.html import fromstring, iterlinks
# https://github.com/zacharyvoase/urlobject/
from urlobject import URLObject
url = 'http://juanriaza.com'
req = requests.get(url, verify=False)
html_tree = fromstring(req.content)
html_tree.make_links_absolute(url, resolve_base_href=True)
for link in iterlinks(html_tree):
print link
print URLObject(link[2]).hostname
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment