@NateWeiler
Last active February 9, 2021 08:25
Extract href attribute values (hyperlinks) from a webpage.
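#!/usr/bin/env python3
# Print the href of every <a> tag on a page.
# Ported to Python 3: bs4 replaces BeautifulSoup 3, urllib.request replaces urllib2.
from urllib.request import urlopen
from bs4 import BeautifulSoup

html_page = urlopen("http://example.com/example.html")
soup = BeautifulSoup(html_page, "html.parser")
for link in soup.find_all('a'):
    # .get() returns None for anchors that have no href attribute
    print(link.get('href'))
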
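#!/usr/bin/env python3
# usage: ./find_hyperlinks.py "https://example.com/example.txt"
import sys
import wget
from bs4 import BeautifulSoup

url = sys.argv[1]
# Download the page first; wget.download returns the local filename.
# bar=None suppresses the progress bar so only links are printed.
filename = wget.download(url, bar=None)
# Parse the file's contents, not the filename string.
with open(filename, encoding='utf-8') as f:
    soup = BeautifulSoup(f, 'html.parser')
# href=True skips anchors that have no href attribute
for tag in soup.find_all('a', href=True):
    print(tag['href'])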
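
The scripts above print href values exactly as they appear in the page, so relative links such as "/about" come out unresolved. As a rough sketch, assuming only the Python 3 standard library is available, the same extraction can be done with html.parser, with urljoin resolving relative links against the page URL. The names HrefCollector and extract_links are hypothetical helpers introduced for illustration; they are not part of the gist or of BeautifulSoup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper, not from the gist)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url  # page URL used to resolve relative links
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href, or with an empty one.
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url=""):
    """Return the hrefs found in an HTML string, resolved against base_url."""
    parser = HrefCollector(base_url)
    parser.feed(html)
    return parser.links
```

Calling extract_links(page_text, "http://example.com/example.html") returns a list of absolute URLs; with the default empty base_url the hrefs are returned unchanged, which matches what the scripts above print.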