@victormurcia
Created September 5, 2022 04:28
decode downloaded html and extract all <a href=""> links
# decode downloaded html and extract all <a href=""> links
from bs4 import BeautifulSoup


def get_urls_from_html(content):
    # decode the provided bytes as UTF-8 text
    html = content.decode('utf-8')
    # parse the document as best we can
    soup = BeautifulSoup(html, 'html.parser')
    # find all of the <a> tags in the document
    atags = soup.find_all('a')
    # get the href of each <a> tag (None when a tag has no href attribute)
    return [tag.get('href') for tag in atags]
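
The list returned above can contain `None` (for `<a>` tags without an `href`), relative paths, and non-web schemes such as `mailto:`. The following is a minimal sketch of one way to clean it up using only the standard library. The helper name `normalize_links` and its `base_url` parameter are hypothetical and not part of the original gist.

```python
from urllib.parse import urljoin, urlparse


def normalize_links(hrefs, base_url):
    """Hypothetical helper: clean up the list returned by get_urls_from_html."""
    links = []
    for href in hrefs:
        # tag.get('href') yields None when an <a> tag has no href attribute;
        # empty strings are skipped as well
        if not href:
            continue
        # resolve relative links such as '/about' against the page's own URL
        absolute = urljoin(base_url, href.strip())
        # keep only web links; this drops mailto:, javascript:, and similar schemes
        if urlparse(absolute).scheme in ('http', 'https'):
            links.append(absolute)
    return links
```

Assuming the page was downloaded from `page_url`, it could be called as `normalize_links(get_urls_from_html(content), page_url)`. Filtering by scheme after resolving, rather than before, means relative links are kept while `mailto:` and similar links are discarded.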