@mundry
Last active December 31, 2015 11:08
List the tags used in a (set of) HTML file(s).
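#!/usr/bin/env python2.7
# List the tags used in the <body> of one or more HTML files.
from sys import argv
from bs4 import BeautifulSoup
from bs4.element import Tag

tags = set()
for html_file in argv[1:]:
    with open(html_file) as fin:
        # Name the parser explicitly so results do not depend on which parsers are installed.
        soup = BeautifulSoup(fin.read(), 'html.parser')
    # html.parser does not synthesize a <body>; fall back to the whole document if it is missing.
    body = soup.find('body')
    root = body if body is not None else soup
    for child in root.descendants:
        if isinstance(child, Tag):
            tags.add(child.name)
# The parenthesized form runs under both Python 2.7 and Python 3.
print('\n'.join(sorted(tags)))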
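
For comparison, here is a minimal sketch of the same idea without Beautiful Soup, using only the standard library's `html.parser.HTMLParser`. It is not part of the original gist. It assumes Python 3, and `TagCollector` and `list_tags` are hypothetical names introduced for illustration. Unlike the script above, it records every start tag in the input, including `html`, `head`, and `body`, rather than only the descendants of `<body>`.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records the name of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        # Self-closing tags such as <br/> also reach it via handle_startendtag.
        self.tags.add(tag)


def list_tags(html_text):
    """Return the sorted, de-duplicated tag names found in an HTML string."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return sorted(collector.tags)
```

Calling `list_tags(open(path).read())` for each file and taking the union of the results would approximate the script's behaviour across several files.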