@ultrafunkamsterdam
Last active November 14, 2019 11:20
Print the indented HTML tag structure of a BeautifulSoup tag
"""
Print the indented HTML tag structure of a BeautifulSoup tag.

example:
>>> soup = bs4.BeautifulSoup(some_html, 'html.parser')  # or 'lxml'
>>> make_structure(soup)

out (truncated; the outer <[document]> wrapper line is omitted):
<html>
 <head>
  <title>
  </title>
  <script>
  </script>
  <script>
  </script>
  <script>
  </script>
  <script>
  </script>
  <script>
  </script>
 </head>
 <body>
  <div>
   <nav>
    <div>
     <div>
      <a>
      </a>
      <div>
      </div>
      <div>
       <div>
        <span>
        </span>
       </div>
       <div>
        <a>
        </a>
"""
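import bs4


def make_structure(tag: bs4.Tag, i=0):
    """Recursively print the tags below `tag`, indented by nesting depth `i`."""
    try:
        children = list(tag.children)
    except AttributeError:
        # Leaf nodes such as text (NavigableString) expose no `.children`.
        children = []
    # Only nodes that have children are printed; empty tags and text are skipped.
    if children:
        i += 1
        print(' ' * i, '<' + tag.name + '>')
        for t in children:
            make_structure(t, i)
        print(' ' * i, '</' + tag.name + '>')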
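
The same walk can also collect its output instead of printing it, which is handy for logging or tests. The following is a minimal sketch, not part of the original gist: `structure_lines`, its `depth` and `indent` parameters, and `sample_html` are hypothetical names chosen for illustration. Unlike `make_structure`, it descends only into `bs4.Tag` children and therefore also lists tags that have no children.

```python
from bs4 import BeautifulSoup, Tag


def structure_lines(tag, depth=0, indent="  "):
    """Yield one line per opening and closing tag, indented by nesting depth."""
    pad = indent * depth
    yield f"{pad}<{tag.name}>"
    for child in tag.children:
        # Descend into real tags only; text nodes and comments are skipped.
        if isinstance(child, Tag):
            yield from structure_lines(child, depth + 1, indent)
    yield f"{pad}</{tag.name}>"


# Hypothetical sample document, used only for illustration.
sample_html = (
    "<html><head><title>Demo</title></head>"
    "<body><div><p>Hi</p><p>there</p></div></body></html>"
)

soup = BeautifulSoup(sample_html, "html.parser")
# Start at soup.html so the synthetic "[document]" root is not included.
outline = "\n".join(structure_lines(soup.html))
print(outline)
```

Starting from `soup.html` rather than `soup` avoids the `[document]` wrapper that the `BeautifulSoup` object itself reports as its name.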