@bukzor
Created May 29, 2012 16:35
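#!/usr/bin/env python
# Python 2 script (print statement, builtin reload(), cStringIO).
import lxml.html.defs

# Attempt to mark <wbr> as an empty (void) tag by patching lxml's
# Python-level tag set, then reload lxml.html and lxml in the hope
# that the change takes effect.
lxml.html.defs.empty_tags = lxml.html.defs.empty_tags.union(['wbr'])
reload(lxml.html)
reload(lxml)
print '<wbr> is empty tag:', 'wbr' in lxml.html.defs.empty_tags
print

from lxml import etree
from cStringIO import StringIO

wbr_html = """\
<html>
<head>
<title>wbr test</title>
</head>
<body>
Test for a breakable<wbr>word implemenation change
</body>
</html>
"""

# Round-trip the document through libxml2's HTML parser and serializer.
# Both are implemented in C and presumably consult libxml2's own element
# table rather than lxml.html.defs.
parser = etree.HTMLParser()
tree = etree.parse(StringIO(wbr_html), parser)
result = etree.tostring(tree.getroot(), pretty_print=True, method="html")

# Compare whitespace-separated tokens; whitespace differences don't matter here.
if result.split() != wbr_html.split():
    print(result)
    print("not ok")
else:
    print("OK")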
bukzor commented May 29, 2012

Output:

<wbr> is empty tag: True

<html>
<head><title>wbr test</title></head>
<body>
  Test for a breakable<wbr>word implemenation change
</wbr>
</body>
</html>

not ok
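The `</wbr>` line in the output above is the symptom: it suggests the parser kept `<wbr>` open as a container around the following text, even though `lxml.html.defs.empty_tags` reports it as empty. A more targeted check than comparing split tokens is to scan the serialized HTML for end tags of void elements. The sketch below is a minimal, stdlib-only illustration assuming Python 3's `html.parser`. `VOID_TAGS` is a hand-picked, partial subset of HTML5 void elements, and `find_void_end_tags` is a hypothetical helper name, not part of lxml.

```python
from html.parser import HTMLParser

# Assumption: a partial, hand-picked subset of HTML5 void elements.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _VoidEndTagFinder(HTMLParser):
    """Collects the names of void elements that appear with a closing tag."""

    def __init__(self):
        super().__init__()
        self.bad_end_tags = []

    def handle_endtag(self, tag):
        # html.parser reports tag names in lower case.
        if tag in VOID_TAGS:
            self.bad_end_tags.append(tag)


def find_void_end_tags(serialized_html):
    """Return void-element names that were serialized with an end tag."""
    finder = _VoidEndTagFinder()
    finder.feed(serialized_html)
    finder.close()
    return finder.bad_end_tags


if __name__ == "__main__":
    # The serializer output recorded in the comment above, reproduced verbatim.
    broken = """<html>
<head><title>wbr test</title></head>
<body>
  Test for a breakable<wbr>word implemenation change
</wbr>
</body>
</html>
"""
    print(find_void_end_tags(broken))            # ['wbr']
    print(find_void_end_tags("<p>a<wbr>b</p>"))  # []
```

Because the check only reports stray end tags, it flags exactly the failure shown above while staying indifferent to whitespace and pretty-printing.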