@sbp
Created February 27, 2011 15:14
Parsing extreme HTML as HTML5
>>> import html5lib
>>> f = open('extreme.html')
>>> doc = html5lib.parse(f)
>>> for element in doc:
...     print element
...
<html>
<head>
<body>
<meta-start>
<None>
<row>
<None>
<cell>
<None>
<None>
<cell>
<None>
<None>
<table>
<None>
<None>
<corr>
<None>
<g>
<None>
<None>
<None>
<font>
<None>
<p>
<font>
<None>
<None>
<sup>
<None>
<sub>
<None>
<None>
<script>
<None>
<None>
<table>
<tbody>
<tr>
<th>
<None>
<nr>
<None>
<meta>
<meta>
<meta>
<meta>
<None>
<pre>
<None>
<None>
<test>
<None>
<span>
<None>
<None>
<p>
<None>
<html:p>
<p>
<None>
<None>
<None>
<None>
<None>
<None>
>>>
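
The session above is Python 2 and relies on iterating the document object directly, which may not behave the same way on current html5lib releases. As a rough sketch only, a comparable walk in Python 3 might look like the following, assuming the default `etree` tree builder. The helper name `element_names` and the sample markup are hypothetical, and the original `extreme.html` is not reproduced here. Text and comment nodes are not listed by this version, so nothing corresponding to the `<None>` entries is printed.

```python
import html5lib


def element_names(markup):
    """Return the tag name of every element in document order.

    Hypothetical helper for illustration; not part of html5lib.
    """
    # namespaceHTMLElements=False keeps tags as plain names ("html")
    # instead of "{http://www.w3.org/1999/xhtml}html".
    root = html5lib.parse(markup, treebuilder="etree",
                          namespaceHTMLElements=False)
    # Element.iter() walks the whole tree depth-first. Comment nodes
    # carry a non-string tag, so they are filtered out here.
    return [el.tag for el in root.iter() if isinstance(el.tag, str)]


if __name__ == "__main__":
    # Assumed sample input: a table without <tbody> and an unclosed <p>.
    sample = "<table><tr><th>x</th></tr></table><p>unclosed"
    for name in element_names(sample):
        print("<%s>" % name)
```

As in the output above, elements the markup never spelled out (`html`, `head`, `body`, `tbody`) should still appear, because the HTML5 tree construction algorithm inserts them.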