The deckdown regex parser should be able to split a document into sections at each header. For example, this input HTML fragment:
<h1>Section</h1>
<p>Section Info</p>
<h2>Section</h2>
<p>Section Info</p>
<h1>Section</h1>
<p>Section</p>
<ul>
<li>Something</li>
<li>Something</li>
</ul>
should yield this array of HTML fragments:
[
'<h1>Section</h1><p>Section Info</p>',
'<h2>Section</h2><p>Section Info</p>',
'<h1>Section</h1><p>Section</p><ul><li>Something</li><li>Something</li></ul>'
]
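
One way this splitting could be sketched is shown here in Python. This is only an illustration of the idea, not deckdown's actual implementation: the function name split_by_headers and the pattern HEADER_OPEN are hypothetical. It assumes header closing tags are well formed (</h1>, not a second <h1>) and that whitespace between tags can be dropped so each fragment ends up on a single line.

```python
import re

# Hypothetical pattern: matches the start of an opening <h1>..<h6> tag.
# Closing tags such as </h1> do not match because "/" follows the "<".
HEADER_OPEN = re.compile(r"<h[1-6][\s>]", re.IGNORECASE)


def split_by_headers(html):
    """Split an HTML fragment into chunks that each start at a header tag."""
    # Drop whitespace between adjacent tags so each chunk is one line.
    compact = re.sub(r">\s+<", "><", html.strip())
    if not compact:
        return []

    # Offsets where each header opens, plus the end of the string.
    starts = [m.start() for m in HEADER_OPEN.finditer(compact)]
    if not starts:
        return [compact]

    chunks = []
    # Keep any content that precedes the first header as its own chunk.
    if starts[0] > 0:
        chunks.append(compact[:starts[0]])

    bounds = starts + [len(compact)]
    for begin, end in zip(bounds, bounds[1:]):
        chunks.append(compact[begin:end])
    return chunks
```

Note that a regex like this only looks at the tag text, so a header tag appearing inside a comment or an attribute value would also trigger a split. A real HTML parser would be more robust if that matters.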