Creating valid HTML documents that can never be parsed

While watching Andreas Kling's video on building an HTML parser, I noticed that it is possible to construct a piece of HTML whose parsing never terminates.

The document.write() method:

When parsing an HTML document, scripts have to be executed, and their document.write() output has to be parsed as well. This allows JavaScript to insert HTML into a document at parse time, for example ... Hello <script>document.write(localStorage.getItem('username'))</script> ... to print a greeting. The spec allows document.write to write out any HTML tags, even <script>, and a script inserted this way has to be executed in turn, as per the spec. You might see where this is going...
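To make the re-entrancy a bit more concrete, here is a minimal sketch of the idea in plain JavaScript. It is a toy model, not a real HTML parser: the token objects, the `parse` function and the `maxSteps` guard are all made up for illustration. The only thing it tries to capture is that whatever a script writes is consumed before the rest of the input.

```javascript
// Toy model (hypothetical, not a real HTML parser): the input is a list of
// tokens. A "script" token carries a function that may call write() to
// inject new tokens right after itself, before the remaining input.
function parse(tokens, maxSteps) {
  const pending = [...tokens]; // tokens still to be consumed
  let text = "";               // text the parser has emitted so far
  let scriptsRun = 0;          // number of script tokens executed
  let steps = 0;               // guard so the toy model always returns

  while (pending.length > 0 && steps < maxSteps) {
    const token = pending.shift();
    steps += 1;

    if (token.type === "text") {
      text += token.text;
      continue;
    }

    // Script token: run it, collect what it writes, and put that output
    // in front of everything that followed the script.
    scriptsRun += 1;
    const written = [];
    token.run((t) => written.push(t));
    pending.unshift(...written);
  }

  return { text, scriptsRun, finished: pending.length === 0 };
}

// A script that writes plain text, like the greeting example.
const greeting = {
  type: "script",
  run: (write) => write({ type: "text", text: "world" }),
};

// A script that writes a copy of itself.
const quine = { type: "script", run: (write) => write(quine) };

const ok = parse(
  [{ type: "text", text: "Hello " }, greeting, { type: "text", text: "!" }],
  100
);
console.log(ok.text, ok.finished);

const bomb = parse([quine, { type: "text", text: "never reached" }], 100);
console.log(bomb.scriptsRun, bomb.finished);
```

The first call finishes normally. The second one only returns because of the artificial `maxSteps` guard: the self-writing script keeps putting a copy of itself in front of the remaining input, so the trailing text is never reached.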

Much like a zip bomb or XML bomb, this can be used to create documents that never finish parsing (or that grow in size indefinitely, similar to other modern web apps). See the example below:

<html>
	<head>
		<title>what</title>
	</head>
	<body>
	<script>document.write(document.getElementsByTagName("script")[0].outerHTML)</script>
	</body>
</html>

In the real world, none of the browsers tested actually conform to the spec here: they all stop after 22 script tags (the original plus 21 newly generated ones).
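One way such a cutoff could come about is a cap on how deeply document.write calls may nest. The sketch below is a guess at the mechanism, not taken from any browser's source: `MAX_WRITE_DEPTH`, `write` and `runScript` are hypothetical names, and the value 21 is picked only because it reproduces the 22 tags observed above.

```javascript
// Hypothetical cap, chosen only to reproduce the 22 tags reported above.
const MAX_WRITE_DEPTH = 21;

function countExecutedScripts(maxDepth) {
  let depth = 0;    // how many write() calls are currently on the stack
  let executed = 0; // how many scripts have run in total

  // Stand-in for document.write: a written script runs synchronously,
  // unless the nesting is already maxDepth deep, in which case the
  // write is silently dropped.
  function write(script) {
    if (depth >= maxDepth) return;
    depth += 1;
    runScript(script);
    depth -= 1;
  }

  function runScript(script) {
    executed += 1;
    script(write);
  }

  // A script that writes a copy of itself, like the example document.
  const selfWriting = (w) => w(selfWriting);
  runScript(selfWriting);
  return executed;
}

const total = countExecutedScripts(MAX_WRITE_DEPTH);
console.log(total); // 22: the original script plus 21 written copies
```

With a cap of 21 nested writes, the original script runs once and each of the 21 permitted writes runs one more copy, which gives the 22 tags seen in practice.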

AntonLydike/non-terminating-html.md
