I think this is a much better solution:
libxml_use_internal_errors(true);
$dom = new \DOMDocument;
$dom->loadHTML($string);
// Strip wrapping <html> and <body> tags
$mock = new \DOMDocument;
$body = $dom->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    $mock->appendChild($mock->importNode($child, true));
}
$fixed = trim($mock->saveHTML());
Test
<code>test<code
-> <code>test<code></code></code> // Not perfect but at least it's valid
<code>test
-> <code>test</code>
@gplcart Thank you for your solution. The first one is amazing, but it doesn't close broken tags like <span, only the complete tags.
Thank you very much, you have helped me a lot.
@gplcart how do you handle special characters such as slanted quotes and apostrophes?
https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html
libxml seems to output them garbled.
EDIT: found that mb_convert_encoding() can help us out here: http://php.net/manual/en/domdocument.loadhtml.php#74777
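For anyone landing here with the same encoding problem, here is a minimal sketch of one way to keep curly quotes intact. It does not use mb_convert_encoding() (its 'HTML-ENTITIES' mode is deprecated as of PHP 8.2). Instead it uses the common workaround of prefixing the input with an XML encoding declaration, so libxml treats it as UTF-8. The function name close_tags_utf8 is made up for this example, and it serializes each child of <body> with saveHTML($node) rather than copying nodes into a second document, which is a swapped-in variation on the snippet above. It assumes the input is already valid UTF-8.

```php
<?php
// Hypothetical helper, not part of the original gist.
// Assumes $string is valid UTF-8.
function close_tags_utf8(string $string): string
{
    libxml_use_internal_errors(true);

    $dom = new \DOMDocument;
    // The XML declaration hints to libxml that the input is UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $string);
    libxml_clear_errors();

    // loadHTML() wraps fragments in <html><body>; bail out if there is no body.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize only the children of <body>, dropping the wrapper tags.
    $fixed = '';
    foreach ($body->childNodes as $child) {
        $fixed .= $dom->saveHTML($child);
    }

    return trim($fixed);
}

echo close_tags_utf8('<p>“quoted” and ‘apostrophes’'), "\n";
```

Depending on the PHP and libxml versions, the special characters may come back either as raw UTF-8 or as entities, so check the output on your own setup.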
This does not seem to work with:
<strong>Some string</strong> here <strong...
It returns this for me:
<strong>Some string</strong> here <strong...< h3="">
(when the previous string is an h3, for example. Is this what the browser adds?)
Also @scott-thrillist, can you post an example of what you did to make it UTF-8 / weird-chars compatible? I would also prefer not to use a regex solution if at all possible. Your solution gives this, which is a bit better but still not OK, I reckon:
<strong>Some string</strong> here <strong...></strong...>
Works perfectly, thanks for the post.
This will close tags that don't need closing, e.g. <img src="">, so if you have an image followed by a div, it'll insert </img> after the div.
But otherwise it seems quite good! Thanks!
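A rough sketch of how the </img> problem could be avoided while staying with a regex-based approach: walk the tags in document order, keep a stack of open elements, and skip void elements (img, br, hr and so on), which never take a closing tag. The function name close_tags_skip_void and the VOID_ELEMENTS constant are made up for this example. It is not the gist author's code, and it will still be confused by a > inside an attribute value or by truncated tags such as <span.

```php
<?php
// Void elements never take a closing tag.
const VOID_ELEMENTS = [
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
    'input', 'link', 'meta', 'source', 'track', 'wbr',
];

// Hypothetical helper, not part of the original gist.
function close_tags_skip_void(string $html): string
{
    // Collect complete opening and closing tags in document order.
    preg_match_all('#<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>#', $html, $matches, PREG_SET_ORDER);

    $stack = [];
    foreach ($matches as $m) {
        $name = strtolower($m[2]);

        // Ignore void elements and explicitly self-closed tags like <br />.
        if (in_array($name, VOID_ELEMENTS, true) || substr($m[0], -2) === '/>') {
            continue;
        }

        if ($m[1] === '') {
            // Opening tag: remember it.
            $stack[] = $name;
            continue;
        }

        // Closing tag: drop the most recent matching open tag and anything
        // opened after it. Stray closing tags with no match are ignored.
        $keys = array_keys($stack, $name, true);
        if ($keys !== []) {
            $stack = array_slice($stack, 0, end($keys));
        }
    }

    // Close whatever is still open, innermost first.
    foreach (array_reverse($stack) as $name) {
        $html .= '</' . $name . '>';
    }

    return $html;
}

echo close_tags_skip_void('<div><img src=""><p>text'), "\n";
// prints: <div><img src=""><p>text</p></div>
```

Because it tracks nesting order instead of comparing two flat lists of tag names, it also handles h1 to h6 and appends the closing tags innermost first.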
You saved me a lot of time.
Thank you very much!
@JayWood The code is great! But it's not closing heading tags such as h1, h2, h3, h4, h5 and h6.
Here is the updated code.
function closetags($html) {
    preg_match_all('#<([a-zA-Z0-9]+)(?: .*)?(?<![/|/ ])>#iU', $html, $result);
    $openedtags = $result[1];
    preg_match_all('#</([a-zA-Z0-9]+)>#iU', $html, $result);
    $closedtags = $result[1];
    $len_opened = count($openedtags);
    if (count($closedtags) == $len_opened) {
        return $html;
    }
    $openedtags = array_reverse($openedtags);
    for ($i=0; $i < $len_opened; $i++) {
        if (!in_array($openedtags[$i], $closedtags)) {
            $html .= '</'.$openedtags[$i].'>';
        } else {
            unset($closedtags[array_search($openedtags[$i], $closedtags)]);
        }
    }
    return $html;
}
Great!
Wow, that's really cool. Thanks for this.