-
-
Save alixaxel/5595151 to your computer and use it in GitHub Desktop.
<?php | |
/** | |
* The MIT License | |
* http://creativecommons.org/licenses/MIT/ | |
* | |
* Tidy Wrapper for HTML 5 Indentation | |
* Copyright (c) 2013 Alix Axel <[email protected]> | |
**/ | |
function Tidy5($string, $options = null, $encoding = 'utf8') | |
{ | |
if (extension_loaded('tidy') === true) | |
{ | |
$default = array | |
( | |
'anchor-as-name' => false, | |
'break-before-br' => true, | |
'char-encoding' => $encoding, | |
'decorate-inferred-ul' => false, | |
'doctype' => 'omit', | |
'drop-empty-paras' => false, | |
'drop-font-tags' => true, | |
'drop-proprietary-attributes' => false, | |
'force-output' => false, # might wanna set this to true if using user defined tags | |
'hide-comments' => false, | |
'indent' => true, | |
'indent-attributes' => false, | |
'indent-spaces' => 2, # might wanna set this to 0 to remove whitespace (except in pre-like tags) | |
'input-encoding' => $encoding, | |
'join-styles' => false, | |
'logical-emphasis' => false, | |
'merge-divs' => false, | |
'merge-spans' => false, | |
'new-blocklevel-tags' => 'article aside audio details dialog figcaption figure footer header hgroup menutidy nav section source summary track video', | |
'new-empty-tags' => 'command embed keygen source track wbr', | |
'new-inline-tags' => 'canvas command data datalist embed keygen mark meter output progress time wbr', | |
'newline' => 0, | |
'numeric-entities' => false, | |
'output-bom' => false, | |
'output-encoding' => $encoding, | |
'output-html' => true, | |
'preserve-entities' => true, | |
'quiet' => true, | |
'quote-ampersand' => true, | |
'quote-marks' => false, | |
'repeated-attributes' => 1, | |
'show-body-only' => true, | |
'show-warnings' => false, | |
'sort-attributes' => 1, | |
'tab-size' => 4, | |
'tidy-mark' => false, | |
'vertical-space' => true, | |
'wrap' => 0, | |
); | |
$doctype = $menu = null; | |
if ((strncasecmp($string, '<!DOCTYPE', 9) === 0) || (strncasecmp($string, '<html', 5) === 0)) | |
{ | |
$doctype = '<!DOCTYPE html>'; $options['show-body-only'] = false; | |
} | |
$options = (is_array($options) === true) ? array_merge($default, $options) : $default; | |
if (strpos($string, '<menu') !== false) | |
{ | |
$menu = array | |
( | |
'<menu' => '<menutidy', | |
'</menu' => '</menutidy', | |
); | |
} | |
if (isset($menu) === true) | |
{ | |
$string = str_replace(array_keys($menu), $menu, $string); | |
} | |
$string = tidy_repair_string($string, $options, $encoding); | |
if (empty($string) !== true) | |
{ | |
if (isset($menu) === true) | |
{ | |
$string = str_replace($menu, array_keys($menu), $string); | |
} | |
if (isset($doctype) === true) | |
{ | |
$string = $doctype . "\n" . $string; | |
} | |
return $string; | |
} | |
} | |
return false; | |
} |
Sorry for the delay but I didn't get any notification, don't know why!
Anyway, I don't post-process DOM with libxml, I have a wrapper around DOMDocument and SimpleXML to act as a convinient XPath selector and I have another one that purifies HTML. But none of them, (de-)indent HTML, which is the purpose of this helper (besides being able to work with HTML5). I tried doing the same with DOMDocument, but the results were very poor, I can't remember exactly but I think I recall problems with comments and self-closing tags. Like I said, can't remember exactly, but there had to be a reason for me to abandon that approach. This is mostly to hide where your partials are coming from (blind de-indentation with something like preg_replace does not respect the indentation within pre tags), if you don't care about that, this is mostly useless. =)
How about this
<!DOCTYPE html>
<head>
<title>test</title>
</head>
<body>
<a href=""><div>asas</div></a>
</body>
</html>
Would result
<!DOCTYPE html>
<html>
<head>
<title>
test
</title>
</head>
<body>
<a href=""></a>
<div>
asas
</div>
</body>
</html>
Did you find some problem with your eloquent DOM processing? I've heard that tidy works better with malformed HTML - but I've never had a problem vs PHP's libxml.