Last active
May 16, 2024 04:59
HTML MINIFY RegEx
$re = '%# Collapse whitespace everywhere but in blacklisted elements.
    (?>             # Match all whitespans other than single space.
      [^\S ]\s*     # Either one [\t\r\n\f\v] and zero or more ws,
    | \s{2,}        # or two or more consecutive-any-whitespace.
    )               # Note: The remaining regex consumes no text at all...
    (?=             # Ensure we are not in a blacklist tag.
      [^<]*+        # Either zero or more non-"<" {normal*}
      (?:           # Begin {(special normal*)*} construct
        <           # or a < starting a non-blacklist tag.
        (?!/?(?:textarea|pre|script)\b)
        [^<]*+      # more non-"<" {normal*}
      )*+           # Finish "unrolling-the-loop"
      (?:           # Begin alternation group.
        <           # Either a blacklist start tag.
        (?>textarea|pre|script)\b
      | \z          # or end of file.
      )             # End alternation group.
    )               # If we made it here, we are not in a blacklist tag.
    %Six';
$minified = preg_replace($re, " ", $html_to_minify);
Here is the same in JavaScript (ECMAScript) syntax, first on one line and then formatted with explanations. JavaScript has no free-spacing mode and no \z anchor, so the one-line form has no spaces around the alternations and uses $ (without the m flag) for end of input.
([^\S ]\s*|\s{2,})(?=[^<]*(?:<(?!\/?(?:textarea|pre|script)\b)[^<]*)*(?:<(textarea|pre|script)\b|$))
(               # Match all whitespans other than single space.
  [^\S ]\s*     # Either one [\t\r\n\f\v] and zero or more ws,
| \s{2,}        # or two or more consecutive-any-whitespace.
)               # Note: The remaining regex consumes no text at all...
(?=             # Ensure we are not in a blacklist tag.
  [^<]*         # Either zero or more non-"<" {normal*}
  (?:           # Begin {(special normal*)*} construct
    <           # or a < starting a non-blacklist tag.
    (?!\/?(?:textarea|pre|script)\b)
    [^<]*       # more non-"<" {normal*}
  )*            # Finish "unrolling-the-loop"
  (?:           # Begin alternation group.
    <           # Either a blacklist start tag.
    (textarea|pre|script)\b
  | $           # or end of input.
  )             # End alternation group.
)               # If we made it here, we are not in a blacklist tag.
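For anyone who wants to try the JavaScript form, here is a minimal sketch of how it might be applied. The helper name `minifyHtml` and the sample string are made up for illustration and are not part of the gist. The pattern is written without literal spaces and with `$` instead of `\z`, since JavaScript regular expressions have no free-spacing mode and no `\z` anchor. The `g` and `i` flags are my assumption, mirroring the global replace and the `i` modifier of the PHP version.

```javascript
// Sketch only: the one-line pattern from this thread as a JavaScript literal.
// Assumptions: no spaces around "|", "$" instead of "\z", flags "gi".
const COLLAPSE_WS =
  /([^\S ]\s*|\s{2,})(?=[^<]*(?:<(?!\/?(?:textarea|pre|script)\b)[^<]*)*(?:<(?:textarea|pre|script)\b|$))/gi;

// Hypothetical helper name, not part of the gist.
function minifyHtml(html) {
  // Every matched whitespace run outside textarea/pre/script becomes one space.
  return html.replace(COLLAPSE_WS, " ");
}

const sample =
  "<div>\n  <p>Hello   world</p>\n  <pre>keep\n   this</pre>\n</div>\n";

// Whitespace inside <pre> is left alone; everything else is collapsed.
console.log(JSON.stringify(minifyHtml(sample)));
// prints "<div> <p>Hello world</p> <pre>keep\n   this</pre> </div> "
```

Note that leading and trailing whitespace is collapsed to a single space rather than removed, the same as with the PHP call above.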
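Also, you forgot a backslash before the `/` (needed when `/` is the pattern delimiter). Here is the fixed PHP (PCRE2) version on one line, without spaces so it also works without the x flag.
(?>[^\S ]\s*|\s{2,})(?=[^<]*+(?:<(?!\/?(?:textarea|pre|script)\b)[^<]*+)*+(?:<(?>textarea|pre|script)\b|\z))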
How well does this work? Did you test it?
@pmaojo It was working well for a long time. Yes, I tested it and had it in production.
Very slow on large amounts of HTML.
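A likely reason for the slowdown: for every whitespace run the lookahead scans forward to the next blacklisted start tag or to the end of the input, so the total work grows much faster than the document size. Below is a sketch of a different approach, not the gist's regex: find the protected blocks first and collapse whitespace only in the text between them. The names `PROTECTED_BLOCK`, `collapse` and `minifyHtmlBySplitting` are hypothetical. It assumes the protected elements are properly closed, not nested, and have no `>` inside their attribute values.

```javascript
// Alternative sketch (an assumption, not the method from the gist):
// copy textarea/pre/script blocks verbatim and collapse whitespace elsewhere.
const PROTECTED_BLOCK = /<(textarea|pre|script)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;

// Same "whitespan" rule as the gist: any run other than a single space.
const collapse = (text) => text.replace(/[^\S ]\s*|\s{2,}/g, " ");

function minifyHtmlBySplitting(html) {
  let out = "";
  let last = 0;
  for (const m of html.matchAll(PROTECTED_BLOCK)) {
    // Collapse the text before the protected block, then copy the block as is.
    out += collapse(html.slice(last, m.index)) + m[0];
    last = m.index + m[0].length;
  }
  // Collapse whatever follows the last protected block.
  return out + collapse(html.slice(last));
}

const page = "<script>\n  var a  =  1;\n</script>\n\n<p>x</p>";
console.log(JSON.stringify(minifyHtmlBySplitting(page)));
// prints "<script>\n  var a  =  1;\n</script> <p>x</p>"
```

On well-formed input each character is visited a small, fixed number of times, so this should scale roughly linearly. An unclosed `<pre>` or `<script>` is not treated as protected here, which differs from the lookahead version.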