@dr-dimitru
Last active May 16, 2024 04:59
HTML MINIFY RegEx
$re = '%# Collapse whitespace everywhere but in blacklisted elements.
(?> # Match all whitespace runs other than a single space.
[^\S ]\s* # Either one [\t\r\n\f\v] and zero or more ws,
| \s{2,} # or two or more consecutive-any-whitespace.
) # Note: The remaining regex consumes no text at all...
(?= # Ensure we are not in a blacklist tag.
[^<]*+ # Either zero or more non-"<" {normal*}
(?: # Begin {(special normal*)*} construct
< # or a < starting a non-blacklist tag.
(?!/?(?:textarea|pre|script)\b)
[^<]*+ # more non-"<" {normal*}
)*+ # Finish "unrolling-the-loop"
(?: # Begin alternation group.
< # Either a blacklist start tag.
(?>textarea|pre|script)\b
| \z # or end of file.
) # End alternation group.
) # If we made it here, we are not in a blacklist tag.
%Six';
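$minified = preg_replace($re, " ", $html_to_minify);
// preg_replace() returns null on error (e.g. a PCRE limit was hit); keep the original HTML in that case.
if ($minified === null) { $minified = $html_to_minify; }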
@deniamnet

Very slow on large amounts of HTML.
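
One plausible reason for the slowdown: for every whitespace run, the lookahead rescans forward to the next `<textarea`, `<pre` or `<script` start tag (or the end of the input), so the work can grow roughly quadratically on large documents. A minimal sketch of an alternative in JavaScript is to cut the protected elements out in one pass and collapse whitespace only in the remaining segments. The names `PROTECTED`, `minifySegments` and `collapse` are hypothetical, and the sketch assumes protected elements are properly closed, not nested, and have no `>` inside their attributes.

```javascript
// Hypothetical helper, not part of the original gist.
// Assumes <textarea>, <pre> and <script> elements are closed, not nested,
// and carry no ">" inside attribute values.
const PROTECTED = /<(textarea|pre|script)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;

function collapse(text) {
  // Same rule as the original pattern: any whitespace run other than a single space.
  return text.replace(/[^\S ]\s*|\s{2,}/g, " ");
}

function minifySegments(html) {
  let out = "";
  let last = 0;
  for (const m of html.matchAll(PROTECTED)) {
    // Collapse whitespace in the text before the protected element...
    out += collapse(html.slice(last, m.index));
    // ...and copy the protected element through untouched.
    out += m[0];
    last = m.index + m[0].length;
  }
  // Collapse whatever follows the last protected element.
  return out + collapse(html.slice(last));
}

const segmentSample = "<div>\n  <p>Hello   world</p>\n  <pre>  keep\n  this  </pre>\n</div>\n";
console.log(JSON.stringify(minifySegments(segmentSample)));
// → "<div> <p>Hello world</p> <pre>  keep\n  this  </pre> </div> "
```

Each character is visited by the splitter and by `collapse` a bounded number of times for well-formed input, at the cost of the assumptions listed above.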

@Letsmoe

Letsmoe commented Aug 6, 2021

Here is the same in JavaScript (ECMAScript) format, formatted and with an explanation.

([^\S ]\s*|\s{2,})(?=[^<]*(?:<(?!\/?(?:textarea|pre|script)\b)[^<]*)*(?:<(textarea|pre|script)\b|$))
(               # Match all whitespace runs other than a single space.
  [^\S ]\s*     # Either one [\t\r\n\f\v] and zero or more ws,
  | \s{2,}      # or two or more consecutive whitespace characters.
)               # Note: The remaining regex consumes no text at all...
(?=             # Ensure we are not in a blacklist tag.
  [^<]*         # Either zero or more non-"<" {normal*}
  (?:           # Begin {(special normal*)*} construct
    <           # or a < starting a non-blacklist tag.
    (?!\/?(?:textarea|pre|script)\b)
    [^<]*       # more non-"<" {normal*}
  )*            # Finish "unrolling-the-loop"
  (?:           # Begin alternation group.
    <           # Either a blacklist start tag.
    (textarea|pre|script)\b
    | $         # or end of input (JavaScript has no \z; use $ without the m flag).
  )             # End alternation group.
)               # If we made it here, we are not in a blacklist tag.
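
A minimal sketch of running the pattern from JavaScript, under a few assumptions: JavaScript regular expressions have no `\z` anchor and no free-spacing mode, so the pattern is written on one line without literal spaces, and `$` (without the `m` flag) stands in for end of input. The names `COLLAPSE_WS` and `minifyHtml` are hypothetical.

```javascript
// Hypothetical helper; the pattern mirrors the one explained above.
// `$` without the `m` flag matches only at the very end of the input.
const COLLAPSE_WS =
  /(?:[^\S ]\s*|\s{2,})(?=[^<]*(?:<(?!\/?(?:textarea|pre|script)\b)[^<]*)*(?:<(?:textarea|pre|script)\b|$))/gi;

function minifyHtml(html) {
  // Replace every matched whitespace run with a single space.
  return html.replace(COLLAPSE_WS, " ");
}

const lookaheadSample = "<div>\n  <p>Hello   world</p>\n  <pre>  keep\n  this  </pre>\n</div>\n";
console.log(JSON.stringify(minifyHtml(lookaheadSample)));
// → "<div> <p>Hello world</p> <pre>  keep\n  this  </pre> </div> "
```

Whitespace inside `<pre>` survives because the lookahead, scanning forward from there, reaches the closing `</pre>` tag first, which is neither a non-blacklist tag nor a blacklist start tag.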

Also, you forgot a backslash. Here is the fixed PHP (PCRE2) version.

(?>[^\S ]\s*|\s{2,})(?=[^<]*+(?:<(?!\/?(?:textarea|pre|script)\b)[^<]*+)*+(?:<(?>textarea|pre|script)\b|\z))

@pmaojo

pmaojo commented May 8, 2023

How well does this work? Did you test it?

@dr-dimitru
Author

@pmaojo It worked well for a long time. Yes, I tested it and had it in production.
