Last active
January 13, 2022 21:49
-
-
Save mgsisk/1094230 to your computer and use it in GitHub Desktop.
[HTML Regex] Regular expression pattern for matching HTML tags.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<(?(?=!--)!--[\s\S]*--|(?(?=\?)\?[\s\S]*\?|(?(?=\/)\/[^.\d-][^\/\]'"[!#$%&()*+,;<=>?@^`{|}~ ]*|[^.\d-][^\/\]'"[!#$%&()*+,;<=>?@^`{|}~ ]*(?:\s[^.\d-][^\/\]'"[!#$%&()*+,;<=>?@^`{|}~ ]*(?:=(?:"[^"]*"|'[^']*'|[^'"<\s]*))?)*)\s?\/?))> | |
< # Tags always begin with <. | |
(? # What if... | |
(?=!--) # We have a comment? | |
!--[\s\S]*-- # If so, anything goes between <!-- and -->. | |
| # OR | |
(? # What if... | |
(?=\?) # We have a scripting tag? | |
\?[\s\S]*\? # If so, anything goes between <? and ?>. | |
| # OR | |
(? # What if... | |
(?=\/) # We have a closing tag? | |
\/ # It should begin with a /. | |
[^.\d-] # Then the tag name, which can't begin with any of these characters. | |
[^\/\]'"[!#$%&()*+,;<=>?@^`{|}~ ]* # And can't contain any of these characters. | |
| # OR... we must have some other tag. | |
[^.\d-] # Tag names can't begin with these characters. | |
[^\/\]'"[!#$%&()*+,;<=>?@^`{|}~ ]* # And can't contain any of these characters. | |
(?: # Do we have any attributes? | |
\s # If so, they'll begin with a space character. | |
[^.\d-] # Followed by a name that doesn't begin with any of these characters. | |
[^\/\]'"[!#$%&()*+,;<=>?@^`{|}~ ]* # And doesn't contain any of these characters. | |
(?: # Does our attribute have a value? | |
= # If so, the value will begin with an = sign. | |
(?: # The value could be: | |
"[^"]*" # Wrapped in double quotes. | |
| # OR | |
'[^']*' # Wrapped in single quotes. | |
| # OR | |
[^'"<\s]* # Not wrapped in anything. | |
) # That does it for our attribute value. | |
)? # If the attribute is boolean it won't need a value. | |
)* # We could have any number of attributes. | |
) # That does it for our closing vs other tag check. | |
\s? # There could be some space characters before the closing >. | |
\/? # There might also be a / if this is a self-closing tag. | |
) # That does it for our script vs html tag check. | |
) # That does it for our comment vs script tag check. | |
> # Tags always end with a >. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment