Created November 27, 2012 23:12
Regex to parse HTML attribute/value pairs.
This regex separates attribute names from their values, such as those in HTML elements. It returns the attribute
names and their values even when the quotes are escaped, nested, or omitted.
The following are examples of attribute/value pairs that are properly divided:
a="a" b="b b" c='c' d=1 e="escaped \" quotes" f="'nested quotes'" g = 'gaps' h="multiple spaces"
The attribute name will be in match position 0, while the value will be in either position 3 or 5
depending on whether or not the value is quoted. Positions are zero-based indexes into the list of
captures, so they correspond to capture groups 1, 4, and 6.
For unquoted values (such as attribute d above) match position 3 will be blank and the value will be
in position 5. Otherwise, the value will be in position 3. This could be normalized with some
additional work, but that would make the expression too complicated for my needs.
(\w*) *= *((['"])?((\\\3|[^\3])*?)\3|(\w+))
Here's the regex at play:
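As a rough illustration of the result, here is a minimal Python sketch that runs the expression over the sample line above, treating all pairs as if they sat in one tag. The names `ATTR_RE`, `SAMPLE`, and `parse_attributes` are made up for this example and are not part of the gist. Python numbers capture groups from 1, so the name is group 1, a quoted value is group 4, and an unquoted value is group 6. Python also reports a group that did not take part in the match as `None` rather than as a blank string. Other regex engines may treat the `\3` backreference differently when no quote was matched, so this should be read as behavior under Python's `re` module only.

```python
import re

# The expression from the gist, unchanged. A raw triple-quoted string lets
# both quote characters appear without extra escaping.
ATTR_RE = re.compile(r'''(\w*) *= *((['"])?((\\\3|[^\3])*?)\3|(\w+))''')

# The sample attribute/value pairs from the description.
SAMPLE = r'''a="a" b="b b" c='c' d=1 e="escaped \" quotes" f="'nested quotes'" g = 'gaps' h="multiple spaces"'''


def parse_attributes(text):
    """Return a list of (name, value) tuples found in text."""
    pairs = []
    for m in ATTR_RE.finditer(text):
        name = m.group(1)
        # Group 3 holds the opening quote. When it is set, the value is in
        # group 4; otherwise the unquoted value is in group 6.
        value = m.group(4) if m.group(3) is not None else m.group(6)
        pairs.append((name, value))
    return pairs


if __name__ == "__main__":
    for name, value in parse_attributes(SAMPLE):
        print(f"{name} -> {value}")
    # a -> a
    # b -> b b
    # c -> c
    # d -> 1
    # e -> escaped \" quotes
    # f -> 'nested quotes'
    # g -> gaps
    # h -> multiple spaces
```

Picking the value by checking whether the quote group matched is one way to do the normalization mentioned above without changing the expression itself.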
I thought you would have shown an example result, for instance assuming “a” through “h” are all in one tag.