@kimmel
Created August 26, 2012 18:10
Python HTMLParser regexp
import re

# Matches a start tag from "<" up to, but not including, the closing ">" or "/>".
locatestarttagend = re.compile(r"""
  <[a-zA-Z][-.a-zA-Z0-9:_]*          # tag name
  (?:[\s/]*                          # optional whitespace before attribute name
    (?:(?<=['"\s/])[^\s/>][^\s/=>]*  # attribute name
      (?:\s*=+\s*                    # value indicator
        (?:'[^']*'                   # LITA-enclosed value
          |"[^"]*"                   # LIT-enclosed value
          |(?!['"])[^>\s]*           # bare value
         )
       )?(?:\s|/(?!>))*
     )*
   )?
  \s*                                # trailing whitespace
""", re.VERBOSE)
# Matches the ">" that terminates the tag.
endendtag = re.compile('>')
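
The snippet only defines the two patterns, so here is a minimal sketch of how they might be combined to find where a start tag ends. The helper name `find_start_tag_end` and its return convention (index just past the tag, or -1) are assumptions made for illustration. It is not the standard library's actual code. The patterns are repeated so the example runs on its own.

```python
import re

# Same patterns as in the snippet, repeated so this example is self-contained.
locatestarttagend = re.compile(r"""
  <[a-zA-Z][-.a-zA-Z0-9:_]*          # tag name
  (?:[\s/]*                          # optional whitespace before attribute name
    (?:(?<=['"\s/])[^\s/>][^\s/=>]*  # attribute name
      (?:\s*=+\s*                    # value indicator
        (?:'[^']*'                   # LITA-enclosed value
          |"[^"]*"                   # LIT-enclosed value
          |(?!['"])[^>\s]*           # bare value
         )
       )?(?:\s|/(?!>))*
     )*
   )?
  \s*                                # trailing whitespace
""", re.VERBOSE)
endendtag = re.compile('>')


def find_start_tag_end(rawdata, i=0):
    """Hypothetical helper: return the index just past the start tag that
    begins at rawdata[i], or -1 if no complete start tag is found there."""
    m = locatestarttagend.match(rawdata, i)
    if m is None:
        return -1          # rawdata[i:] does not begin with a start tag
    j = m.end()            # position right after the name and attributes
    if rawdata.startswith('/>', j):
        return j + 2       # self-closing form such as <a b/>
    if endendtag.match(rawdata, j):
        return j + 1       # ordinary closing ">"
    return -1              # tag is unterminated (or malformed) at this point


html = '<a href="x.html" class=big>text'
end = find_start_tag_end(html)
print(html[:end])
```

Because quoted attribute values are consumed by the `'[^']*'` and `"[^"]*"` branches, a `>` inside quotes (as in `<a title="a>b">`) does not end the tag early. A plain search for `>` would get that case wrong.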