@kimmel
Created August 26, 2012 18:09
python Beautiful Soup regexp 2
import re

# Methods for supporting CSS selectors.
tag_name_re = re.compile('^[a-z0-9]+$')
# /^(\w+)\[(\w+)([=~\|\^\$\*]?)=?"?([^\]"]*)"?\]$/
#   \---/  \---/\-------------/    \-------/
#     |      |         |               |
#     |      |         |           The value
#     |      |    ~,|,^,$,* or =
#     |   Attribute
#    Tag
attribselect_re = re.compile(
    r'^(?P<tag>\w+)?\[(?P<attribute>\w+)(?P<operator>[=~\|\^\$\*]?)' +
    r'=?"?(?P<value>[^\]"]*)"?\]$'
)
...
locatestarttagend = re.compile(r"""
  <[a-zA-Z][-.a-zA-Z0-9:_]*          # tag name
  (?:\s+                             # whitespace before attribute name
    (?:[a-zA-Z_][-.:a-zA-Z0-9_]*     # attribute name
      (?:\s*=\s*                     # value indicator
        (?:'[^']*'                   # LITA-enclosed value
          |\"[^\"]*\"                # LIT-enclosed value
          |[^'\">\s]+                # bare value
         )
       )?
     )
   )*
  \s*                                # trailing whitespace
""", re.VERBOSE)
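A small sketch of what `locatestarttagend` consumes. The pattern covers the tag name, any attributes, and trailing whitespace, but stops before the closing `>`, so a caller would still have to inspect the next character itself. The helper `start_tag_span` and the sample markup are assumptions made for illustration only.

```python
import re

# Same pattern as in the snippet above; re.VERBOSE ignores the layout whitespace.
locatestarttagend = re.compile(r"""
  <[a-zA-Z][-.a-zA-Z0-9:_]*          # tag name
  (?:\s+                             # whitespace before attribute name
    (?:[a-zA-Z_][-.:a-zA-Z0-9_]*     # attribute name
      (?:\s*=\s*                     # value indicator
        (?:'[^']*'                   # LITA-enclosed value
          |\"[^\"]*\"                # LIT-enclosed value
          |[^'\">\s]+                # bare value
         )
       )?
     )
   )*
  \s*                                # trailing whitespace
""", re.VERBOSE)


def start_tag_span(markup, pos=0):
    """Hypothetical helper: return the text matched at pos, or None."""
    match = locatestarttagend.match(markup, pos)
    return match.group(0) if match else None


html = '<a href="x.html" class=link >text</a>'
span = start_tag_span(html)

# The match covers the tag name, both attributes, and the trailing space.
print(repr(span))

# The character right after the match is the '>' that closes the start tag.
print(html[len(span)])

# Text that does not begin with '<' plus a letter does not match.
print(start_tag_span('text'))
```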