@xsthunder
Created February 27, 2019 03:28
Parse HTML or XML into a DOM
# See https://stackoverflow.com/a/40749716
from xml.dom.minidom import parseString
html_string = """
<!DOCTYPE html>
<html><head><title>title</title></head><body><p>test</p></body></html>
"""
# extract the text value of the document's <p> tag:
doc = parseString(html_string)
paragraph = doc.getElementsByTagName("p")[0]
content = paragraph.firstChild.data
print(content)
# Caveat: minidom is an XML parser, so parseString raises
# xml.parsers.expat.ExpatError on HTML-only entities such as &nbsp; or &reg;.
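The entity caveat in the closing comment can be reproduced, and one possible workaround is the standard library's `html.parser`, which decodes HTML entities by default. The following is only a minimal sketch, not part of the original snippet: the class `FirstParagraphText`, the helper `first_paragraph_text`, and the sample string `entity_html` are hypothetical names introduced for illustration, and the sketch assumes you only need the text of the first `<p>` element rather than a full DOM.

```python
from html.parser import HTMLParser
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


class FirstParagraphText(HTMLParser):
    """Hypothetical helper: collect the text inside the first <p> element."""

    def __init__(self):
        # convert_charrefs defaults to True, so entities arrive already decoded.
        super().__init__()
        self.in_p = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            self.done = True

    def handle_data(self, data):
        if self.in_p:
            self.parts.append(data)


def first_paragraph_text(html):
    """Return the text of the first <p> in an HTML string (illustrative only)."""
    parser = FirstParagraphText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Sample input with HTML-only entities (an assumption, not from the original).
entity_html = "<html><body><p>a&nbsp;b &reg;</p></body></html>"

# minidom rejects the undefined entities...
try:
    parseString(entity_html)
    minidom_failed = False
except ExpatError:
    minidom_failed = True

# ...while html.parser decodes them to their Unicode characters.
text = first_paragraph_text(entity_html)
print(minidom_failed, repr(text))
```

This trades the DOM interface for a streaming one, so it suits simple text extraction. If you need tree navigation over real-world HTML, a dedicated HTML parser is probably the better fit.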