Created
July 4, 2013 05:46
Forcing ElementTree to serialize extra namespaces
# coding: utf-8
"""
To parse XML with namespaces using ElementTree while keeping the namespace prefixes, you need register_namespace.
>>> from xml.etree.ElementTree import register_namespace, fromstring, tostring
>>> base_ns = {
...     'http://example.com/article': 'article',
...     'http://example.com/report': 'report',
...     'http://example.com/book': 'book',
... }
>>> for uri, prefix in base_ns.items():
...     register_namespace(prefix, uri)
>>>
Now you can parse XML and serialize it back to a string with the proper namespace prefixes.
>>> src = '<docs xmlns:article=\"http://example.com/article\" xmlns:report=\"http://example.com/report\" xmlns:book=\"http://example.com/book\"><article><article:section>foo</article:section></article><report><report:chapter>bar</report:chapter></report></docs>'
>>> elem = fromstring(src)
>>> tostring(elem)
'<docs xmlns:article="http://example.com/article" xmlns:report="http://example.com/report">...</docs>'
>>>
However, there is a caveat: even if the source XML contains a full set of namespace declarations, ElementTree drops every declaration whose namespace is not used by any node. In the example above, xmlns:book is omitted because there is no "book:" prefixed node.
To force unused xmlns declarations to be serialized, adjust the namespace map yourself and call _serialize_xml directly.
>>> from xml.etree.ElementTree import _serialize_xml, _namespaces
>>> qnames, namespaces = _namespaces(elem, 'utf-8') # ... this contains the "used" namespaces.
>>> namespaces
{'http://example.com/report': 'report', 'http://example.com/article': 'article'}
>>> namespaces.update(base_ns)
>>> from StringIO import StringIO
>>> buf = StringIO()
>>> _serialize_xml(buf.write, elem, 'utf-8', qnames, namespaces)
>>> buf.getvalue()
'<docs xmlns:article="http://example.com/article" xmlns:book="http://example.com/book" xmlns:report="http://example.com/report">...</docs>'
"""
from doctest import testmod, ELLIPSIS
testmod(optionflags=ELLIPSIS)
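
The doctest above uses the Python 2.7 signatures of the private helpers (`_namespaces(elem, encoding)` and `_serialize_xml(write, elem, encoding, qnames, namespaces)`) together with the `StringIO` module. As a rough sketch of the same trick on Python 3, assuming CPython 3.4 or later where these private functions take no encoding argument and `_serialize_xml` expects `short_empty_elements`, something like the following might work. The helper name `tostring_with_namespaces` is hypothetical, and because both functions are undocumented internals they may change between releases.

```python
import io
import xml.etree.ElementTree as ET

# Same URI -> prefix mapping as base_ns in the doctest above.
BASE_NS = {
    "http://example.com/article": "article",
    "http://example.com/report": "report",
    "http://example.com/book": "book",
}


def tostring_with_namespaces(elem, extra_ns):
    """Hypothetical helper: serialize elem, declaring every URI in extra_ns.

    Relies on the private ElementTree functions _namespaces and
    _serialize_xml (assumed Python 3.4+ signatures).
    """
    # qnames maps tags to prefixed names; namespaces holds only the
    # URI -> prefix pairs that are actually used in the tree.
    qnames, namespaces = ET._namespaces(elem)
    # Add the unused declarations without overriding the used ones.
    for uri, prefix in extra_ns.items():
        namespaces.setdefault(uri, prefix)
    buf = io.StringIO()
    ET._serialize_xml(buf.write, elem, qnames, namespaces,
                      short_empty_elements=True)
    return buf.getvalue()


# Register the prefixes first so ElementTree does not invent ns0, ns1, ...
for uri, prefix in BASE_NS.items():
    ET.register_namespace(prefix, uri)

src = (
    '<docs xmlns:article="http://example.com/article"'
    ' xmlns:report="http://example.com/report"'
    ' xmlns:book="http://example.com/book">'
    '<article><article:section>foo</article:section></article>'
    '<report><report:chapter>bar</report:chapter></report>'
    '</docs>'
)
elem = ET.fromstring(src)
result = tostring_with_namespaces(elem, BASE_NS)
print(result)
```

On Python 3 the helper returns `str`, so encode the result yourself if you need bytes. Using `setdefault` rather than `update` keeps whatever prefix ElementTree already chose for a namespace that is in use, so the element names and the declarations stay consistent.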