lxml's FAQ currently includes a method for transforming XML into a dict of dicts, but not into JSON. The two diverge when an element has multiple children with the same tag name. In a dict of dicts each tag can be only one key, so repeated children overwrite one another. Under JSON conventions, multiple children with the same name map to an array (a list or tuple in Python). The Python functions below attempt to add this repeated-tags-to-list behavior. I'd appreciate suggestions for improvements.
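To make the mismatch concrete, here is a minimal sketch of a naive dict-of-dicts conversion. `naive_dict` is a hypothetical helper, not the lxml FAQ's code, and the sketch uses the standard library's `xml.etree.ElementTree`, whose elements support the same `len()`, iteration, `.tag` and `.text` used by the functions here. The first `<item>` is silently lost, whereas a JSON-shaped result would be `{"item": ["a", "b"], "name": "x"}`.

```python
import xml.etree.ElementTree as ET


def naive_dict(element):
    """Hypothetical naive converter: one dict key per child tag.

    Leaf elements become their text; repeated child tags overwrite
    each other, so only the last one survives.
    """
    if not len(element):
        return element.text
    return {child.tag: naive_dict(child) for child in element}


doc = ET.fromstring("<root><item>a</item><item>b</item><name>x</name></root>")
print(naive_dict(doc))  # {'item': 'b', 'name': 'x'}
```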
Gist SKalt/9fdef848e4917538bd53a7d2368c1a9f, last active August 6, 2017 05:03
Functions to transform XML into JSON-able dicts
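```python
def recursive_dict(element):
    """Given an lxml.etree._Element, recursively transform it and its
    children into JSON-compatible dicts, lists and strings."""
    # Leaf element: represent it by its text content (None if empty).
    if not len(element):
        return element.text
    results = {}
    for child in element:
        # Comments and processing instructions have a non-string .tag; skip them.
        if not isinstance(child.tag, str):
            continue
        value = recursive_dict(child)
        # Test for presence, not truthiness: a first child whose value is
        # None (an empty element) must still be promoted to a list.
        if child.tag in results:
            if not isinstance(results[child.tag], list):
                results[child.tag] = [results[child.tag]]
            results[child.tag].append(value)
        else:
            results[child.tag] = value
    return results
```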
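A presence check written as `results.get(child.tag, False)` tests truthiness rather than membership. An empty first child has `.text` of `None`, which is falsy, so the next child with the same tag overwrites it instead of promoting it to a list. The sketch below isolates that step on leaf children only, using a hypothetical `merge_leaf_children` helper and the stdlib `ElementTree`.

```python
import xml.etree.ElementTree as ET


def merge_leaf_children(element, use_membership):
    """Group the direct children of `element` by tag (leaf text only).

    use_membership=False mimics the `results.get(child.tag, False)` check;
    use_membership=True tests `child.tag in results` instead.
    """
    results = {}
    for child in element:
        if use_membership:
            seen = child.tag in results
        else:
            seen = results.get(child.tag, False)
        if seen:
            if not isinstance(results[child.tag], list):
                results[child.tag] = [results[child.tag]]
            results[child.tag].append(child.text)
        else:
            results[child.tag] = child.text
    return results


doc = ET.fromstring("<root><item/><item>b</item></root>")
print(merge_leaf_children(doc, use_membership=False))  # {'item': 'b'}
print(merge_leaf_children(doc, use_membership=True))   # {'item': [None, 'b']}
```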
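```python
def tojson(element):
    """Group an element's children by tag, then map each tag to a single
    converted child or to a list of converted children."""
    # Distinct child tags in document order (a set would make the key order
    # arbitrary); comments and processing instructions are skipped.
    unique_child_tags = list(dict.fromkeys(
        child.tag for child in element if isinstance(child.tag, str)))
    if not unique_child_tags:
        return element.text
    results = {}
    for tag in unique_child_tags:
        # Compare tags directly: element.xpath(tag) cannot evaluate
        # namespaced "{uri}local" tag names.
        children_with_tag = [child for child in element if child.tag == tag]
        if len(children_with_tag) == 1:
            results[tag] = tojson(children_with_tag[0])
        else:
            results[tag] = [tojson(child) for child in children_with_tag]
    return results
```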
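`tojson` walks the children once to collect tags and then scans them again for each distinct tag. A minimal sketch of the same output shape built in a single pass, assuming `collections.defaultdict` buckets keyed by tag; `grouped_dict` is a hypothetical name, and the example uses the stdlib `ElementTree` so it runs without lxml. The result is passed straight to `json.dumps`.

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict


def grouped_dict(element):
    """Hypothetical single-pass variant of the group-by-tag approach."""
    if not len(element):
        return element.text
    # Bucket converted children by tag in one loop; dict order follows the
    # first occurrence of each tag.
    groups = defaultdict(list)
    for child in element:
        # lxml keeps comments/processing instructions as children; their
        # .tag is not a string, so skip them.
        if not isinstance(child.tag, str):
            continue
        groups[child.tag].append(grouped_dict(child))
    # One child with a tag -> scalar; several -> JSON array.
    return {tag: vals[0] if len(vals) == 1 else vals for tag, vals in groups.items()}


doc = ET.fromstring(
    "<order><id>7</id><line><sku>A</sku></line><line><sku>B</sku></line></order>"
)
print(json.dumps(grouped_dict(doc)))
# {"id": "7", "line": [{"sku": "A"}, {"sku": "B"}]}
```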
For safety, I'd always spell that like this:
I would expect the second approach to be much slower than the first, but generally speaking, I don't think there is a one-size-fits-all conversion. If an XML format was not designed to be JSON-conforming, users would probably end up applying one quirk fix or another at some point. Giving them an example is already the best we can do.
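One example of such a quirk: with either function, a tag that appears once becomes a scalar but becomes an array when repeated, so a consumer reading `"line"` has to handle both shapes. A minimal sketch of a per-format fix, assuming the caller knows which tags the format treats as repeatable and passes them in a hypothetical `force_list` set; `to_jsonable` is likewise a hypothetical name, shown with the stdlib `ElementTree`.

```python
import json
import xml.etree.ElementTree as ET


def to_jsonable(element, force_list=frozenset()):
    """Convert an element to JSON-compatible objects.

    Tags listed in `force_list` (a hypothetical parameter) always become
    arrays, even when the element has only one child with that tag.
    """
    if not len(element):
        return element.text
    groups = {}  # tag -> converted children, in document order
    for child in element:
        groups.setdefault(child.tag, []).append(to_jsonable(child, force_list))
    return {
        tag: vals if (len(vals) > 1 or tag in force_list) else vals[0]
        for tag, vals in groups.items()
    }


one_line = ET.fromstring("<order><line>A</line></order>")
print(json.dumps(to_jsonable(one_line)))                       # {"line": "A"}
print(json.dumps(to_jsonable(one_line, force_list={"line"})))  # {"line": ["A"]}
```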