@hhatto
Last active May 23, 2024 01:08
benchmark of Python's XML parsing packages
from xml.etree import ElementTree
from lxml.etree import XML
import xmltodict
import untangle
from benchmarker import Benchmarker

# Loop count per benchmark; N = 1000 * 1 gives a much shorter run.
N = 1000 * 100

xml_string = """<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
<book id="bk105">
<author>Corets, Eva</author>
<title>The Sundered Grail</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-09-10</publish_date>
<description>The two daughters of Maeve, half-sisters,
battle one another for control of England. Sequel to
Oberon's Legacy.</description>
</book>
<book id="bk106">
<author>Randall, Cynthia</author>
<title>Lover Birds</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-09-02</publish_date>
<description>When Carla meets Paul at an ornithology
conference, tempers fly as feathers get ruffled.</description>
</book>
<book id="bk107">
<author>Thurman, Paula</author>
<title>Splish Splash</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-11-02</publish_date>
<description>A deep sea diver finds true love twenty
thousand leagues beneath the sea.</description>
</book>
<book id="bk108">
<author>Knorr, Stefan</author>
<title>Creepy Crawlies</title>
<genre>Horror</genre>
<price>4.95</price>
<publish_date>2000-12-06</publish_date>
<description>An anthology of horror stories about roaches,
centipedes, scorpions and other insects.</description>
</book>
<book id="bk109">
<author>Kress, Peter</author>
<title>Paradox Lost</title>
<genre>Science Fiction</genre>
<price>6.95</price>
<publish_date>2000-11-02</publish_date>
<description>After an inadvertant trip through a Heisenberg
Uncertainty Device, James Salway discovers the problems
of being quantum.</description>
</book>
<book id="bk110">
<author>O'Brien, Tim</author>
<title>Microsoft .NET: The Programming Bible</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2000-12-09</publish_date>
<description>Microsoft's .NET initiative is explored in
detail in this deep programmer's reference.</description>
</book>
<book id="bk111">
<author>O'Brien, Tim</author>
<title>MSXML3: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2000-12-01</publish_date>
<description>The Microsoft MSXML3 parser is covered in
detail, with attention to XML DOM interfaces, XSLT processing,
SAX and more.</description>
</book>
<book id="bk112">
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>49.95</price>
<publish_date>2001-04-16</publish_date>
<description>Microsoft Visual Studio 7 is explored in depth,
looking at how Visual Basic, Visual C++, C#, and ASP+ are
integrated into a comprehensive development
environment.</description>
</book>
</catalog>
"""
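
# Sanity check: each parser should list the same twelve books.
root = ElementTree.fromstring(xml_string)
for child in root:
    print(child.tag, child.attrib, child.find("title").text)

root = XML(xml_string)
for child in root:
    print(child.tag, child.attrib, child.find("title").text)

# xmltodict maps <catalog> to a dict; the repeated <book> elements become a
# list, and attributes are exposed under "@"-prefixed keys.
root = xmltodict.parse(xml_string)
for key, obj in root["catalog"].items():
    for o in obj:
        print(key, o["@id"], o["title"])

# untangle exposes child elements as attributes and element text as .cdata.
root = untangle.parse(xml_string)
for child in root.catalog:
    for book in child.book:
        print(book["id"], book.title.cdata)
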
# Each benchmark parses the document and reads the tag, attributes and title
# of every <book>, N times.
with Benchmarker(N, width=20) as bench:

    @bench("xml.etree")
    def _(bm):
        for i in bm:
            root = ElementTree.fromstring(xml_string)
            for child in root:
                _tag = child.tag
                _attrib = child.attrib
                _title = child.find("title").text

    @bench("lxml")
    def _(bm):
        for i in bm:
            root = XML(xml_string)
            for child in root:
                _tag = child.tag
                _attrib = child.attrib
                _title = child.find("title").text

    @bench("xmltodict")
    def _(bm):
        for i in bm:
            root = xmltodict.parse(xml_string)
            for key, obj in root["catalog"].items():
                for o in obj:
                    _tag = key
                    _attrib = o["@id"]
                    _title = o["title"]

    @bench("untangle")
    def _(bm):
        for i in bm:
            root = untangle.parse(xml_string)
            for child in root.catalog:
                for book in child.book:
                    # No tag lookup here, unlike the other three benchmarks.
                    _attrib = book["id"]
                    _title = book.title.cdata
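
The same parse-and-traverse workload can be timed without the benchmarker package. The following is a minimal sketch using only the standard library's `timeit`, and it covers only `xml.etree`. The names `SAMPLE`, `parse_and_walk` and `time_workload` are hypothetical helpers introduced for illustration, and `SAMPLE` is a shortened stand-in for the catalog above, so its timings are not comparable with the results reported for this gist.

```python
import timeit
from xml.etree import ElementTree

# Shortened stand-in for the catalog document (assumption: two books only).
SAMPLE = """<?xml version="1.0"?>
<catalog>
  <book id="bk101"><title>XML Developer's Guide</title></book>
  <book id="bk102"><title>Midnight Rain</title></book>
</catalog>
"""


def parse_and_walk(xml_text):
    """Parse the document and read tag, attributes and title of each book."""
    root = ElementTree.fromstring(xml_text)
    return [(child.tag, child.attrib, child.find("title").text) for child in root]


def time_workload(func, xml_text, loops=1000, repeat=3):
    """Return the best total time in seconds for `loops` calls of func."""
    timer = timeit.Timer(lambda: func(xml_text))
    return min(timer.repeat(repeat=repeat, number=loops))


if __name__ == "__main__":
    seconds = time_workload(parse_and_walk, SAMPLE)
    print(f"xml.etree: {seconds:.4f} s for 1000 loops")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes. The benchmarker run reported for this gist used a single cycle instead (`cycle=1`).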
@hhatto
Author

hhatto commented May 7, 2019

> python xmlbench.py
book {'id': 'bk101'} XML Developer's Guide
book {'id': 'bk102'} Midnight Rain
book {'id': 'bk103'} Maeve Ascendant
book {'id': 'bk104'} Oberon's Legacy
book {'id': 'bk105'} The Sundered Grail
book {'id': 'bk106'} Lover Birds
book {'id': 'bk107'} Splish Splash
book {'id': 'bk108'} Creepy Crawlies
book {'id': 'bk109'} Paradox Lost
book {'id': 'bk110'} Microsoft .NET: The Programming Bible
book {'id': 'bk111'} MSXML3: A Comprehensive Guide
book {'id': 'bk112'} Visual Studio 7: A Comprehensive Guide
book {'id': 'bk101'} XML Developer's Guide
book {'id': 'bk102'} Midnight Rain
book {'id': 'bk103'} Maeve Ascendant
book {'id': 'bk104'} Oberon's Legacy
book {'id': 'bk105'} The Sundered Grail
book {'id': 'bk106'} Lover Birds
book {'id': 'bk107'} Splish Splash
book {'id': 'bk108'} Creepy Crawlies
book {'id': 'bk109'} Paradox Lost
book {'id': 'bk110'} Microsoft .NET: The Programming Bible
book {'id': 'bk111'} MSXML3: A Comprehensive Guide
book {'id': 'bk112'} Visual Studio 7: A Comprehensive Guide
book bk101 XML Developer's Guide
book bk102 Midnight Rain
book bk103 Maeve Ascendant
book bk104 Oberon's Legacy
book bk105 The Sundered Grail
book bk106 Lover Birds
book bk107 Splish Splash
book bk108 Creepy Crawlies
book bk109 Paradox Lost
book bk110 Microsoft .NET: The Programming Bible
book bk111 MSXML3: A Comprehensive Guide
book bk112 Visual Studio 7: A Comprehensive Guide
bk101 XML Developer's Guide
bk102 Midnight Rain
bk103 Maeve Ascendant
bk104 Oberon's Legacy
bk105 The Sundered Grail
bk106 Lover Birds
bk107 Splish Splash
bk108 Creepy Crawlies
bk109 Paradox Lost
bk110 Microsoft .NET: The Programming Bible
bk111 MSXML3: A Comprehensive Guide
bk112 Visual Studio 7: A Comprehensive Guide
## benchmarker:         release 4.0.1 (for python)
## python version:      3.7.3
## python compiler:     Clang 10.0.1 (clang-1001.0.46.3)
## python platform:     Darwin-18.5.0-x86_64-i386-64bit
## python executable:   /Users/hattorihideo/.virtualenvs/py373/bin/python
## cpu model:           Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
## parameters:          loop=100000, cycle=1, extra=0

##                        real    (total    = user    + sys)
xml.etree               6.7043    6.7000    6.7000    0.0000
lxml                   10.6899   10.6900   10.6800    0.0100
xmltodict              45.7616   45.7400   45.7200    0.0200
untangle               48.7579   48.7300   48.5700    0.1600

## Ranking                real
xml.etree               6.7043  (100.0) ********************
lxml                   10.6899  ( 62.7) *************
xmltodict              45.7616  ( 14.7) ***
untangle               48.7579  ( 13.8) ***

## Matrix                 real    [01]    [02]    [03]    [04]
[01] xml.etree          6.7043   100.0   159.4   682.6   727.3
[02] lxml              10.6899    62.7   100.0   428.1   456.1
[03] xmltodict         45.7616    14.7    23.4   100.0   106.5
[04] untangle          48.7579    13.8    21.9    93.9   100.0
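
The percentages in the Ranking and Matrix tables are consistent with simple ratios of the `real` column: each matrix cell appears to be the column's time divided by the row's time, times 100, and the Ranking values match column `[01]`. The sketch below recomputes them from the reported timings and also derives a per-iteration cost from `loop=100000`. The formula is inferred from the numbers rather than taken from benchmarker's source, and the function names are hypothetical.

```python
# "real" seconds copied from the benchmark output above.
REAL_SECONDS = {
    "xml.etree": 6.7043,
    "lxml": 10.6899,
    "xmltodict": 45.7616,
    "untangle": 48.7579,
}
LOOPS = 100_000  # from "parameters: loop=100000"


def ratio_matrix(times):
    """cell[row][col] = 100 * time(col) / time(row), inferred from the table."""
    return {
        row: {col: 100.0 * times[col] / times[row] for col in times}
        for row in times
    }


def per_loop_microseconds(times, loops):
    """Average cost of one parse-and-traverse iteration, in microseconds."""
    return {name: seconds / loops * 1e6 for name, seconds in times.items()}


if __name__ == "__main__":
    matrix = ratio_matrix(REAL_SECONDS)
    for row, cells in matrix.items():
        print(f"{row:<12}" + "".join(f"{value:8.1f}" for value in cells.values()))
    for name, micros in per_loop_microseconds(REAL_SECONDS, LOOPS).items():
        print(f"{name:<12}{micros:10.1f} us per iteration")
```

Read this way, the `xml.etree` row says how many percent of the `xml.etree` time each other package needed, so `lxml` took roughly 1.6 times as long on this workload and machine.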
