@Midnighter
Created August 18, 2019 20:32
Benchmark different unicode-content parsing expressions using pyparsing.
# Copyright 2011-2019 Moritz Emanuel Beber
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# 3. Neither the name of the copyright holder nor the names of its contributors
# may be used to endorse or promote products derived from this software without
# specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""Benchmark different content parsing expressions."""

import pytest
import pyparsing as pp


# Every benchmark runs ROUNDS rounds of ITERATIONS calls each.
ROUNDS = 300
ITERATIONS = 10


# Candidate expressions; all of them exclude the characters #<>;(){}.
alpha = pp.OneOrMore(pp.Word(pp.printables + "°", excludeChars="#<>;(){}"))
unicode = pp.OneOrMore(
    pp.Word(pp.pyparsing_unicode.printables, excludeChars="#<>;(){}")
)
not_in = pp.CharsNotIn("#<>;(){}")
regex = pp.OneOrMore(pp.Regex(r"[^#<>;(){}\s]+"))


@pytest.fixture()
def text():
    return "2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase 30°C"


def content_alpha(string):
    return alpha.parseString(string, parseAll=True)


def content_unicode(string):
    return unicode.parseString(string, parseAll=True)


def content_not_in(string):
    return not_in.parseString(string, parseAll=True)


def content_regex(string):
    return regex.parseString(string, parseAll=True)


def test_content_alpha(benchmark, text):
    benchmark.pedantic(
        content_alpha,
        kwargs={"string": text},
        rounds=ROUNDS,
        iterations=ITERATIONS
    )


def test_content_unicode(benchmark, text):
    benchmark.pedantic(
        content_unicode,
        kwargs={"string": text},
        rounds=ROUNDS,
        iterations=ITERATIONS
    )


def test_content_not_in(benchmark, text):
    benchmark.pedantic(
        content_not_in,
        kwargs={"string": text},
        rounds=ROUNDS,
        iterations=ITERATIONS
    )


def test_content_regex(benchmark, text):
    benchmark.pedantic(
        content_regex,
        kwargs={"string": text},
        rounds=ROUNDS,
        iterations=ITERATIONS
    )
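
The four expressions are not strictly interchangeable, which is worth keeping in mind when comparing their timings. The sketch below uses only the standard `re` module as a stand-in (it is not pyparsing itself, and the names `split_tokens` and `single_token` are made up for illustration). It applies two character classes to the fixture string: the class from the `regex` expression, which also excludes whitespace, and a class that excludes only `#<>;(){}`, which is how `CharsNotIn` is assumed to behave here.

```python
import re

TEXT = "2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase 30°C"

# Same character class as the `regex` expression: reserved characters and
# whitespace are excluded, so each whitespace-separated word is one token.
split_tokens = re.findall(r"[^#<>;(){}\s]+", TEXT)

# Class that excludes only the reserved characters (assumed here to
# approximate CharsNotIn): whitespace is allowed inside a match, so the
# whole string comes back as a single token.
single_token = re.findall(r"[^#<>;(){}]+", TEXT)

print(split_tokens)
print(single_token)
```

If that assumption holds, `not_in` does less tokenizing work than the other three expressions, so its lead in a benchmark may partly reflect that difference.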

Midnighter commented Aug 18, 2019

Commands run:

pip install pytest pytest-benchmark
pytest test_benchmark_pyparsing.py

Installed versions:

pytest==5.0.1
pytest-benchmark==3.2.2
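
For readers without pytest-benchmark at hand, here is a rough sketch of what the `rounds` and `iterations` settings amount to, as far as I understand pedantic mode: each round times a fixed number of consecutive calls, and the per-call time is the round total divided by that number. It uses the standard `timeit` module and a hypothetical `workload` function based on `re` rather than pyparsing, so its numbers are not comparable with the benchmark output.

```python
import re
import timeit

ROUNDS = 300
ITERATIONS = 10
TEXT = "2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase 30°C"

# Hypothetical stand-in workload; swap in any of the content_* functions.
pattern = re.compile(r"[^#<>;(){}\s]+")


def workload():
    return pattern.findall(TEXT)


# Each of the ROUNDS measurements times ITERATIONS consecutive calls.
round_totals = timeit.repeat(workload, repeat=ROUNDS, number=ITERATIONS)

# Convert each round's total (seconds) into microseconds per call.
per_call_us = [total / ITERATIONS * 1e6 for total in round_totals]

print(f"min  {min(per_call_us):.4f} us")
print(f"mean {sum(per_call_us) / len(per_call_us):.4f} us")
```

With these settings every benchmark makes 300 × 10 = 3000 calls in total.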

Sample output:

-------------------------------------------------------------------------------------------- benchmark: 4 tests --------------------------------------------------------------------------------------------
Name (time in us)               Min                   Max                  Mean              StdDev                Median                 IQR            Outliers          OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_content_not_in         33.9296 (1.0)         58.5272 (1.0)         37.5185 (1.0)        3.7466 (1.0)         36.5777 (1.0)        2.9685 (1.0)         30;19  26,653.5212 (1.0)         300          10
test_content_alpha          48.7261 (1.44)       129.6191 (2.21)        53.7750 (1.43)       8.2717 (2.21)        52.7443 (1.44)       3.7144 (1.25)        13;21  18,595.9858 (0.70)        300          10
test_content_regex          49.6090 (1.46)        88.2356 (1.51)        53.9659 (1.44)       4.9056 (1.31)        52.1994 (1.43)       3.5889 (1.21)        24;17  18,530.2035 (0.70)        300          10
test_content_unicode     3,636.2350 (107.17)   5,708.6024 (97.54)    3,866.0735 (103.04)   257.6094 (68.76)    3,804.0245 (104.00)   195.2260 (65.77)       14;12     258.6604 (0.01)        300          10
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
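
As a quick check on how to read the table, the OPS column and the parenthesised factors can be recomputed from the Mean column. The means are reported in microseconds, so `1 / Mean` in seconds becomes `1e6 / mean_us`. This is only a sketch using the rounded values printed above, so the last digits may differ slightly from the table.

```python
# Mean time per call in microseconds, copied from the table above.
mean_us = {
    "test_content_not_in": 37.5185,
    "test_content_alpha": 53.7750,
    "test_content_regex": 53.9659,
    "test_content_unicode": 3866.0735,
}

fastest = min(mean_us.values())

# OPS = 1 / Mean with the mean in seconds, i.e. 1e6 / mean in microseconds.
ops = {name: 1e6 / mean for name, mean in mean_us.items()}

# Slowdown relative to the fastest mean (the factors shown in parentheses).
slowdown = {name: mean / fastest for name, mean in mean_us.items()}

for name in mean_us:
    print(f"{name:<22} {ops[name]:>12,.2f} ops/s  {slowdown[name]:>7.2f}x")
```

On these numbers the `pyparsing_unicode.printables` variant is roughly two orders of magnitude slower than the other three, which stay within a factor of about 1.5 of each other.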
