Benchmark for Vim regexp engine performance
Regular expressions and data from
http://lh3lh3.users.sourceforge.net/reb.shtml

Regular expressions benchmarked:

    URI        ([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?
    Email      ([^ @]+)@([^ @]+)
    Date       ([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)
    URI|Email  ([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)
    Word       .*SCSI-
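
As a quick sanity check of what each expression matches, here is a minimal Python sketch that runs the five patterns over a few lines. The patterns are copied verbatim from the list above; the sample lines and the helper name `matching_names` are made up for illustration and are not taken from the howto corpus.

```python
import re

# Patterns copied verbatim from the list above.
PATTERNS = {
    "URI": r"([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?",
    "Email": r"([^ @]+)@([^ @]+)",
    "Date": r"([0-9][0-9]?)/([0-9][0-9]?)/([0-9][0-9]([0-9][0-9])?)",
    "Word": r".*SCSI-",
}
# The combined pattern is the URI and Email patterns joined by alternation.
PATTERNS["URI|Email"] = PATTERNS["URI"] + "|" + PATTERNS["Email"]

# Hypothetical sample lines, for illustration only.
SAMPLES = [
    "see http://example.org/docs for details",
    "mail user@example.org",
    "released 12/25/2013",
    "the SCSI-2 bus",
    "no match here",
]


def matching_names(line):
    """Return the names of all patterns that match somewhere in the line."""
    return [name for name, pat in PATTERNS.items() if re.search(pat, line)]


for line in SAMPLES:
    print(matching_names(line), line)
```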

Results (in seconds):

                       URI  Email  Date    Sum3  URI|Email  Word
    re=1             16.34  13.65  4.07   34.06      29.46  0.49
    re=2             92.03   9.75  4.47  106.25     105.39  5.22
    Python 2.7.3      2.69   5.17  1.01    8.87       7.72  3.40
    Perl 5.14.2       0.35   0.33  0.32    1.00       8.12  0.31
    GNU egrep 2.10    0.21   0.16  0.56    0.93      10.86  0.03

re=1 and re=2 are Vim with 'regexpengine' set to 1 and 2. Sum3 is the sum of the URI, Email, and Date columns.

(Five runs each, Vim 7.3.1010, 64-bit i7-2700K CPU @ 3.50GHz x 8.)
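
The Sum3 column appears to be the sum of the URI, Email, and Date columns. The sketch below checks that against the table and computes how much slower or faster one row is than another for a given pattern. The numbers are copied from the table; the names `RESULTS`, `sum3`, and `ratio` are illustrative only.

```python
# Timings in seconds, copied from the results table above.
COLUMNS = ("URI", "Email", "Date", "Sum3", "URI|Email", "Word")
RESULTS = {
    "re=1": dict(zip(COLUMNS, (16.34, 13.65, 4.07, 34.06, 29.46, 0.49))),
    "re=2": dict(zip(COLUMNS, (92.03, 9.75, 4.47, 106.25, 105.39, 5.22))),
    "Python 2.7.3": dict(zip(COLUMNS, (2.69, 5.17, 1.01, 8.87, 7.72, 3.40))),
    "Perl 5.14.2": dict(zip(COLUMNS, (0.35, 0.33, 0.32, 1.00, 8.12, 0.31))),
    "GNU egrep 2.10": dict(zip(COLUMNS, (0.21, 0.16, 0.56, 0.93, 10.86, 0.03))),
}


def sum3(row):
    """Sum of the three single-pattern timings in one table row."""
    return row["URI"] + row["Email"] + row["Date"]


def ratio(pattern, numerator="re=2", denominator="re=1"):
    """Timing of one row divided by another for the same pattern."""
    return RESULTS[numerator][pattern] / RESULTS[denominator][pattern]


# Every Sum3 entry should agree with the recomputed sum, up to rounding.
for name, row in RESULTS.items():
    assert abs(sum3(row) - row["Sum3"]) < 0.005, name

# A ratio above 1 means re=2 took longer than re=1 on that pattern.
for pattern in ("URI", "Email", "Date", "URI|Email", "Word"):
    print(f"{pattern:10s} re=2 / re=1 = {ratio(pattern):.2f}")
```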

The Vim results were obtained with the bench.sh script.
Python, Perl, and egrep were timed in similar fashion using these invocations:

    perl script.pl 'pattern' </path/to/data/howto >/dev/null
    python script.py 'pattern' </path/to/data/howto >/dev/null
    egrep 'pattern' /path/to/data/howto >/dev/null

The data file "howto" (~38M) is available at
http://people.unipmn.it/manzini/lightweight/corpus/howto.bz2

bench.sh:
#!/bin/bash
# Usage: ./bench.sh <engine> <script>
#   where engine is (1|2)
#         script is (uri|email|date|uriemail|word)

VIM="/path/to/vim/src/vim"
DATA="/path/to/data/howto"

vimrc="vimrc-${1:-1}"          # defaults to engine 1
rescript="re-${2:-word}.vim"   # defaults to the word script

cmd=( "${VIM}" -N -u "${vimrc}" -i NONE -n -e -s -S "${rescript}" +quit "${DATA}" )
echo "${cmd[@]}" >&2

tmpfile="/tmp/,,tmp.$$"

# Time five runs with the external time command, appending each
# elapsed time (in seconds) to the temp file.
for i in {1..5}; do
    \time -f '%e' -ao "${tmpfile}" "${cmd[@]}" &>/dev/null
    echo -n . >&2
done
echo >&2

# Report the mean elapsed time of the five runs.
result=$( awk '{ sum += $1 } END { printf "%.2f", sum / 5 }' "${tmpfile}" )
rm -f "${tmpfile}"
echo "${result}"
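
For readers without GNU time, the run-five-times-and-average loop of bench.sh can be approximated in Python. This is only a sketch: it measures wall-clock time from the parent process, so its numbers will not be identical to what `time -f '%e'` reports, and the function name `average_wall_time` and the placeholder command are assumptions made for illustration.

```python
import subprocess
import sys
import time


def average_wall_time(cmd, runs=5):
    """Run cmd `runs` times, discard its output, return mean wall-clock seconds."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded, as bench.sh does with &>/dev/null.
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        total += time.perf_counter() - start
    return total / runs


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; substitute the real
    # Vim or egrep command line as a list of arguments.
    cmd = [sys.executable, "-c", "pass"]
    print("%.2f" % average_wall_time(cmd))
```

The egrep invocation, for example, could be passed as `["egrep", "pattern", "/path/to/data/howto"]`. The Perl and Python invocations read the data from standard input, which this sketch does not redirect.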

re-date.vim:
g/\%([0-9][0-9]\=\)\/\%([0-9][0-9]\=\)\/\%([0-9][0-9]\%([0-9][0-9]\)\=\)/p

re-email.vim:
g/\%([^ @]\+\)@\%([^ @]\+\)/p

re-uri.vim:
g/\%([a-zA-Z][a-zA-Z0-9]*\):\/\/\%([^ /]\+\)\%(\/[^ ]*\)\=/p

re-uriemail.vim:
g/\%([a-zA-Z][a-zA-Z0-9]*\):\/\/\%([^ /]\+\)\%(\/[^ ]*\)\=\|\%([^ @]\+\)@\%([^ @]\+\)/p

re-word.vim:
g/.*SCSI-/p

script.pl:
#!/usr/bin/env perl
use strict;
use warnings;

# Compile the pattern given as the first argument once, up front.
my $reobj = qr/$ARGV[0]/;

# Print every input line that matches.
while (<STDIN>) {
    print $_ if /$reobj/;
}

script.py:
#!/usr/bin/env python
import re
import sys

# Compile the pattern given as the first argument once, up front.
reobj = re.compile(sys.argv[1])

# Print every input line that matches.
for line in sys.stdin:
    if reobj.search(line):
        sys.stdout.write(line)

vimrc-1:
if exists('&regexpengine')
    set regexpengine=1
endif

vimrc-2:
if exists('&regexpengine')
    set regexpengine=2
endif