@catawbasam
Last active August 29, 2015 14:05
Julia islower(Char) and islower(String) benchmarks: Base, utf8proc, PCRE
{
"metadata": {
"language": "Julia",
"name": "",
"signature": "sha256:679974b4916c36603648466accbc9c2838242618d92c33c0fcb81297a9e11a79"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Benchmark of islower() using Base and utf8proc -- UPDATE 2\n",
"Changes:\n",
"* Added mojibake_islower(String), based on a C function provided by @stevengj.\n",
"* Revised islower_utf8proc() to use a hacked version of category_code(c) which omits the line `cat == 0 ? 30 : cat`.\n",
"* Revised the test of islower(Char) to use a shuffled array of Char.\n",
"* Dropped PCRE, as it does not work well with Char and libmojibake is the preferred path.\n",
"\n",
"### Updated Findings for test functions, as tested:\n",
"* For islower(Char) run against shuffled Char arrays, islower_base is just a hair faster than islower_utf8proc (about 1.1x in the runs below).\n",
"* For islower(String), islower_utf8proc is roughly 3-4x faster than islower_base on UTF8String; the gap is larger for UTF32String and smaller for RevString.\n",
"* The discrepancy between islower(Char) and islower(String) is probably due to the inefficiency of the all() function used in the Base version of islower(String)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"versioninfo()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Julia Version 0.4.0-dev+148\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Commit 064b5c3 (2014-08-15 04:27 UTC)\n",
"Platform Info:\n",
" System: Linux (x86_64-redhat-linux)\n",
" CPU: Intel(R) Xeon(R) CPU W3690 @ 3.47GHz\n",
" WORD_SIZE: 64\n",
" BLAS: libopenblas (USE64BITINT NO_AFFINITY NEHALEM)\n",
" LAPACK: libopenblas\n",
" LIBM: libopenlibm\n",
" LLVM: libLLVM-3.3\n"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### test Chars"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"const c = 'p'\n",
"const ca = char(uint8(rand(48:122, 1400)))\n",
"ca[1:8]'"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"1x8 Array{Char,2}:\n",
" '0' 'L' 'R' '1' 'd' 'e' '>' 'f'"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### test strings"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"const s8a= \"cherry\u03c0\" \n",
"const s8=s8a^2\n",
"dump(s8)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"UTF8String \"cherry\u03c0cherry\u03c0\"\n"
]
}
],
"prompt_number": 13
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"const s32 = utf32(s8)\n",
"const ss8=SubString(s8,1,endof(s8))\n",
"const rev8 = RevString(s8)\n",
"const rep8 = RepString(s8a, 2)\n",
"\n",
"ts = {s8, ss8, s32, rev8, rep8}"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"5-element Array{Any,1}:\n",
" \"cherry\u03c0cherry\u03c0\"\n",
" \"cherry\u03c0cherry\u03c0\"\n",
" \"cherry\u03c0cherry\u03c0\"\n",
" \"\u03c0yrrehc\u03c0yrrehc\"\n",
" \"cherry\u03c0cherry\u03c0\""
]
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### longer test strings"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"const s8b = s8^100\n",
"typeof(s8b), length(s8b)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"(UTF8String,1400)"
]
}
],
"prompt_number": 15
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"const s32b = utf32(s8b)\n",
"const ss8b=SubString(s8b,1,endof(s8b))\n",
"const rev8b = RevString(s8b)\n",
"const rep8b = RepString(s8, 100)\n",
"\n",
"tsb = {s8b, ss8b, s32b, rev8b, rep8b};"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### current base: accuracy\n",
"On Windows, islower(c) is inaccurate; the checks below were run on Linux (see the versioninfo() output above)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"isupper(c), islower(c) "
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 17,
"text": [
"(false,true)"
]
}
],
"prompt_number": 17
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for s in ts\n",
" println( islower(s))\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"true\n",
"true\n",
"true\n",
"true\n",
"true\n"
]
}
],
"prompt_number": 18
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Alternatives and Benchmarks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Alternative 0: The incumbent\n",
"A for loop would be faster than `all`, but it would not address the accuracy problem on Windows."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"islower_base(c::Char) = bool(ccall(:iswlower, Int32, (Cwchar_t,), c))\n",
"islower_base(s::String) = all(islower,s)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 19,
"text": [
"islower_base (generic function with 2 methods)"
]
}
],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"c, islower_base(c)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"('p',true)"
]
}
],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s8, islower_base(s8)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 21,
"text": [
"(\"cherry\u03c0cherry\u03c0\",true)"
]
}
],
"prompt_number": 21
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Alternative: utf8proc / libmojibake"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# returns the raw utf8proc category code for c: 0 (rather than 30) for unassigned c, since the remapping below is commented out\n",
"function category_code_HACK(c)\n",
" c > 0x10FFFF && return 0x0000 # see utf8proc_get_property docs\n",
" cat = unsafe_load(ccall(:utf8proc_get_property, Ptr{Uint16}, (Int32,), c))\n",
" # note: utf8proc returns 0, not UTF8PROC_CATEGORY_CN, for unassigned c\n",
" #cat == 0 ? 30 : cat # not important for islower\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 75,
"text": [
"category_code_HACK (generic function with 1 method)"
]
}
],
"prompt_number": 75
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# utf8 category constants\n",
"const UTF8PROC_CATEGORY_LU = 1\n",
"const UTF8PROC_CATEGORY_LL = 2"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 76,
"text": [
"2"
]
}
],
"prompt_number": 76
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"islower_utf8proc(c::Char) = (category_code_HACK(c)==UTF8PROC_CATEGORY_LL)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 77,
"text": [
"islower_utf8proc (generic function with 2 methods)"
]
}
],
"prompt_number": 77
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"function islower_utf8proc(s::String)\n",
" for c in s\n",
" if category_code_HACK(c)!=UTF8PROC_CATEGORY_LL\n",
" return false\n",
" end\n",
" end\n",
" return true\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 78,
"text": [
"islower_utf8proc (generic function with 2 methods)"
]
}
],
"prompt_number": 78
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"@time islower_utf8proc(c)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"elapsed time: 0.002072182 seconds (46160 bytes allocated)\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 79,
"text": [
"true"
]
}
],
"prompt_number": 79
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for s in ts\n",
" println(islower_utf8proc(s))\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"true\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"true\n",
"true\n",
"true\n",
"true\n"
]
}
],
"prompt_number": 80
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Alternative 2 for islower(String) -- mojibake_islower() from @stevengj\n",
"This is a C function that handles conversion from utf8 to Chars on the C side."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ccall((:mojibake_islower,\"/devel/asias/keithc/julia/deps/libmojibake/libmojibake\"),\n",
" Int, (Ptr{Uint8},), \"blAH\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 81,
"text": [
"0"
]
}
],
"prompt_number": 81
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ccall((:mojibake_islower,\"/devel/asias/keithc/julia/deps/libmojibake/libmojibake\"),\n",
"Int, (Ptr{Uint8},), \"blah\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 82,
"text": [
"1"
]
}
],
"prompt_number": 82
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"mojibake_islower(s::ByteString) = bool(ccall((:mojibake_islower,\"/devel/asias/keithc/julia/deps/libmojibake/libmojibake\"),\n",
"Int, (Ptr{Uint8},), s))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 83,
"text": [
"mojibake_islower (generic function with 1 method)"
]
}
],
"prompt_number": 83
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"mojibake_islower(\"\u0394uffer\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 84,
"text": [
"false"
]
}
],
"prompt_number": 84
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"mojibake_islower(\"\u03b4uffer\") "
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 85,
"text": [
"true"
]
}
],
"prompt_number": 85
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initial Time comparisons \n",
"* base wins on Char\n",
"* utf8proc beats base on UTF8String; `mojibake_islower()` is even better"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"CN=1000_000\n",
"N=1000\n",
"runs = 20"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 86,
"text": [
"20"
]
}
],
"prompt_number": 86
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"function test_base(v,N)\n",
" isit = false\n",
" for i in 1:N\n",
" isit = islower_base(v) \n",
" end\n",
" return isit\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 87,
"text": [
"test_base (generic function with 1 method)"
]
}
],
"prompt_number": 87
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"function test_utf8proc(v,N)\n",
" isit = false\n",
" for i in 1:N\n",
" isit = islower_utf8proc(v) \n",
" end\n",
" return isit\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 88,
"text": [
"test_utf8proc (generic function with 1 method)"
]
}
],
"prompt_number": 88
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"function test_mojibake(v,N)\n",
" isit = false\n",
" for i in 1:N\n",
" isit = mojibake_islower(v) \n",
" end\n",
" return isit\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 89,
"text": [
"test_mojibake (generic function with 1 method)"
]
}
],
"prompt_number": 89
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"@time test_base(c,CN)\n",
"@time test_base(c,CN)\n",
"@time test_base(c,CN)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"elapsed time: 0.002158956 seconds (80 bytes allocated)\n",
"elapsed time: 0.002148192 seconds (80 bytes allocated)\n",
"elapsed time: 0.00215309 seconds (80 bytes allocated)\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 92,
"text": [
"true"
]
}
],
"prompt_number": 92
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"@time test_utf8proc(c,CN) \n",
"@time test_utf8proc(c,CN) \n",
"@time test_utf8proc(c,CN) "
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"elapsed time: 0.004567398 seconds (80 bytes allocated)\n",
"elapsed time: 0.004400526 seconds (80 bytes allocated)\n",
"elapsed time: 0.004369747 seconds (80 bytes allocated)\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 93,
"text": [
"true"
]
}
],
"prompt_number": 93
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"@time test_base(s8,N)\n",
"@time test_base(s8,N)\n",
"@time test_base(s8b,N)\n",
"@time test_base(s8b,N)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"elapsed time: 0.00122146 seconds (80 bytes allocated)\n",
"elapsed time: 0.001210436 seconds (80 bytes allocated)\n",
"elapsed time: 0.0867118 seconds (80 bytes allocated)\n",
"elapsed time: 0.08556961 seconds (80 bytes allocated)\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 71,
"text": [
"true"
]
}
],
"prompt_number": 71
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"@time test_utf8proc(s8,N)\n",
"@time test_utf8proc(s8,N)\n",
"@time test_utf8proc(s8b,N) \n",
"@time test_utf8proc(s8b,N) "
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"elapsed time: 0.000205483 seconds (80 bytes allocated)\n",
"elapsed time: 0.000203575 seconds (80 bytes allocated)\n",
"elapsed time: 0.019182563 seconds (80 bytes allocated)\n",
"elapsed time: 0.018921496 seconds (80 bytes allocated)\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 73,
"text": [
"true"
]
}
],
"prompt_number": 73
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"@time test_mojibake(s8,N)\n",
"@time test_mojibake(s8,N)\n",
"@time test_mojibake(s8b,N)\n",
"@time test_mojibake(s8b,N)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"elapsed time: 0.000109839 seconds (80 bytes allocated)\n",
"elapsed time: 5.958e-5 seconds (80 bytes allocated)\n",
"elapsed time: 0.006180564 seconds (80 bytes allocated)\n",
"elapsed time: 0.006231801 seconds (80 bytes allocated)\n"
]
},
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 95,
"text": [
"true"
]
}
],
"prompt_number": 95
},
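{
"cell_type": "markdown",
"metadata": {},
"source": [
"## islower(Char) over a shuffled Char array\n",
"* Base.islower(Char) appears to be about 1.1x faster than the islower_utf8proc() version, a difference smaller than the run-to-run standard deviation. Each timing covers one pass over the 1400-element array and includes the shuffle(ca) call."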
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"CARUNS=100\n",
"function test_chararray_base(ca)\n",
" isit = false\n",
" for c in ca\n",
" isit = islower_base(c)\n",
" end\n",
" return isit\n",
"end\n",
"tmupc = Float64[]\n",
"i=0\n",
"while i<CARUNS\n",
" v, t, b, g = @timed test_chararray_base(shuffle(ca))\n",
" if g==0\n",
" push!(tmupc,t)\n",
" i+=1\n",
" end\n",
" sleep(0.02) \n",
"end\n",
"println(\"\\nchar array base islower(Char)\")\n",
"println(\" Mean time per islower(c): $(mean(tmupc))\")\n",
"println(\" Std Dev time per islower(c): $(std(tmupc))\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"char array base islower(Char)\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per islower(c): 0.00010246523\n",
" Std Dev time per islower(c): 1.4454150120088016e-5"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 96
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"function test_chararray_utf8proc(ca)\n",
" isit = false\n",
" for c in ca\n",
" isit = islower_utf8proc(c)\n",
" end\n",
" return isit\n",
"end\n",
"tmupc = Float64[]\n",
"i=0\n",
"while i<CARUNS\n",
" v, t, b, g = @timed test_chararray_utf8proc(shuffle(ca))\n",
" if g==0\n",
" push!(tmupc,t)\n",
" i+=1\n",
" end\n",
" sleep(0.02) \n",
"end\n",
"println(\"\\nchar array utf8proc islower(Char)\")\n",
"println(\" Mean time per islower(c): $(mean(tmupc))\")\n",
"println(\" Std Dev time per islower(c): $(std(tmupc))\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"char array utf8proc islower(Char)\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per islower(c): 0.00011192105\n",
" Std Dev time per islower(c): 1.0925861625986393e-5\n"
]
}
],
"prompt_number": 97
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# PCRE islower(Char) is a non-starter if we have to do string(Char) due to conversion and memory allocation"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 62
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## islower(String) expanded test -- multiple runs and string types"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"function stringtest_base(s, runs, N)\n",
" tm = Float64[]\n",
" println(typeof(s), \" \", length(s))\n",
" i=0\n",
" while i<runs+2\n",
" v, t, b, g = @timed test_base(s,N)\n",
" if g==0\n",
" if i>2 # skip warm-up runs i = 0, 1, 2, so runs-1 timings are kept\n",
" push!(tm,t)\n",
" end\n",
" i+=1\n",
" end\n",
" sleep(0.01) \n",
" end\n",
" mean(tm), std(tm)\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 98,
"text": [
"stringtest_base (generic function with 1 method)"
]
}
],
"prompt_number": 98
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"function stringtest_utf8proc(s, runs, N)\n",
" tm = Float64[]\n",
" println(typeof(s), \" \", length(s))\n",
" i=0\n",
" while i<runs+2\n",
" v, t, b, g = @timed test_utf8proc(s,N)\n",
" if g==0\n",
" if i>2 # skip warm-up runs i = 0, 1, 2, so runs-1 timings are kept\n",
" push!(tm,t)\n",
" end\n",
" i+=1\n",
" end\n",
" sleep(0.01) \n",
" end\n",
" mean(tm), std(tm)\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 99,
"text": [
"stringtest_utf8proc (generic function with 1 method)"
]
}
],
"prompt_number": 99
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"function stringtest_mojibake(s, runs, N)\n",
" tm = Float64[]\n",
" println(typeof(s), \" \", length(s))\n",
" i=0\n",
" while i<runs+2\n",
" v, t, b, g = @timed test_mojibake(s,N)\n",
" if g==0\n",
" if i>2 # skip warm-up runs i = 0, 1, 2, so runs-1 timings are kept\n",
" push!(tm,t)\n",
" end\n",
" i+=1\n",
" end\n",
" sleep(0.01) \n",
" end\n",
" mean(tm), std(tm)\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 100,
"text": [
"stringtest_mojibake (generic function with 1 method)"
]
}
],
"prompt_number": 100
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Short string by type"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"println(\"shorter islower_base(String)\")\n",
"for s in ts\n",
" avgt, sdt = stringtest_base(s, runs, N)\n",
" println(\" Mean time per $N islower(s): $avgt\")\n",
" println(\" Std Dev time per $N islower(s): $sdt\")\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"shorter islower_base(String)\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"UTF8String 14\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.0008859104736842106\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 3.0579873180044464e-5\n",
"SubString{UTF8String} 14\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.0008886625263157894\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 2.909439493817862e-5\n",
"UTF32String 14\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.0007661934736842105\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 2.5675350276456092e-5\n",
"RevString{UTF8String} 14\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.0010755214210526314\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 4.265213206253038e-5\n",
"RepString 14\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.013362767052631577\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 0.00013291619916384821\n"
]
}
],
"prompt_number": 101
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"println(\"shorter islower_utf8proc(String)\")\n",
"for s in ts\n",
" avgt, sdt = stringtest_utf8proc(s, runs, N)\n",
" println(\" Mean time per $N islower(s): $avgt\")\n",
" println(\" Std Dev time per $N islower(s): $sdt\")\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"shorter islower_utf8proc(String)\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"UTF8String 14\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.0002451072631578948\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 3.7097391784392046e-5\n",
"SubString{UTF8String} 14\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.00025295284210526315\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 2.297940062863829e-5\n",
"UTF32String 14\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.0001367904736842105\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 2.700984922696086e-5\n",
"RevString{UTF8String} 14\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.0004213798947368421\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 2.524794539912491e-5\n",
"RepString 14\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.012843512789473686\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 0.0001222177376395545\n"
]
}
],
"prompt_number": 102
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### mojibake_islower() is set up only for ByteStrings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Longer string by type (excluding the slooow RepString)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"LN = 1000\n",
"println(\"longer islower_base(String)\")\n",
"for s in tsb[1:end-1]\n",
" avgt, sdt = stringtest_base(s, runs, LN)\n",
" println(\" Mean time per $LN islower(s): $avgt\")\n",
" println(\" Std Dev time per $LN islower(s): $sdt\")\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"longer islower_base(String)\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"UTF8String 1400\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.08001143336842105\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 0.00048135938686913004\n",
"SubString{UTF8String} 1400\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.08094776884210526\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 0.0004420752573338697\n",
"UTF32String 1400\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.06830800131578947\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 0.00041131204154700216\n",
"RevString{UTF8String} 1400\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.09844635994736843\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 0.00028812059821954085\n"
]
}
],
"prompt_number": 105
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"println(\"longer islower_utf8proc(String)\")\n",
"for s in tsb[1:end-1]\n",
" avgt, sdt = stringtest_utf8proc(s, runs, LN)\n",
" println(\" Mean time per $LN islower(s): $avgt\")\n",
" println(\" Std Dev time per $LN islower(s): $sdt\")\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"longer islower_utf8proc(String)\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"UTF8String 1400\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.018954816578947365\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 3.470253665290481e-5\n",
"SubString{UTF8String} 1400\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.01851312689473684\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 0.00010500337487375001\n",
"UTF32String 1400\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.007234282210526316\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 2.290304486447872e-5\n",
"RevString{UTF8String} 1400\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Mean time per 1000 islower(s): 0.036085492368421054\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
" Std Dev time per 1000 islower(s): 0.000493830033976403\n"
]
}
],
"prompt_number": 106
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}