@mattip
Last active November 7, 2018 21:33
benchmarks for matmul commit 207fd8b3 vs 97df9287 (HEAD vs. master)
before after ratio
[97df9287] [207fd8b3]
+ 10.6±0.5ms 25.0±10ms 2.36 bench_linalg.Eindot.time_matmul_trans_a_at
+ 47.2±0.6ms 58.0±4ms 1.23 bench_app.MaxesOfDots.time_it
+ 1.89±0ms 2.08±0ms 1.10 bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <type 'bool'>)
+ 2.81±0ms 3.09±0.01ms 1.10 bench_core.CountNonzero.time_count_nonzero_axis(3, 1000000, <type 'bool'>)
+ 142±20ms 156±5ms 1.10 bench_linalg.Linalg.time_op('pinv', 'complex128')
+ 904±20ns 988±20ns 1.09 bench_indexing.IndexingStructured0D.time_array_slice
+ 21.2±0.07μs 22.5±0.2μs 1.06 bench_io.Copy.time_cont_assign('complex128')
+ 622±6ns 658±10ns 1.06 bench_indexing.IndexingStructured0D.time_array_all
+ 41.4±0.5μs 43.7±0.1μs 1.06 bench_core.CountNonzero.time_count_nonzero_axis(2, 10000, <type 'bool'>)
+ 50.8±0.2μs 53.5±0.3μs 1.05 bench_core.CountNonzero.time_count_nonzero_axis(3, 10000, <type 'bool'>)
- 4.67±0.08μs 4.45±0.02μs 0.95 bench_ma.Indexing.time_scalar(False, 1, 10)
- 23.4±0.07μs 22.2±0.1μs 0.95 bench_reduce.MinMax.time_max(<type 'numpy.float64'>)
- 3.59±0.01μs 3.41±0.01μs 0.95 bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.]), 0, None))
- 2.69±0.02μs 2.56±0.02μs 0.95 bench_core.PackBits.time_packbits(<type 'bool'>)
- 3.71±0.03μs 3.52±0.04μs 0.95 bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.]), axis=0))
- 10.2±0.05μs 9.64±0.05μs 0.95 bench_reduce.SmallReduction.time_small
- 94.8±0.2μs 89.9±0.09μs 0.95 bench_lib.Pad.time_pad((10, 100), (0, 5), 'edge')
- 49.1±0.1μs 46.5±0.07μs 0.95 bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 10000, <type 'int'>)
- 2.00±0.03μs 1.90±0.02μs 0.95 bench_io.Copy.time_cont_assign('int8')
- 17.2±0.3μs 16.2±0.1μs 0.95 bench_reduce.MinMax.time_max(<type 'numpy.float32'>)
- 17.0±0.2μs 16.0±0.09μs 0.95 bench_reduce.MinMax.time_min(<type 'numpy.float32'>)
- 3.43±0.01μs 3.24±0.02μs 0.94 bench_ufunc.ArgParsingReduce.time_add_reduce_arg_parsing((array([0., 1.])))
- 2.54±0.01μs 2.40±0.02μs 0.94 bench_ufunc.Custom.time_or_bool
- 91.3±0.09μs 86.0±0.4μs 0.94 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 10000, <type 'int'>)
- 20.3±0.4μs 19.1±0.1μs 0.94 bench_core.CountNonzero.time_count_nonzero_axis(1, 100, <type 'int'>)
- 24.7±0.2μs 23.1±0.06μs 0.94 bench_core.CountNonzero.time_count_nonzero_axis(2, 100, <type 'str'>)
- 33.3±0.5μs 31.0±0.2μs 0.93 bench_core.CountNonzero.time_count_nonzero_axis(1, 10000, <type 'bool'>)
- 44.9±0.1μs 41.5±0.04μs 0.92 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 10000, <type 'bool'>)
- 34.4±0.3μs 31.7±0.2μs 0.92 bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 10000, <type 'bool'>)
- 55.0±0.4μs 50.6±0.08μs 0.92 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 10000, <type 'bool'>)
- 1.05±0ms 951±0.5μs 0.91 bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 1000000, <type 'bool'>)
- 1.04±0ms 949±0.4μs 0.91 bench_core.CountNonzero.time_count_nonzero_axis(1, 1000000, <type 'bool'>)
- 2.07±0ms 1.88±0ms 0.91 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <type 'bool'>)
- 3.08±0ms 2.80±0ms 0.91 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 1000000, <type 'bool'>)
- 151±0.06μs 136±0.05μs 0.90 bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 10000, <type 'str'>)
- 150±0.6μs 135±0.06μs 0.90 bench_core.CountNonzero.time_count_nonzero_axis(1, 10000, <type 'str'>)
- 1.46±0.01μs 1.29±0.05μs 0.89 bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.), array(3.)))
- 1.52±0.02μs 1.35±0.03μs 0.89 bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.), array(3.), subok=True, where=True))
- 271±0.1μs 240±0.08μs 0.89 bench_core.CountNonzero.time_count_nonzero_axis(2, 10000, <type 'str'>)
- 34.5±0.2ms 30.5±0.06ms 0.89 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 1000000, <type 'str'>)
- 268±0.08μs 238±0.4μs 0.89 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 10000, <type 'str'>)
- 34.8±0.2ms 30.8±0.07ms 0.89 bench_core.CountNonzero.time_count_nonzero_axis(3, 1000000, <type 'str'>)
- 23.3±0.1ms 20.6±0.1ms 0.88 bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <type 'str'>)
- 1.47±0.01μs 1.30±0.01μs 0.88 bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.), out=array(3.)))
- 23.1±0.1ms 20.4±0.1ms 0.88 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <type 'str'>)
- 386±0.2μs 341±0.1μs 0.88 bench_core.CountNonzero.time_count_nonzero_axis(3, 10000, <type 'str'>)
- 390±0.2μs 345±0.7μs 0.88 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 10000, <type 'str'>)
- 1.41±0.02μs 1.25±0.02μs 0.88 bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.), out=(array(3.),)))
- 12.2±0.2ms 10.7±0.4ms 0.88 bench_core.CountNonzero.time_count_nonzero_axis(1, 1000000, <type 'str'>)
- 1.51±0.01μs 1.32±0.02μs 0.87 bench_ufunc.ArgParsing.time_add_arg_parsing((array(1.), array(2.), subok=True, where=True, out=array(3.)))
- 53.2±1ms 42.5±1ms 0.80 bench_linalg.Eindot.time_dot_a_b

before after ratio
[97df9287] [cafcfa4d]
+ 6.06±0.9ms 9.51±1ms 1.57 bench_linalg.Eindot.time_matmul_trans_a_at
+ 1.31±0ms 1.67±0ms 1.27 bench_reduce.AddReduceSeparate.time_reduce(0, 'int16')
+ 74.4±10ms 91.9±2ms 1.24 bench_linalg.Linalg.time_op('pinv', 'int64')
+ 22.7±0.8ms 27.9±2ms 1.23 bench_linalg.Eindot.time_dot_trans_a_atc
+ 591±5ms 680±7ms 1.15 bench_function_base.Histogram2D.time_fine_binning
+ 1.74±0.03μs 1.93±0.02μs 1.11 bench_indexing.IndexingStructured0D.time_scalar_all
+ 1.88±0ms 2.07±0ms 1.10 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <type 'bool'>)
+ 2.81±0ms 3.09±0ms 1.10 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 1000000, <type 'bool'>)
+ 950±1μs 1.05±0ms 1.10 bench_core.CountNonzero.time_count_nonzero_axis(1, 1000000, <type 'bool'>)
+ 952±0.5μs 1.05±0ms 1.10 bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 1000000, <type 'bool'>)
- 3.10±0ms 2.81±0ms 0.91 bench_core.CountNonzero.time_count_nonzero_axis(3, 1000000, <type 'bool'>)
- 2.08±0ms 1.88±0ms 0.91 bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <type 'bool'>)
- 268±0.07μs 241±0.9μs 0.90 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 10000, <type 'str'>)
- 386±0.2μs 345±0.1μs 0.89 bench_core.CountNonzero.time_count_nonzero_axis(3, 10000, <type 'str'>)
- 34.6±0.1ms 30.9±0.08ms 0.89 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 1000000, <type 'str'>)
- 23.2±0.1ms 20.6±0.1ms 0.89 bench_core.CountNonzero.time_count_nonzero_multi_axis(2, 1000000, <type 'str'>)
- 151±0.06μs 134±0.7μs 0.89 bench_core.CountNonzero.time_count_nonzero_axis(1, 10000, <type 'str'>)
- 152±0.4μs 134±0.4μs 0.88 bench_core.CountNonzero.time_count_nonzero_multi_axis(1, 10000, <type 'str'>)
- 271±0.3μs 238±0.1μs 0.88 bench_core.CountNonzero.time_count_nonzero_axis(2, 10000, <type 'str'>)
- 389±0.3μs 342±0.08μs 0.88 bench_core.CountNonzero.time_count_nonzero_multi_axis(3, 10000, <type 'str'>)
- 34.9±0.1ms 30.6±0.1ms 0.88 bench_core.CountNonzero.time_count_nonzero_axis(3, 1000000, <type 'str'>)
- 23.3±0.08ms 20.4±0.2ms 0.88 bench_core.CountNonzero.time_count_nonzero_axis(2, 1000000, <type 'str'>)
- 952±30ns 818±4ns 0.86 bench_core.Core.time_array_empty
- 1.19±0.01μs 1.01±0.01μs 0.85 bench_core.Core.time_array_l1
- 1.03±0.01μs 867±10ns 0.84 bench_core.Core.time_arange_100
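
For readers who want to sort or filter the rows above, here is a minimal sketch of a parser for one result line. It assumes each row has the shape `mark before±err after±err ratio name`, that the ratio column is `after / before` (which matches the figures listed, e.g. 25.0ms / 10.6ms ≈ 2.36), and that `+` marks a slowdown and `-` a speedup. The names `parse_row`, `ROW`, and `UNITS` are hypothetical helpers for illustration, not part of any benchmarking tool.

```python
import re

# Unit suffixes seen in the listing, mapped to seconds. Both the Greek mu
# (U+03BC) and the micro sign (U+00B5) are accepted, since either may appear.
UNITS = {"ns": 1e-9, "\u03bcs": 1e-6, "\u00b5s": 1e-6, "ms": 1e-3, "s": 1.0}
_UNIT = r"(?:ns|[\u03bc\u00b5]s|ms|s)"

# Assumed row shape: "<+|-> <before>±<err><unit> <after>±<err><unit> <ratio> <name>"
ROW = re.compile(
    r"^(?P<mark>[+-])\s+"
    rf"(?P<before>[\d.]+)±[\d.]+(?P<before_unit>{_UNIT})\s+"
    rf"(?P<after>[\d.]+)±[\d.]+(?P<after_unit>{_UNIT})\s+"
    r"(?P<ratio>[\d.]+)\s+"
    r"(?P<name>.+)$"
)


def parse_row(line):
    """Parse one result row; return None for headers and other lines."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    # Convert both timings to seconds so rows with mixed units are comparable.
    before = float(m["before"]) * UNITS[m["before_unit"]]
    after = float(m["after"]) * UNITS[m["after_unit"]]
    return {
        "name": m["name"],
        "slower": m["mark"] == "+",
        "before_s": before,
        "after_s": after,
        "ratio_reported": float(m["ratio"]),
        "ratio_recomputed": after / before,
    }


row = parse_row(
    "+ 10.6±0.5ms 25.0±10ms 2.36 bench_linalg.Eindot.time_matmul_trans_a_at"
)
print(row["name"], round(row["ratio_recomputed"], 2))
```

Converting to seconds before dividing matters for rows such as `1.05±0ms` vs `951±0.5μs`, where the two columns use different units. The sketch ignores the `±` error terms, so it says nothing about whether a given change is statistically meaningful.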