Created April 14, 2020 23:56
TensorFlow Lite quantization benchmark

STARTING!
Duplicate flags: num_threads
Min num runs: [50]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Graph: [/home/iizuka/fer2013-tf/trained_models/mobilenet_small.tflite]
Input layers: []
Input shapes: []
Input value ranges: []
Input layer values files: []
Allow fp16 : [0]
Require full delegation : [0]
Enable op profiling: [1]
Max profiling buffer entries: [1024]
CSV File to export profiling data to: []
Enable platform-wide tracing: [0]
#threads used for CPU inference: [1]
Max number of delegated partitions : [0]
External delegate path : []
External delegate options : []
Use gpu : [0]
Use xnnpack : [0]
Loaded model /home/iizuka/fer2013-tf/trained_models/mobilenet_small.tflite
The input model file size (MB): 0.477872
Initialized session in 0.335ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=352 first=1625 curr=1359 min=1334 max=1842 avg=1419.58 std=71
Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=634 first=1430 curr=3071 min=1360 max=3317 avg=1554.21 std=374
Inference timings in us: Init: 335, First inference: 1625, Warmup (avg): 1419.58, Inference (avg): 1554.21
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Peak memory footprint (MB): init=0.5625 overall=2.61719
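
The "Inference timings in us" line above condenses the float run into four numbers, all in microseconds. As a minimal sketch (the `parse_timings` helper is hypothetical and not part of TensorFlow Lite), that line can be turned into a Python dict so it is easy to compare against other runs:

```python
from typing import Dict

# Summary line copied verbatim from the float-model log above.
FLOAT_LINE = ("Inference timings in us: Init: 335, First inference: 1625, "
              "Warmup (avg): 1419.58, Inference (avg): 1554.21")


def parse_timings(line: str) -> Dict[str, float]:
    """Split 'Label: value, Label: value' pairs into {label: microseconds}."""
    _, _, body = line.partition(":")  # drop the 'Inference timings in us' prefix
    timings = {}
    for field in body.split(","):
        # rpartition keeps labels such as 'Warmup (avg)' intact
        label, _, value = field.rpartition(":")
        timings[label.strip()] = float(value)
    return timings


t = parse_timings(FLOAT_LINE)
print(t)
print(f"steady-state average: {t['Inference (avg)'] / 1000:.3f} ms")
```

The same helper works unchanged on the quantized model's timing line, since the benchmark prints it in the same format.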
Profiling Info for Benchmark Initialization:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
AllocateTensors 0.000 0.031 0.031 100.000% 100.000% 0.000 1 AllocateTensors/0
============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
AllocateTensors 0.000 0.031 0.031 100.000% 100.000% 0.000 1 AllocateTensors/0
Number of nodes executed: 1
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
AllocateTensors 1 0.031 100.000% 100.000% 0.000 1
Timings (microseconds): count=1 curr=31
Memory (bytes): count=0
1 nodes observed
Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
DEPTHWISE_CONV_2D 0.000 0.064 0.071 4.541% 4.541% 0.000 1 [fer_small/tf_op_layer_Relu/Relu]:0
DEPTHWISE_CONV_2D 0.071 0.367 0.412 26.558% 31.099% 0.000 1 [fer_small/tf_op_layer_Relu_1/Relu_1]:1
CONV_2D 0.483 0.131 0.149 9.576% 40.675% 0.000 1 [fer_small/tf_op_layer_Relu_2/Relu_2]:2
DEPTHWISE_CONV_2D 0.632 0.174 0.194 12.461% 53.136% 0.000 1 [fer_small/tf_op_layer_Relu_3/Relu_3]:3
DEPTHWISE_CONV_2D 0.825 0.163 0.183 11.753% 64.888% 0.000 1 [fer_small/tf_op_layer_Relu_4/Relu_4]:4
CONV_2D 1.008 0.118 0.135 8.665% 73.553% 0.000 1 [fer_small/tf_op_layer_Relu_5/Relu_5]:5
DEPTHWISE_CONV_2D 1.143 0.080 0.089 5.744% 79.297% 0.000 1 [fer_small/tf_op_layer_Relu_6/Relu_6]:6
DEPTHWISE_CONV_2D 1.232 0.114 0.079 5.100% 84.398% 0.000 1 [fer_small/tf_op_layer_Relu_7/Relu_7]:7
CONV_2D 1.311 0.132 0.143 9.203% 93.600% 0.000 1 [fer_small/tf_op_layer_Relu_8/Relu_8]:8
DEPTHWISE_CONV_2D 1.454 0.039 0.040 2.567% 96.167% 0.000 1 [fer_small/tf_op_layer_Relu_9/Relu_9]:9
MEAN 1.494 0.034 0.034 2.174% 98.342% 0.000 1 [fer_small/global_average_pooling2d/Mean]:10
FULLY_CONNECTED 1.528 0.013 0.024 1.539% 99.881% 0.000 1 [fer_small/dense/Relu]:11
FULLY_CONNECTED 1.552 0.001 0.001 0.076% 99.957% 0.000 1 [fer_small/dense_1/BiasAdd]:12
SOFTMAX 1.553 0.000 0.001 0.043% 100.000% 0.000 1 [Identity]:13
============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
DEPTHWISE_CONV_2D 0.071 0.367 0.412 26.558% 26.558% 0.000 1 [fer_small/tf_op_layer_Relu_1/Relu_1]:1
DEPTHWISE_CONV_2D 0.632 0.174 0.194 12.461% 39.018% 0.000 1 [fer_small/tf_op_layer_Relu_3/Relu_3]:3
DEPTHWISE_CONV_2D 0.825 0.163 0.183 11.753% 50.771% 0.000 1 [fer_small/tf_op_layer_Relu_4/Relu_4]:4
CONV_2D 0.483 0.131 0.149 9.576% 60.347% 0.000 1 [fer_small/tf_op_layer_Relu_2/Relu_2]:2
CONV_2D 1.311 0.132 0.143 9.203% 69.549% 0.000 1 [fer_small/tf_op_layer_Relu_8/Relu_8]:8
CONV_2D 1.008 0.118 0.135 8.665% 78.214% 0.000 1 [fer_small/tf_op_layer_Relu_5/Relu_5]:5
DEPTHWISE_CONV_2D 1.143 0.080 0.089 5.744% 83.958% 0.000 1 [fer_small/tf_op_layer_Relu_6/Relu_6]:6
DEPTHWISE_CONV_2D 1.232 0.114 0.079 5.100% 89.059% 0.000 1 [fer_small/tf_op_layer_Relu_7/Relu_7]:7
DEPTHWISE_CONV_2D 0.000 0.064 0.071 4.541% 93.600% 0.000 1 [fer_small/tf_op_layer_Relu/Relu]:0
DEPTHWISE_CONV_2D 1.454 0.039 0.040 2.567% 96.167% 0.000 1 [fer_small/tf_op_layer_Relu_9/Relu_9]:9
Number of nodes executed: 14
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
DEPTHWISE_CONV_2D 7 1.064 68.867% 68.867% 0.000 7
CONV_2D 3 0.424 27.443% 96.311% 0.000 3
MEAN 1 0.033 2.136% 98.447% 0.000 1
FULLY_CONNECTED 2 0.024 1.553% 100.000% 0.000 2
SOFTMAX 1 0.000 0.000% 100.000% 0.000 1
Timings (microseconds): count=634 first=1430 curr=3069 min=1357 max=3315 avg=1552.91 std=374
Memory (bytes): count=0
14 nodes observed
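
For the float model, the per-type averages in the "Summary by node type" table account for almost all of the end-to-end time. A small Python check, using only numbers copied from that table and from the "Timings (microseconds)" line of the same runs, adds up the per-type averages and reports how many microseconds per inference are not attributed to any listed operator type (part of that gap is rounding in the table, which prints milliseconds to three decimals):

```python
# Per-type averages (ms) copied from the float model's "Summary by node type" table.
SUMMARY_MS = {
    "DEPTHWISE_CONV_2D": 1.064,
    "CONV_2D": 0.424,
    "MEAN": 0.033,
    "FULLY_CONNECTED": 0.024,
    "SOFTMAX": 0.000,
}
# "avg" from the "Timings (microseconds)" line of the same profiled runs.
END_TO_END_US = 1552.91

# Total time the profiler attributes to operators, in milliseconds.
op_total_ms = sum(SUMMARY_MS.values())
# Whatever is left over is time not attributed to any listed operator type.
unattributed_us = END_TO_END_US - op_total_ms * 1000

print(f"ops: {op_total_ms:.3f} ms, end-to-end: {END_TO_END_US / 1000:.3f} ms, "
      f"unattributed: {unattributed_us:.1f} us")
```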

STARTING!
Duplicate flags: num_threads
Min num runs: [50]
Min runs duration (seconds): [1]
Max runs duration (seconds): [150]
Inter-run delay (seconds): [-1]
Num threads: [1]
Benchmark name: []
Output prefix: []
Min warmup runs: [1]
Min warmup runs duration (seconds): [0.5]
Graph: [/home/iizuka/fer2013-tf/trained_models/mobilenet_small_quant.tflite]
Input layers: []
Input shapes: []
Input value ranges: []
Input layer values files: []
Allow fp16 : [0]
Require full delegation : [0]
Enable op profiling: [1]
Max profiling buffer entries: [1024]
CSV File to export profiling data to: []
Enable platform-wide tracing: [0]
#threads used for CPU inference: [1]
Max number of delegated partitions : [0]
External delegate path : []
External delegate options : []
Use gpu : [0]
Use xnnpack : [0]
Loaded model /home/iizuka/fer2013-tf/trained_models/mobilenet_small_quant.tflite
The input model file size (MB): 0.157232
Initialized session in 1.419ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=30 first=18720 curr=17505 min=13567 max=23425 avg=16820.9 std=1953
Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=63 first=17493 curr=15778 min=13120 max=18710 avg=15881.9 std=1497
Inference timings in us: Init: 1419, First inference: 18720, Warmup (avg): 16820.9, Inference (avg): 15881.9
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Peak memory footprint (MB): init=2.24219 overall=2.88672
Profiling Info for Benchmark Initialization:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
AllocateTensors 0.000 0.086 0.086 100.000% 100.000% 1804.000 1 AllocateTensors/0
============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
AllocateTensors 0.000 0.086 0.086 100.000% 100.000% 1804.000 1 AllocateTensors/0
Number of nodes executed: 1
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
AllocateTensors 1 0.086 100.000% 100.000% 1804.000 1
Timings (microseconds): count=1 curr=86
Memory (bytes): count=0
1 nodes observed
Operator-wise Profiling Info for Regular Benchmark Runs:
============================== Run Order ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
QUANTIZE 0.000 0.013 0.015 0.092% 0.092% 0.000 1 [img_int8]:0
DEPTHWISE_CONV_2D 0.015 0.185 0.218 1.374% 1.466% 0.000 1 [fer_small/tf_op_layer_Relu/Relu]:1
DEPTHWISE_CONV_2D 0.233 0.565 0.634 3.996% 5.462% 0.000 1 [fer_small/tf_op_layer_Relu_1/Relu_1]:2
CONV_2D 0.868 5.605 4.818 30.354% 35.815% 0.000 1 [fer_small/tf_op_layer_Relu_2/Relu_2]:3
DEPTHWISE_CONV_2D 5.686 0.287 0.320 2.019% 37.834% 0.000 1 [fer_small/tf_op_layer_Relu_3/Relu_3]:4
DEPTHWISE_CONV_2D 6.007 0.248 0.282 1.779% 39.613% 0.000 1 [fer_small/tf_op_layer_Relu_4/Relu_4]:5
CONV_2D 6.290 4.970 4.386 27.629% 67.243% 0.000 1 [fer_small/tf_op_layer_Relu_5/Relu_5]:6
DEPTHWISE_CONV_2D 10.677 0.126 0.146 0.922% 68.165% 0.000 1 [fer_small/tf_op_layer_Relu_6/Relu_6]:7
DEPTHWISE_CONV_2D 10.823 0.112 0.127 0.802% 68.966% 0.000 1 [fer_small/tf_op_layer_Relu_7/Relu_7]:8
CONV_2D 10.951 4.821 4.339 27.332% 96.299% 0.000 1 [fer_small/tf_op_layer_Relu_8/Relu_8]:9
DEPTHWISE_CONV_2D 15.291 0.067 0.081 0.512% 96.811% 0.000 1 [fer_small/tf_op_layer_Relu_9/Relu_9]:10
MEAN 15.373 0.051 0.044 0.276% 97.087% 0.000 1 [fer_small/global_average_pooling2d/Mean]:11
FULLY_CONNECTED 15.417 0.422 0.444 2.800% 99.887% 0.000 1 [fer_small/dense/Relu]:12
FULLY_CONNECTED 15.862 0.012 0.015 0.092% 99.980% 0.000 1 [fer_small/dense_1/BiasAdd]:13
SOFTMAX 15.877 0.002 0.002 0.011% 99.991% 0.000 1 [Identity_int8]:14
QUANTIZE 15.879 0.001 0.001 0.009% 100.000% 0.000 1 [Identity]:15
============================== Top by Computation Time ==============================
[node type] [start] [first] [avg ms] [%] [cdf%] [mem KB] [times called] [Name]
CONV_2D 0.868 5.605 4.818 30.354% 30.354% 0.000 1 [fer_small/tf_op_layer_Relu_2/Relu_2]:3
CONV_2D 6.290 4.970 4.386 27.629% 57.983% 0.000 1 [fer_small/tf_op_layer_Relu_5/Relu_5]:6
CONV_2D 10.951 4.821 4.339 27.332% 85.315% 0.000 1 [fer_small/tf_op_layer_Relu_8/Relu_8]:9
DEPTHWISE_CONV_2D 0.233 0.565 0.634 3.996% 89.311% 0.000 1 [fer_small/tf_op_layer_Relu_1/Relu_1]:2
FULLY_CONNECTED 15.417 0.422 0.444 2.800% 92.111% 0.000 1 [fer_small/dense/Relu]:12
DEPTHWISE_CONV_2D 5.686 0.287 0.320 2.019% 94.130% 0.000 1 [fer_small/tf_op_layer_Relu_3/Relu_3]:4
DEPTHWISE_CONV_2D 6.007 0.248 0.282 1.779% 95.909% 0.000 1 [fer_small/tf_op_layer_Relu_4/Relu_4]:5
DEPTHWISE_CONV_2D 0.015 0.185 0.218 1.374% 97.283% 0.000 1 [fer_small/tf_op_layer_Relu/Relu]:1
DEPTHWISE_CONV_2D 10.677 0.126 0.146 0.922% 98.205% 0.000 1 [fer_small/tf_op_layer_Relu_6/Relu_6]:7
DEPTHWISE_CONV_2D 10.823 0.112 0.127 0.802% 99.007% 0.000 1 [fer_small/tf_op_layer_Relu_7/Relu_7]:8
Number of nodes executed: 16
============================== Summary by node type ==============================
[Node type] [count] [avg ms] [avg %] [cdf %] [mem KB] [times called]
CONV_2D 3 13.541 85.346% 85.346% 0.000 3
DEPTHWISE_CONV_2D 7 1.808 11.395% 96.741% 0.000 7
FULLY_CONNECTED 2 0.458 2.887% 99.628% 0.000 2
MEAN 1 0.043 0.271% 99.899% 0.000 1
QUANTIZE 2 0.015 0.095% 99.994% 0.000 2
SOFTMAX 1 0.001 0.006% 100.000% 0.000 1
Timings (microseconds): count=63 first=17487 curr=15774 min=13116 max=18691 avg=15873.2 std=1494
Memory (bytes): count=0
16 nodes observed
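
Putting the two runs side by side: both were single-threaded CPU runs with no GPU or XNNPACK delegate ("Use gpu : [0]", "Use xnnpack : [0]"). The Python sketch below only restates numbers copied from the two logs (file size, steady-state average, and the per-type CONV_2D and DEPTHWISE_CONV_2D averages) and computes the ratios between them. The dictionary keys are illustrative names, not fields the benchmark tool emits.

```python
# Numbers copied from the two logs: float model first, quantized model second.
RUNS = {
    "float": {"size_mb": 0.477872, "avg_us": 1554.21, "conv2d_ms": 0.424, "dwconv_ms": 1.064},
    "quant": {"size_mb": 0.157232, "avg_us": 15881.9, "conv2d_ms": 13.541, "dwconv_ms": 1.808},
}


def ratio(metric: str) -> float:
    """How many times larger the quantized run's value is than the float run's."""
    return RUNS["quant"][metric] / RUNS["float"][metric]


size_reduction = 1 / ratio("size_mb")  # float file size divided by quantized file size
slowdown = ratio("avg_us")             # end-to-end average inference time
conv_slowdown = ratio("conv2d_ms")     # the three CONV_2D nodes combined
dw_slowdown = ratio("dwconv_ms")       # the seven DEPTHWISE_CONV_2D nodes combined

print(f"file size:         {size_reduction:.2f}x smaller")
print(f"average inference: {slowdown:.1f}x slower")
print(f"CONV_2D:           {conv_slowdown:.1f}x slower")
print(f"DEPTHWISE_CONV_2D: {dw_slowdown:.1f}x slower")
```

In these runs the quantized file is about 3x smaller, yet each inference takes roughly 10x longer on average. Most of the extra time is spent in the three CONV_2D nodes, which account for about 85% of the quantized model's operator time.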