iizukak · April 14, 2020 23:56
diff --git a/mobilenet_small_bench.txt b/mobilenet_small_bench.txt
 STARTING!
 Duplicate flags: num_threads
 Min num runs: [50]
 Min runs duration (seconds): [1]
 Max runs duration (seconds): [150]
 Inter-run delay (seconds): [-1]
 Num threads: [1]
 Benchmark name: []
 Output prefix: []
 Min warmup runs: [1]
 Min warmup runs duration (seconds): [0.5]
 Graph: [/home/iizuka/fer2013-tf/trained_models/mobilenet_small.tflite]
 Input layers: []
 Input shapes: []
 Input value ranges: []
 Input layer values files: []
 Allow fp16 : [0]
 Require full delegation : [0]
 Enable op profiling: [1]
 Max profiling buffer entries: [1024]
 CSV File to export profiling data to: []
 Enable platform-wide tracing: [0]
 #threads used for CPU inference: [1]
 Max number of delegated partitions : [0]
 External delegate path : []
 External delegate options : []
 Use gpu : [0]
 Use xnnpack : [0]
 Loaded model /home/iizuka/fer2013-tf/trained_models/mobilenet_small.tflite
 The input model file size (MB): 0.477872
 Initialized session in 0.335ms.
 Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
 count=352 first=1625 curr=1359 min=1334 max=1842 avg=1419.58 std=71

 Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
 count=634 first=1430 curr=3071 min=1360 max=3317 avg=1554.21 std=374

 Inference timings in us: Init: 335, First inference: 1625, Warmup (avg): 1419.58, Inference (avg): 1554.21
 Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
 Peak memory footprint (MB): init=0.5625 overall=2.61719
 Profiling Info for Benchmark Initialization:
 ============================== Run Order ==============================
 	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
 	         AllocateTensors	            0.000	    0.031	    0.031	100.000%	100.000%	     0.000	        1	AllocateTensors/0

 ============================== Top by Computation Time ==============================
 	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
 	         AllocateTensors	            0.000	    0.031	    0.031	100.000%	100.000%	     0.000	        1	AllocateTensors/0

 Number of nodes executed: 1
 ============================== Summary by node type ==============================
 	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
 	         AllocateTensors	        1	     0.031	   100.000%	   100.000%	     0.000	        1

 Timings (microseconds): count=1 curr=31
 Memory (bytes): count=0
 1 nodes observed



 Operator-wise Profiling Info for Regular Benchmark Runs:
 ============================== Run Order ==============================
 	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
 	       DEPTHWISE_CONV_2D	            0.000	    0.064	    0.071	  4.541%	  4.541%	     0.000	        1	[fer_small/tf_op_layer_Relu/Relu]:0
 	       DEPTHWISE_CONV_2D	            0.071	    0.367	    0.412	 26.558%	 31.099%	     0.000	        1	[fer_small/tf_op_layer_Relu_1/Relu_1]:1
 	                 CONV_2D	            0.483	    0.131	    0.149	  9.576%	 40.675%	     0.000	        1	[fer_small/tf_op_layer_Relu_2/Relu_2]:2
 	       DEPTHWISE_CONV_2D	            0.632	    0.174	    0.194	 12.461%	 53.136%	     0.000	        1	[fer_small/tf_op_layer_Relu_3/Relu_3]:3
 	       DEPTHWISE_CONV_2D	            0.825	    0.163	    0.183	 11.753%	 64.888%	     0.000	        1	[fer_small/tf_op_layer_Relu_4/Relu_4]:4
 	                 CONV_2D	            1.008	    0.118	    0.135	  8.665%	 73.553%	     0.000	        1	[fer_small/tf_op_layer_Relu_5/Relu_5]:5
 	       DEPTHWISE_CONV_2D	            1.143	    0.080	    0.089	  5.744%	 79.297%	     0.000	        1	[fer_small/tf_op_layer_Relu_6/Relu_6]:6
 	       DEPTHWISE_CONV_2D	            1.232	    0.114	    0.079	  5.100%	 84.398%	     0.000	        1	[fer_small/tf_op_layer_Relu_7/Relu_7]:7
 	                 CONV_2D	            1.311	    0.132	    0.143	  9.203%	 93.600%	     0.000	        1	[fer_small/tf_op_layer_Relu_8/Relu_8]:8
 	       DEPTHWISE_CONV_2D	            1.454	    0.039	    0.040	  2.567%	 96.167%	     0.000	        1	[fer_small/tf_op_layer_Relu_9/Relu_9]:9
 	                    MEAN	            1.494	    0.034	    0.034	  2.174%	 98.342%	     0.000	        1	[fer_small/global_average_pooling2d/Mean]:10
 	         FULLY_CONNECTED	            1.528	    0.013	    0.024	  1.539%	 99.881%	     0.000	        1	[fer_small/dense/Relu]:11
 	         FULLY_CONNECTED	            1.552	    0.001	    0.001	  0.076%	 99.957%	     0.000	        1	[fer_small/dense_1/BiasAdd]:12
 	                 SOFTMAX	            1.553	    0.000	    0.001	  0.043%	100.000%	     0.000	        1	[Identity]:13

 ============================== Top by Computation Time ==============================
 	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
 	       DEPTHWISE_CONV_2D	            0.071	    0.367	    0.412	 26.558%	 26.558%	     0.000	        1	[fer_small/tf_op_layer_Relu_1/Relu_1]:1
 	       DEPTHWISE_CONV_2D	            0.632	    0.174	    0.194	 12.461%	 39.018%	     0.000	        1	[fer_small/tf_op_layer_Relu_3/Relu_3]:3
 	       DEPTHWISE_CONV_2D	            0.825	    0.163	    0.183	 11.753%	 50.771%	     0.000	        1	[fer_small/tf_op_layer_Relu_4/Relu_4]:4
 	                 CONV_2D	            0.483	    0.131	    0.149	  9.576%	 60.347%	     0.000	        1	[fer_small/tf_op_layer_Relu_2/Relu_2]:2
 	                 CONV_2D	            1.311	    0.132	    0.143	  9.203%	 69.549%	     0.000	        1	[fer_small/tf_op_layer_Relu_8/Relu_8]:8
 	                 CONV_2D	            1.008	    0.118	    0.135	  8.665%	 78.214%	     0.000	        1	[fer_small/tf_op_layer_Relu_5/Relu_5]:5
 	       DEPTHWISE_CONV_2D	            1.143	    0.080	    0.089	  5.744%	 83.958%	     0.000	        1	[fer_small/tf_op_layer_Relu_6/Relu_6]:6
 	       DEPTHWISE_CONV_2D	            1.232	    0.114	    0.079	  5.100%	 89.059%	     0.000	        1	[fer_small/tf_op_layer_Relu_7/Relu_7]:7
 	       DEPTHWISE_CONV_2D	            0.000	    0.064	    0.071	  4.541%	 93.600%	     0.000	        1	[fer_small/tf_op_layer_Relu/Relu]:0
 	       DEPTHWISE_CONV_2D	            1.454	    0.039	    0.040	  2.567%	 96.167%	     0.000	        1	[fer_small/tf_op_layer_Relu_9/Relu_9]:9

 Number of nodes executed: 14
 ============================== Summary by node type ==============================
 	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
 	       DEPTHWISE_CONV_2D	        7	     1.064	    68.867%	    68.867%	     0.000	        7
 	                 CONV_2D	        3	     0.424	    27.443%	    96.311%	     0.000	        3
 	                    MEAN	        1	     0.033	     2.136%	    98.447%	     0.000	        1
 	         FULLY_CONNECTED	        2	     0.024	     1.553%	   100.000%	     0.000	        2
 	                 SOFTMAX	        1	     0.000	     0.000%	   100.000%	     0.000	        1

 Timings (microseconds): count=634 first=1430 curr=3069 min=1357 max=3315 avg=1552.91 std=374
 Memory (bytes): count=0
 14 nodes observed
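The `Inference timings in us: ...` line is the quickest figure to compare across runs. Below is a minimal Python sketch that turns that line into a dictionary of microsecond values. It assumes the log is available as text. `parse_timings` is a hypothetical helper and is not part of the TFLite benchmark tooling.

```python
from typing import Dict


def parse_timings(line: str) -> Dict[str, float]:
    """Parse a benchmark_model 'Inference timings in us' line into {label: microseconds}."""
    # Keep only the text after the fixed prefix, e.g. " Init: 335, First inference: 1625, ..."
    _, _, body = line.partition("Inference timings in us:")
    result = {}
    for field in body.split(","):
        # Each field looks like " Warmup (avg): 1419.58"
        label, _, value = field.partition(":")
        result[label.strip()] = float(value)
    return result


line = ("Inference timings in us: Init: 335, First inference: 1625, "
        "Warmup (avg): 1419.58, Inference (avg): 1554.21")
t = parse_timings(line)
print(t["Inference (avg)"] / 1000, "ms average per inference")
```

The same helper works on the quantized model's line. Its values are also microseconds.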



diff --git a/mobilenet_small_quant_bench.txt b/mobilenet_small_quant_bench.txt
 STARTING!
 Duplicate flags: num_threads
 Min num runs: [50]
 Min runs duration (seconds): [1]
 Max runs duration (seconds): [150]
 Inter-run delay (seconds): [-1]
 Num threads: [1]
 Benchmark name: []
 Output prefix: []
 Min warmup runs: [1]
 Min warmup runs duration (seconds): [0.5]
 Graph: [/home/iizuka/fer2013-tf/trained_models/mobilenet_small_quant.tflite]
 Input layers: []
 Input shapes: []
 Input value ranges: []
 Input layer values files: []
 Allow fp16 : [0]
 Require full delegation : [0]
 Enable op profiling: [1]
 Max profiling buffer entries: [1024]
 CSV File to export profiling data to: []
 Enable platform-wide tracing: [0]
 #threads used for CPU inference: [1]
 Max number of delegated partitions : [0]
 External delegate path : []
 External delegate options : []
 Use gpu : [0]
 Use xnnpack : [0]
 Loaded model /home/iizuka/fer2013-tf/trained_models/mobilenet_small_quant.tflite
 The input model file size (MB): 0.157232
 Initialized session in 1.419ms.
 Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
 count=30 first=18720 curr=17505 min=13567 max=23425 avg=16820.9 std=1953

 Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
 count=63 first=17493 curr=15778 min=13120 max=18710 avg=15881.9 std=1497

 Inference timings in us: Init: 1419, First inference: 18720, Warmup (avg): 16820.9, Inference (avg): 15881.9
 Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
 Peak memory footprint (MB): init=2.24219 overall=2.88672
 Profiling Info for Benchmark Initialization:
 ============================== Run Order ==============================
 	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
 	         AllocateTensors	            0.000	    0.086	    0.086	100.000%	100.000%	  1804.000	        1	AllocateTensors/0

 ============================== Top by Computation Time ==============================
 	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
 	         AllocateTensors	            0.000	    0.086	    0.086	100.000%	100.000%	  1804.000	        1	AllocateTensors/0

 Number of nodes executed: 1
 ============================== Summary by node type ==============================
 	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
 	         AllocateTensors	        1	     0.086	   100.000%	   100.000%	  1804.000	        1

 Timings (microseconds): count=1 curr=86
 Memory (bytes): count=0
 1 nodes observed



 Operator-wise Profiling Info for Regular Benchmark Runs:
 ============================== Run Order ==============================
 	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
 	                QUANTIZE	            0.000	    0.013	    0.015	  0.092%	  0.092%	     0.000	        1	[img_int8]:0
 	       DEPTHWISE_CONV_2D	            0.015	    0.185	    0.218	  1.374%	  1.466%	     0.000	        1	[fer_small/tf_op_layer_Relu/Relu]:1
 	       DEPTHWISE_CONV_2D	            0.233	    0.565	    0.634	  3.996%	  5.462%	     0.000	        1	[fer_small/tf_op_layer_Relu_1/Relu_1]:2
 	                 CONV_2D	            0.868	    5.605	    4.818	 30.354%	 35.815%	     0.000	        1	[fer_small/tf_op_layer_Relu_2/Relu_2]:3
 	       DEPTHWISE_CONV_2D	            5.686	    0.287	    0.320	  2.019%	 37.834%	     0.000	        1	[fer_small/tf_op_layer_Relu_3/Relu_3]:4
 	       DEPTHWISE_CONV_2D	            6.007	    0.248	    0.282	  1.779%	 39.613%	     0.000	        1	[fer_small/tf_op_layer_Relu_4/Relu_4]:5
 	                 CONV_2D	            6.290	    4.970	    4.386	 27.629%	 67.243%	     0.000	        1	[fer_small/tf_op_layer_Relu_5/Relu_5]:6
 	       DEPTHWISE_CONV_2D	           10.677	    0.126	    0.146	  0.922%	 68.165%	     0.000	        1	[fer_small/tf_op_layer_Relu_6/Relu_6]:7
 	       DEPTHWISE_CONV_2D	           10.823	    0.112	    0.127	  0.802%	 68.966%	     0.000	        1	[fer_small/tf_op_layer_Relu_7/Relu_7]:8
 	                 CONV_2D	           10.951	    4.821	    4.339	 27.332%	 96.299%	     0.000	        1	[fer_small/tf_op_layer_Relu_8/Relu_8]:9
 	       DEPTHWISE_CONV_2D	           15.291	    0.067	    0.081	  0.512%	 96.811%	     0.000	        1	[fer_small/tf_op_layer_Relu_9/Relu_9]:10
 	                    MEAN	           15.373	    0.051	    0.044	  0.276%	 97.087%	     0.000	        1	[fer_small/global_average_pooling2d/Mean]:11
 	         FULLY_CONNECTED	           15.417	    0.422	    0.444	  2.800%	 99.887%	     0.000	        1	[fer_small/dense/Relu]:12
 	         FULLY_CONNECTED	           15.862	    0.012	    0.015	  0.092%	 99.980%	     0.000	        1	[fer_small/dense_1/BiasAdd]:13
 	                 SOFTMAX	           15.877	    0.002	    0.002	  0.011%	 99.991%	     0.000	        1	[Identity_int8]:14
 	                QUANTIZE	           15.879	    0.001	    0.001	  0.009%	100.000%	     0.000	        1	[Identity]:15

 ============================== Top by Computation Time ==============================
 	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
 	                 CONV_2D	            0.868	    5.605	    4.818	 30.354%	 30.354%	     0.000	        1	[fer_small/tf_op_layer_Relu_2/Relu_2]:3
 	                 CONV_2D	            6.290	    4.970	    4.386	 27.629%	 57.983%	     0.000	        1	[fer_small/tf_op_layer_Relu_5/Relu_5]:6
 	                 CONV_2D	           10.951	    4.821	    4.339	 27.332%	 85.315%	     0.000	        1	[fer_small/tf_op_layer_Relu_8/Relu_8]:9
 	       DEPTHWISE_CONV_2D	            0.233	    0.565	    0.634	  3.996%	 89.311%	     0.000	        1	[fer_small/tf_op_layer_Relu_1/Relu_1]:2
 	         FULLY_CONNECTED	           15.417	    0.422	    0.444	  2.800%	 92.111%	     0.000	        1	[fer_small/dense/Relu]:12
 	       DEPTHWISE_CONV_2D	            5.686	    0.287	    0.320	  2.019%	 94.130%	     0.000	        1	[fer_small/tf_op_layer_Relu_3/Relu_3]:4
 	       DEPTHWISE_CONV_2D	            6.007	    0.248	    0.282	  1.779%	 95.909%	     0.000	        1	[fer_small/tf_op_layer_Relu_4/Relu_4]:5
 	       DEPTHWISE_CONV_2D	            0.015	    0.185	    0.218	  1.374%	 97.283%	     0.000	        1	[fer_small/tf_op_layer_Relu/Relu]:1
 	       DEPTHWISE_CONV_2D	           10.677	    0.126	    0.146	  0.922%	 98.205%	     0.000	        1	[fer_small/tf_op_layer_Relu_6/Relu_6]:7
 	       DEPTHWISE_CONV_2D	           10.823	    0.112	    0.127	  0.802%	 99.007%	     0.000	        1	[fer_small/tf_op_layer_Relu_7/Relu_7]:8

 Number of nodes executed: 16
 ============================== Summary by node type ==============================
 	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
 	                 CONV_2D	        3	    13.541	    85.346%	    85.346%	     0.000	        3
 	       DEPTHWISE_CONV_2D	        7	     1.808	    11.395%	    96.741%	     0.000	        7
 	         FULLY_CONNECTED	        2	     0.458	     2.887%	    99.628%	     0.000	        2
 	                    MEAN	        1	     0.043	     0.271%	    99.899%	     0.000	        1
 	                QUANTIZE	        2	     0.015	     0.095%	    99.994%	     0.000	        2
 	                 SOFTMAX	        1	     0.001	     0.006%	   100.000%	     0.000	        1

 Timings (microseconds): count=63 first=17487 curr=15774 min=13116 max=18691 avg=15873.2 std=1494
 Memory (bytes): count=0
 16 nodes observed
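Here is how the two logs compare. The quantized model averages about 15.9 ms per inference, while the float model averages about 1.55 ms, so the quantized model is roughly 10x slower. Its file is also about 3x smaller. Within the quantized run, CONV_2D rises from 0.424 ms to 13.541 ms and takes 85% of the total time.

The Python sketch below reproduces those ratios from the logged "Summary by node type" rows. `parse_summary` is a hypothetical helper. The sketch assumes each summary row splits on whitespace into the seven columns named in the table header.

```python
from typing import Dict, List


def parse_summary(rows: List[str]) -> Dict[str, float]:
    """Map node type -> [avg ms] from 'Summary by node type' rows.

    Assumed column order (from the log header):
    node_type count avg_ms avg_pct cdf_pct mem_kb times_called
    """
    summary = {}
    for row in rows:
        fields = row.split()  # handles the tabs and padding in the raw log
        summary[fields[0]] = float(fields[2])
    return summary


# Rows copied from the two 'Summary by node type' tables in this log.
float_rows = [
    "DEPTHWISE_CONV_2D 7 1.064 68.867% 68.867% 0.000 7",
    "CONV_2D 3 0.424 27.443% 96.311% 0.000 3",
    "MEAN 1 0.033 2.136% 98.447% 0.000 1",
    "FULLY_CONNECTED 2 0.024 1.553% 100.000% 0.000 2",
    "SOFTMAX 1 0.000 0.000% 100.000% 0.000 1",
]
quant_rows = [
    "CONV_2D 3 13.541 85.346% 85.346% 0.000 3",
    "DEPTHWISE_CONV_2D 7 1.808 11.395% 96.741% 0.000 7",
    "FULLY_CONNECTED 2 0.458 2.887% 99.628% 0.000 2",
    "MEAN 1 0.043 0.271% 99.899% 0.000 1",
    "QUANTIZE 2 0.015 0.095% 99.994% 0.000 2",
    "SOFTMAX 1 0.001 0.006% 100.000% 0.000 1",
]

float_ms = parse_summary(float_rows)
quant_ms = parse_summary(quant_rows)

# Per-op-type slowdown of the quantized model. Skip ops absent from the
# float model (QUANTIZE) or with a 0.000 ms float average (SOFTMAX).
slowdown = {
    op: quant_ms[op] / float_ms[op]
    for op in quant_ms
    if float_ms.get(op, 0.0) > 0.0
}
for op, ratio in sorted(slowdown.items(), key=lambda kv: -kv[1]):
    print(f"{op:<18} {float_ms[op]:.3f} ms -> {quant_ms[op]:.3f} ms ({ratio:.1f}x)")

# Whole-model figures from the 'Inference timings' and file-size lines.
print(f"avg latency: {15881.9 / 1554.21:.1f}x slower")
print(f"model size:  {0.477872 / 0.157232:.1f}x smaller")
```

Both runs used one thread, with XNNPACK and GPU disabled. The ratios therefore describe this configuration only.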