@nyanshell
Last active June 8, 2019 08:10
gpu_tf_bench

environment

NVIDIA

i7-6700k
ASUS GTX 1080 8G
CUDA 9, cuDNN 7
python 3.6.6
tensorflow 1.12.0

AMD

AMD Ryzen Threadripper 1950X
Radeon RX Vega 64 Air Boost 8G OC & MSI RX 580 4G OC
ROCm 1.9.307
python 3.6.7
tensorflow-rocm 1.12.0

JETSON NANO

Nvidia Jetson Nano
python 3.6.8
tensorboard==1.13.1
tensorflow-gpu==1.13.1+nv19.5
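
To confirm each environment actually sees its GPU before benchmarking, the TF 1.x device list can be printed directly (a minimal sketch; device names and memory figures will of course differ per machine):

# check_env.py -- print TensorFlow version and visible devices (TF 1.x API)
import tensorflow as tf
from tensorflow.python.client import device_lib

print("TensorFlow:", tf.__version__)
for dev in device_lib.list_local_devices():
    # device_type is 'CPU' or 'GPU'; physical_device_desc names the card
    print(dev.device_type, dev.name, dev.physical_device_desc)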

fine tune

$ /opt/rocm/bin/rocm-smi --setfan 80% --setmclk 3 --setsclk 7 --setoverdrive 20 --setmemoverdrive 20 -d 1
$ /opt/rocm/bin/rocm-smi --setfan 70% --setmclk 3 --setsclk 7 --setoverdrive 20 --setmemoverdrive 20 -d 0
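
The same tuning can be scripted by shelling out to rocm-smi with exactly the flags used above (a rough sketch for this two-GPU setup; the path and device indices are the ones from the commands above):

# rocm_tune.py -- apply the rocm-smi settings above, then print the status summary
import subprocess

ROCM_SMI = "/opt/rocm/bin/rocm-smi"

# (fan speed, device index) pairs, matching the two commands above
SETTINGS = [("80%", "1"), ("70%", "0")]

for fan, dev in SETTINGS:
    subprocess.run(
        [ROCM_SMI, "--setfan", fan, "--setmclk", "3", "--setsclk", "7",
         "--setoverdrive", "20", "--setmemoverdrive", "20", "-d", dev],
        check=True)

# a bare rocm-smi call prints the current clock/fan/temperature table
subprocess.run([ROCM_SMI], check=True)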

tensorflow benchmark

https://github.com/tensorflow/benchmarks at commit bef1ac21efcefa1b3e0d16def1581552cf537163, with the local patch below:

diff --git a/scripts/tf_cnn_benchmarks/benchmark_cnn.py b/scripts/tf_cnn_benchmarks/benchmark_cnn.py
index ebbe758..2955f5f 100644
--- a/scripts/tf_cnn_benchmarks/benchmark_cnn.py
+++ b/scripts/tf_cnn_benchmarks/benchmark_cnn.py
@@ -736,9 +736,9 @@ def create_config_proto(params):
     config.intra_op_parallelism_threads = params.num_intra_threads
   config.inter_op_parallelism_threads = params.num_inter_threads
   config.experimental.collective_group_leader = '/job:worker/replica:0/task:0'
-  config.gpu_options.experimental.collective_ring_order = params.gpu_indices
+  # config.gpu_options.experimental.collective_ring_order = params.gpu_indices
   config.gpu_options.force_gpu_compatible = params.force_gpu_compatible
-  config.experimental.use_numa_affinity = params.use_numa_affinity
+  # config.experimental.use_numa_affinity = params.use_numa_affinity
   if params.device == 'cpu':
     # TODO(tucker): change num_gpus to num_devices
     config.device_count['CPU'] = params.num_gpus
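
The patch only comments out two ConfigProto fields (collective_ring_order and use_numa_affinity) that the tensorflow-1.12 / tensorflow-rocm-1.12 wheels apparently do not expose; everything else in create_config_proto is left as shipped. For orientation, the options it still sets are ordinary TF 1.x session options, roughly like this (a minimal sketch, not the benchmark's actual code):

# session_config_sketch.py -- the kind of ConfigProto the patched function builds (TF 1.x)
import tensorflow as tf

config = tf.ConfigProto()
config.intra_op_parallelism_threads = 0    # 0 lets TF choose a default
config.inter_op_parallelism_threads = 0
config.gpu_options.force_gpu_compatible = True
# collective_ring_order / use_numa_affinity are skipped, as in the patch above

sess = tf.Session(config=config)
print(sess.list_devices())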
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server
python: can't open file 'tf_cnn_benchmarks.py': [Errno 2] No such file or directory
(nn) scarlet@debian:~/code/benchmarks/scripts$ cd tf_cnn_benchmarks/
(nn) scarlet@debian:~/code/benchmarks/scripts/tf_cnn_benchmarks$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server
2018-12-11 23:59:29.844707: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:59:30.159856: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-11 23:59:30.160244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.81GiB
2018-12-11 23:59:30.160257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-11 23:59:30.326162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-11 23:59:30.326188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-11 23:59:30.326193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-11 23:59:30.326309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7537 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
TensorFlow: 1.12
Model: resnet50
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 64 global
64 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
Initializing graph
W1211 23:59:32.241793 140357647819840 tf_logging.py:125] From /home/scarlet/code/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2250: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-12-11 23:59:32.540028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-11 23:59:32.540062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-11 23:59:32.540067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-11 23:59:32.540070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-11 23:59:32.540170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7537 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
I1211 23:59:32.833969 140357647819840 tf_logging.py:115] Running local_init_op.
I1211 23:59:32.858170 140357647819840 tf_logging.py:115] Done running local_init_op.
Running warm up
2018-12-11 23:59:34.476483: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.52GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-11 23:59:34.548959: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-11 23:59:34.576380: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-11 23:59:34.583204: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-11 23:59:34.656336: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.26GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-11 23:59:34.663460: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.32GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-11 23:59:34.746282: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.52GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 146.8 +/- 0.0 (jitter = 0.0) 8.220
10 images/sec: 146.8 +/- 0.1 (jitter = 0.1) 7.880
20 images/sec: 146.9 +/- 0.1 (jitter = 0.3) 7.910
30 images/sec: 146.9 +/- 0.0 (jitter = 0.2) 7.821
40 images/sec: 146.8 +/- 0.0 (jitter = 0.2) 8.005
50 images/sec: 146.8 +/- 0.0 (jitter = 0.2) 7.770
60 images/sec: 146.7 +/- 0.0 (jitter = 0.2) 8.116
70 images/sec: 146.7 +/- 0.0 (jitter = 0.3) 7.818
80 images/sec: 146.6 +/- 0.0 (jitter = 0.3) 7.979
90 images/sec: 146.6 +/- 0.0 (jitter = 0.4) 8.094
100 images/sec: 146.5 +/- 0.0 (jitter = 0.4) 8.036
----------------------------------------------------------------
total images/sec: 146.49
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3 --variable_update=parameter_server
2018-12-12 00:03:20.007683: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-12 00:03:20.332246: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-12 00:03:20.332624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.81GiB
2018-12-12 00:03:20.332640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-12 00:03:20.502800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 00:03:20.502826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-12 00:03:20.502831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-12 00:03:20.502942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7537 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
TensorFlow: 1.12
Model: inception3
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 64 global
64 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
Initializing graph
W1212 00:03:23.451073 140471892030528 tf_logging.py:125] From /home/scarlet/code/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2250: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-12-12 00:03:23.932654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-12 00:03:23.932686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 00:03:23.932691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-12 00:03:23.932695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-12 00:03:23.932792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7537 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
I1212 00:03:24.400829 140471892030528 tf_logging.py:115] Running local_init_op.
I1212 00:03:24.435635 140471892030528 tf_logging.py:115] Done running local_init_op.
Running warm up
2018-12-12 00:03:27.317654: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.69GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:03:27.330475: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.97GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:03:27.345487: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.25GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:03:27.486848: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.74GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:03:27.500194: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.69GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:03:27.546013: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.03GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:03:27.561704: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.98GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 100.7 +/- 0.0 (jitter = 0.0) 7.262
10 images/sec: 100.4 +/- 0.1 (jitter = 0.2) 7.308
20 images/sec: 100.3 +/- 0.1 (jitter = 0.3) 7.291
30 images/sec: 100.2 +/- 0.1 (jitter = 0.3) 7.423
40 images/sec: 100.0 +/- 0.1 (jitter = 0.4) 7.307
50 images/sec: 100.0 +/- 0.1 (jitter = 0.4) 7.275
60 images/sec: 100.0 +/- 0.0 (jitter = 0.4) 7.316
70 images/sec: 99.9 +/- 0.0 (jitter = 0.5) 7.379
80 images/sec: 99.8 +/- 0.0 (jitter = 0.4) 7.408
90 images/sec: 99.8 +/- 0.0 (jitter = 0.4) 7.313
100 images/sec: 99.7 +/- 0.0 (jitter = 0.5) 7.354
----------------------------------------------------------------
total images/sec: 99.70
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16 --variable_update=parameter_server
2018-12-12 00:05:23.387446: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-12 00:05:23.702596: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-12 00:05:23.702911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.81GiB
2018-12-12 00:05:23.702924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-12 00:05:23.870091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 00:05:23.870116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-12 00:05:23.870121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-12 00:05:23.870237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7537 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
TensorFlow: 1.12
Model: vgg16
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 64 global
64 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
Initializing graph
W1212 00:05:24.247153 140234968175680 tf_logging.py:125] From /home/scarlet/code/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2250: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-12-12 00:05:24.306492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-12 00:05:24.306524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 00:05:24.306529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-12 00:05:24.306533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-12 00:05:24.306630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7537 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
I1212 00:05:24.375437 140234968175680 tf_logging.py:115] Running local_init_op.
I1212 00:05:24.416141 140234968175680 tf_logging.py:115] Done running local_init_op.
Running warm up
2018-12-12 00:05:25.323004: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.45GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:05:25.624297: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.45GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:05:28.841605: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.45GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:05:29.057697: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.45GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 95.3 +/- 0.0 (jitter = 0.0) 7.245
10 images/sec: 96.2 +/- 0.3 (jitter = 0.8) 7.282
20 images/sec: 95.9 +/- 0.4 (jitter = 1.4) 7.267
30 images/sec: 96.0 +/- 0.3 (jitter = 0.9) 7.266
40 images/sec: 95.8 +/- 0.3 (jitter = 1.1) 7.289
50 images/sec: 95.8 +/- 0.2 (jitter = 1.2) 7.282
60 images/sec: 95.7 +/- 0.2 (jitter = 1.2) 7.272
70 images/sec: 95.6 +/- 0.2 (jitter = 1.3) 7.258
80 images/sec: 95.7 +/- 0.2 (jitter = 1.2) 7.276
90 images/sec: 95.6 +/- 0.1 (jitter = 1.2) 7.286
100 images/sec: 95.5 +/- 0.1 (jitter = 1.4) 7.264
----------------------------------------------------------------
total images/sec: 95.53
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=8 --model=resnet50 --variable_update=parameter_server
2019-06-08 15:16:45.067180: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-06-08 15:16:45.068214: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x55af02e3a0 executing computations on platform Host. Devices:
2019-06-08 15:16:45.068302: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): <undefined>, <undefined>
2019-06-08 15:16:45.220278: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-06-08 15:16:45.220560: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x55adb2bdc0 executing computations on platform CUDA. Devices:
2019-06-08 15:16:45.220642: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-06-08 15:16:45.221057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.87GiB freeMemory: 2.37GiB
2019-06-08 15:16:45.221150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-08 15:16:50.816566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-08 15:16:50.816650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-08 15:16:50.816688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-08 15:16:50.816885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1648 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
TensorFlow: 1.13
Model: resnet50
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 8 global
8 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
W0608 15:16:50.843825 548390839744 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W0608 15:16:50.933079 548390839744 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W0608 15:16:51.148540 548390839744 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
W0608 15:17:03.345488 548390839744 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W0608 15:17:04.021918 548390839744 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Initializing graph
W0608 15:17:09.478287 548390839744 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-06-08 15:17:19.255147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-08 15:17:19.255265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-08 15:17:19.255309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-08 15:17:19.255342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-08 15:17:19.255448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1648 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
I0608 15:17:22.634904 548390839744 session_manager.py:491] Running local_init_op.
I0608 15:17:22.855322 548390839744 session_manager.py:493] Done running local_init_op.
Running warm up
2019-06-08 15:17:28.219080: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
2019-06-08 15:18:33.621388: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.17GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:34.520592: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.17GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:34.811993: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.10GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:34.865131: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.26GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:34.988038: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.29GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:35.386779: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.29GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:35.493907: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.25GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:35.705907: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.10GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:36.003183: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.26GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:36.109775: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.25GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.510
10 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 7.602
20 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.671
30 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.026
40 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 7.519
50 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 7.523
60 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.667
70 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.359
80 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.034
90 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.291
100 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 7.673
----------------------------------------------------------------
total images/sec: 4.35
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=8 --model=inception3 --variable_update=parameter_server
2019-06-08 15:35:36.421819: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-06-08 15:35:36.424663: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x5568ada3b0 executing computations on platform Host. Devices:
2019-06-08 15:35:36.425801: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): <undefined>, <undefined>
2019-06-08 15:35:36.548136: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-06-08 15:35:36.548468: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x55675d7dd0 executing computations on platform CUDA. Devices:
2019-06-08 15:35:36.548526: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-06-08 15:35:36.548864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.87GiB freeMemory: 2.26GiB
2019-06-08 15:35:36.548931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-08 15:35:40.941625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-08 15:35:40.941709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-08 15:35:40.941740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-08 15:35:40.941938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1623 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
TensorFlow: 1.13
Model: inception3
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 8 global
8 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
W0608 15:35:40.963752 547771041216 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W0608 15:35:41.045766 547771041216 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W0608 15:35:41.635923 547771041216 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
W0608 15:35:43.426353 547771041216 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: average_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.average_pooling2d instead.
W0608 15:36:00.615895 547771041216 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Initializing graph
W0608 15:36:09.778275 547771041216 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-06-08 15:36:25.259073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-08 15:36:25.259192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-08 15:36:25.259235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-08 15:36:25.259268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-08 15:36:25.259382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1623 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
I0608 15:36:29.572513 547771041216 session_manager.py:491] Running local_init_op.
I0608 15:36:29.917439 547771041216 session_manager.py:493] Done running local_init_op.
Running warm up
2019-06-08 15:36:38.498294: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
2019-06-08 15:37:05.486777: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.98GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:08.217127: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.25GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:08.642988: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:09.917505: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.25GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:09.980112: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.08GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:10.036479: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 924.06MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:11.261485: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.01GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:11.357251: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.73GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:11.439711: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.44GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:11.504237: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 901.50MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.421
10 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.196
20 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.483
30 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.216
40 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.459
50 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.309
60 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.394
70 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.625
80 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.433
90 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.350
100 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.536
----------------------------------------------------------------
total images/sec: 2.83
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=1 --model=vgg16 --variable_update=parameter_server
2019-06-08 15:59:49.367576: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-06-08 15:59:49.368167: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x55afada390 executing computations on platform Host. Devices:
2019-06-08 15:59:49.368227: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): <undefined>, <undefined>
2019-06-08 15:59:49.495954: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-06-08 15:59:49.496232: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x55ae5d7db0 executing computations on platform CUDA. Devices:
2019-06-08 15:59:49.496316: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-06-08 15:59:49.496770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.87GiB freeMemory: 2.24GiB
2019-06-08 15:59:49.496867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-08 15:59:53.468034: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-08 15:59:53.468107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-08 15:59:53.468136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-08 15:59:53.468348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1615 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
TensorFlow: 1.13
Model: vgg16
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 1 global
1 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
W0608 15:59:53.489730 548159309248 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W0608 15:59:53.571482 548159309248 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W0608 15:59:53.762966 548159309248 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
W0608 15:59:55.129193 548159309248 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:403: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
W0608 15:59:55.132718 548159309248 deprecation.py:506] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W0608 15:59:55.390173 548159309248 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Initializing graph
W0608 15:59:56.978619 548159309248 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-06-08 15:59:58.865290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-08 15:59:58.865408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-08 15:59:58.865455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-08 15:59:58.865489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-08 15:59:58.865689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1615 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
I0608 16:00:00.514552 548159309248 session_manager.py:491] Running local_init_op.
I0608 16:00:01.024017 548159309248 session_manager.py:493] Done running local_init_op.
Running warm up
2019-06-08 16:00:02.062699: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
2019-06-08 16:00:27.234406: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:27.662109: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 882.56MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:27.728000: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:28.034288: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:28.435415: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.04GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:28.798298: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:29.386786: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:29.396037: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 547.21MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:29.552577: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:29.570247: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
20 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
30 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
40 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
50 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
60 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
70 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
80 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
90 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
100 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
----------------------------------------------------------------
total images/sec: 1.01
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server
WARNING: Logging before flag parsing goes to stderr.
W1212 22:42:38.242325 140213769527424 deprecation.py:305] From /home/miku/venv/lib/python3.6/site-packages/tensorflow/python/ops/distributions/distribution.py:265: ReparameterizationType.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
W1212 22:42:38.293964 140213769527424 deprecation.py:305] From /home/miku/venv/lib/python3.6/site-packages/tensorflow/python/ops/distributions/bernoulli.py:169: RegisterKL.__init__ (from tensorflow.python.ops.distributions.kullback_leibler) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
2018-12-12 22:42:41.009296: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-12-12 22:42:41.012107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1530] Found device 0 with properties:
name: Vega [Radeon RX Vega]
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.663
pciBusID 0000:0c:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-12-12 22:42:41.012192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1530] Found device 1 with properties:
name: Ellesmere [Radeon RX 470/480]
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.38
pciBusID 0000:42:00.0
Total memory: 4.00GiB
Free memory: 3.75GiB
2018-12-12 22:42:41.012797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Adding visible gpu devices: 0, 1
2018-12-12 22:42:41.012822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 22:42:41.012828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1057] 0 1
2018-12-12 22:42:41.012834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 0: N N
2018-12-12 22:42:41.012840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 1: N N
2018-12-12 22:42:41.012869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega [Radeon RX Vega], pci bus id: 0000:0c:00.0)
2018-12-12 22:42:41.032433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3540 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480], pci bus id: 0000:42:00.0)
TensorFlow: 1.12
Model: resnet50
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 64 global
64 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
Initializing graph
W1212 22:42:43.524825 140213769527424 deprecation.py:305] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2262: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-12-12 22:42:43.942748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Adding visible gpu devices: 0, 1
2018-12-12 22:42:43.942834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 22:42:43.942841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1057] 0 1
2018-12-12 22:42:43.942846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 0: N N
2018-12-12 22:42:43.942850: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 1: N N
2018-12-12 22:42:43.943941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega [Radeon RX Vega], pci bus id: 0000:0c:00.0)
2018-12-12 22:42:43.944217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3540 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480], pci bus id: 0000:42:00.0)
I1212 22:42:48.287045 140213769527424 session_manager.py:498] Running local_init_op.
I1212 22:42:48.317908 140213769527424 session_manager.py:500] Done running local_init_op.
Running warm up
2018-12-12 22:42:49.832475: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
...
2018-12-12 22:42:51.483934: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
2018-12-12 22:42:51.510592: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
Done warm up
Step Img/sec total_loss
1 images/sec: 195.5 +/- 0.0 (jitter = 0.0) 8.220
10 images/sec: 195.4 +/- 0.5 (jitter = 1.3) 7.880
20 images/sec: 195.9 +/- 0.5 (jitter = 1.6) 7.910
30 images/sec: 195.9 +/- 0.4 (jitter = 1.6) 7.821
40 images/sec: 195.7 +/- 0.3 (jitter = 1.5) 8.004
50 images/sec: 195.9 +/- 0.3 (jitter = 1.5) 7.768
60 images/sec: 195.8 +/- 0.3 (jitter = 1.4) 8.113
70 images/sec: 195.9 +/- 0.2 (jitter = 1.3) 7.817
80 images/sec: 196.0 +/- 0.2 (jitter = 1.2) 7.976
90 images/sec: 196.0 +/- 0.2 (jitter = 1.1) 8.101
100 images/sec: 196.0 +/- 0.2 (jitter = 1.2) 8.035
----------------------------------------------------------------
total images/sec: 195.95
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3 --variable_update=parameter_server
WARNING: Logging before flag parsing goes to stderr.
W1212 22:48:40.127813 140003118698624 deprecation.py:305] From /home/miku/venv/lib/python3.6/site-packages/tensorflow/python/ops/distributions/distribution.py:265: ReparameterizationType.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
W1212 22:48:40.179881 140003118698624 deprecation.py:305] From /home/miku/venv/lib/python3.6/site-packages/tensorflow/python/ops/distributions/bernoulli.py:169: RegisterKL.__init__ (from tensorflow.python.ops.distributions.kullback_leibler) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
2018-12-12 22:48:42.545716: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-12-12 22:48:42.546148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1530] Found device 0 with properties:
name: Vega [Radeon RX Vega]
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.663
pciBusID 0000:0c:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-12-12 22:48:42.546222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1530] Found device 1 with properties:
name: Ellesmere [Radeon RX 470/480]
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.38
pciBusID 0000:42:00.0
Total memory: 4.00GiB
Free memory: 3.75GiB
2018-12-12 22:48:42.546263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Adding visible gpu devices: 0, 1
2018-12-12 22:48:42.546282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 22:48:42.546288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1057] 0 1
2018-12-12 22:48:42.546294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 0: N N
2018-12-12 22:48:42.546300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 1: N N
2018-12-12 22:48:42.546337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega [Radeon RX Vega], pci bus id: 0000:0c:00.0)
2018-12-12 22:48:42.564639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3540 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480], pci bus id: 0000:42:00.0)
TensorFlow: 1.12
Model: inception3
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 64 global
64 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
Initializing graph
W1212 22:48:46.370111 140003118698624 deprecation.py:305] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2262: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-12-12 22:48:47.068808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Adding visible gpu devices: 0, 1
2018-12-12 22:48:47.068891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 22:48:47.068898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1057] 0 1
2018-12-12 22:48:47.068905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 0: N N
2018-12-12 22:48:47.068912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 1: N N
2018-12-12 22:48:47.069753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega [Radeon RX Vega], pci bus id: 0000:0c:00.0)
2018-12-12 22:48:47.070006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3540 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480], pci bus id: 0000:42:00.0)
I1212 22:48:50.449055 140003118698624 session_manager.py:498] Running local_init_op.
I1212 22:48:50.493362 140003118698624 session_manager.py:500] Done running local_init_op.
Running warm up
2018-12-12 22:48:52.703752: I tensorflow/core/kernels/conv_grad_input_ops.cc:1023] running auto-tune for Backward-Data
...
2018-12-12 22:48:56.569004: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
2018-12-12 22:48:56.605724: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
Done warm up
Step Img/sec total_loss
1 images/sec: 102.6 +/- 0.0 (jitter = 0.0) 7.282
10 images/sec: 101.9 +/- 0.3 (jitter = 0.4) 7.322
20 images/sec: 101.9 +/- 0.2 (jitter = 0.6) 7.317
30 images/sec: 101.9 +/- 0.1 (jitter = 0.7) 7.401
40 images/sec: 101.9 +/- 0.1 (jitter = 0.6) 7.298
50 images/sec: 101.8 +/- 0.1 (jitter = 0.6) 7.275
60 images/sec: 101.8 +/- 0.1 (jitter = 0.6) 7.366
70 images/sec: 101.8 +/- 0.1 (jitter = 0.6) 7.363
80 images/sec: 101.7 +/- 0.1 (jitter = 0.6) 7.403
90 images/sec: 101.7 +/- 0.1 (jitter = 0.5) 7.330
100 images/sec: 101.7 +/- 0.1 (jitter = 0.5) 7.369
----------------------------------------------------------------
total images/sec: 101.65
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16 --variable_update=parameter_server
WARNING: Logging before flag parsing goes to stderr.
W1212 22:46:25.216527 140055928705152 deprecation.py:305] From /home/miku/venv/lib/python3.6/site-packages/tensorflow/python/ops/distributions/distribution.py:265: ReparameterizationType.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
W1212 22:46:25.270726 140055928705152 deprecation.py:305] From /home/miku/venv/lib/python3.6/site-packages/tensorflow/python/ops/distributions/bernoulli.py:169: RegisterKL.__init__ (from tensorflow.python.ops.distributions.kullback_leibler) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
2018-12-12 22:46:27.691370: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-12-12 22:46:27.691665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1530] Found device 0 with properties:
name: Vega [Radeon RX Vega]
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.663
pciBusID 0000:0c:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-12-12 22:46:27.691731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1530] Found device 1 with properties:
name: Ellesmere [Radeon RX 470/480]
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.38
pciBusID 0000:42:00.0
Total memory: 4.00GiB
Free memory: 3.75GiB
2018-12-12 22:46:27.691757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Adding visible gpu devices: 0, 1
2018-12-12 22:46:27.691772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 22:46:27.691777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1057] 0 1
2018-12-12 22:46:27.691782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 0: N N
2018-12-12 22:46:27.691786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 1: N N
2018-12-12 22:46:27.691818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega [Radeon RX Vega], pci bus id: 0000:0c:00.0)
2018-12-12 22:46:27.710820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3540 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480], pci bus id: 0000:42:00.0)
TensorFlow: 1.12
Model: vgg16
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 64 global
64 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
Initializing graph
W1212 22:46:28.166204 140055928705152 deprecation.py:305] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2262: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-12-12 22:46:28.245635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Adding visible gpu devices: 0, 1
2018-12-12 22:46:28.245706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 22:46:28.245713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1057] 0 1
2018-12-12 22:46:28.245721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 0: N N
2018-12-12 22:46:28.245726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 1: N N
2018-12-12 22:46:28.245749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega [Radeon RX Vega], pci bus id: 0000:0c:00.0)
2018-12-12 22:46:28.245962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3540 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480], pci bus id: 0000:42:00.0)
I1212 22:46:31.094600 140055928705152 session_manager.py:498] Running local_init_op.
I1212 22:46:31.138288 140055928705152 session_manager.py:500] Done running local_init_op.
Running warm up
2018-12-12 22:46:31.715197: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
2018-12-12 22:46:31.930183: I tensorflow/core/kernels/conv_grad_input_ops.cc:1023] running auto-tune for Backward-Data
...
2018-12-12 22:46:34.526611: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
Done warm up
Step Img/sec total_loss
1 images/sec: 109.5 +/- 0.0 (jitter = 0.0) 7.239
10 images/sec: 109.4 +/- 0.1 (jitter = 0.2) 7.279
20 images/sec: 109.4 +/- 0.1 (jitter = 0.2) 7.283
30 images/sec: 109.2 +/- 0.1 (jitter = 0.3) 7.278
40 images/sec: 109.1 +/- 0.1 (jitter = 0.4) 7.286
50 images/sec: 109.1 +/- 0.1 (jitter = 0.4) 7.272
60 images/sec: 109.0 +/- 0.1 (jitter = 0.5) 7.286
70 images/sec: 109.0 +/- 0.1 (jitter = 0.5) 7.258
80 images/sec: 109.0 +/- 0.1 (jitter = 0.5) 7.267
90 images/sec: 109.0 +/- 0.0 (jitter = 0.5) 7.267
100 images/sec: 109.0 +/- 0.0 (jitter = 0.4) 7.260
----------------------------------------------------------------
total images/sec: 108.95
----------------------------------------------------------------
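
Summary of the runs above (total images/sec): GTX 1080: resnet50 146.49, inception3 99.70, vgg16 95.53 (batch 64). Vega 64: resnet50 195.95, inception3 101.65, vgg16 108.95 (batch 64). Jetson Nano: resnet50 4.35 and inception3 2.83 (batch 8), vgg16 1.01 (batch 1, loss reported nan).

Rather than copying these numbers by hand, the throughput lines can be scraped from saved logs with a few lines of Python (a quick sketch; the log filenames are hypothetical):

# summarize.py -- pull "total images/sec" out of saved tf_cnn_benchmarks logs
import re
import sys

PATTERN = re.compile(r"total images/sec:\s*([0-9.]+)")

# usage: python summarize.py gtx1080_resnet50.log vega64_resnet50.log ...
for path in sys.argv[1:]:
    with open(path) as f:
        text = f.read()
    for value in PATTERN.findall(text):
        print("%s\t%.2f" % (path, float(value)))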