@nyanshell
Last active June 8, 2019 08:10
gpu_tf_bench

environment

NVIDIA

i7-6700k
ASUS GTX 1080 8G
CUDA 9, cuDNN 7
python 3.6.6
tensorflow 1.12.0

AMD

AMD Ryzen Threadripper 1950X
Radeon RX Vega 64 Air Boost 8G OC & MSI RX 580 4G OC
ROCm 1.9.307
python 3.6.7
tensorflow-rocm 1.12.0

JETSON NANO

Nvidia Jetson Nano
python 3.6.8
tensorboard==1.13.1
tensorflow-gpu==1.13.1+nv19.5
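
To confirm each environment actually sees its GPU before benchmarking, the TF 1.x device list can be printed directly (a minimal sketch; device names and memory figures will of course differ per machine):

# check_env.py -- print TensorFlow version and visible devices (TF 1.x API)
import tensorflow as tf
from tensorflow.python.client import device_lib

print("TensorFlow:", tf.__version__)
for dev in device_lib.list_local_devices():
    # device_type is 'CPU' or 'GPU'; physical_device_desc names the card
    print(dev.device_type, dev.name, dev.physical_device_desc)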

fine tune

$ /opt/rocm/bin/rocm-smi --setfan 80% --setmclk 3 --setsclk 7 --setoverdrive 20 --setmemoverdrive 20 -d 1
$ /opt/rocm/bin/rocm-smi --setfan 70% --setmclk 3 --setsclk 7 --setoverdrive 20 --setmemoverdrive 20 -d 0
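
The same tuning can be scripted by shelling out to rocm-smi with exactly the flags used above (a rough sketch for this two-GPU setup; the path and device indices are the ones from the commands above):

# rocm_tune.py -- apply the rocm-smi settings above, then print the status summary
import subprocess

ROCM_SMI = "/opt/rocm/bin/rocm-smi"

# (fan speed, device index) pairs, matching the two commands above
SETTINGS = [("80%", "1"), ("70%", "0")]

for fan, dev in SETTINGS:
    subprocess.run(
        [ROCM_SMI, "--setfan", fan, "--setmclk", "3", "--setsclk", "7",
         "--setoverdrive", "20", "--setmemoverdrive", "20", "-d", dev],
        check=True)

# a bare rocm-smi call prints the current clock/fan/temperature table
subprocess.run([ROCM_SMI], check=True)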

tensorflow benchmark

https://github.com/tensorflow/benchmarks at commit bef1ac21efcefa1b3e0d16def1581552cf537163, with the local patch below:

diff --git a/scripts/tf_cnn_benchmarks/benchmark_cnn.py b/scripts/tf_cnn_benchmarks/benchmark_cnn.py
index ebbe758..2955f5f 100644
--- a/scripts/tf_cnn_benchmarks/benchmark_cnn.py
+++ b/scripts/tf_cnn_benchmarks/benchmark_cnn.py
@@ -736,9 +736,9 @@ def create_config_proto(params):
     config.intra_op_parallelism_threads = params.num_intra_threads
   config.inter_op_parallelism_threads = params.num_inter_threads
   config.experimental.collective_group_leader = '/job:worker/replica:0/task:0'
-  config.gpu_options.experimental.collective_ring_order = params.gpu_indices
+  # config.gpu_options.experimental.collective_ring_order = params.gpu_indices
   config.gpu_options.force_gpu_compatible = params.force_gpu_compatible
-  config.experimental.use_numa_affinity = params.use_numa_affinity
+  # config.experimental.use_numa_affinity = params.use_numa_affinity
   if params.device == 'cpu':
     # TODO(tucker): change num_gpus to num_devices
     config.device_count['CPU'] = params.num_gpus
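
The patch only comments out two ConfigProto fields (collective_ring_order and use_numa_affinity) that the tensorflow-1.12 / tensorflow-rocm-1.12 wheels apparently do not expose; everything else in create_config_proto is left as shipped. For orientation, the options it still sets are ordinary TF 1.x session options, roughly like this (a minimal sketch, not the benchmark's actual code):

# session_config_sketch.py -- the kind of ConfigProto the patched function builds (TF 1.x)
import tensorflow as tf

config = tf.ConfigProto()
config.intra_op_parallelism_threads = 0    # 0 lets TF choose a default
config.inter_op_parallelism_threads = 0
config.gpu_options.force_gpu_compatible = True
# collective_ring_order / use_numa_affinity are skipped, as in the patch above

sess = tf.Session(config=config)
print(sess.list_devices())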
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server
python: can't open file 'tf_cnn_benchmarks.py': [Errno 2] No such file or directory
(nn) scarlet@debian:~/code/benchmarks/scripts$ cd tf_cnn_benchmarks/
(nn) scarlet@debian:~/code/benchmarks/scripts/tf_cnn_benchmarks$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server
2018-12-11 23:59:29.844707: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-11 23:59:30.159856: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-11 23:59:30.160244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.81GiB
2018-12-11 23:59:30.160257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-11 23:59:30.326162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-11 23:59:30.326188: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-11 23:59:30.326193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-11 23:59:30.326309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7537 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
TensorFlow: 1.12
Model: resnet50
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 64 global
64 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
Initializing graph
W1211 23:59:32.241793 140357647819840 tf_logging.py:125] From /home/scarlet/code/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2250: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-12-11 23:59:32.540028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-11 23:59:32.540062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-11 23:59:32.540067: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-11 23:59:32.540070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-11 23:59:32.540170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7537 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
I1211 23:59:32.833969 140357647819840 tf_logging.py:115] Running local_init_op.
I1211 23:59:32.858170 140357647819840 tf_logging.py:115] Done running local_init_op.
Running warm up
2018-12-11 23:59:34.476483: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.52GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-11 23:59:34.548959: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-11 23:59:34.576380: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.27GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-11 23:59:34.583204: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.54GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-11 23:59:34.656336: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.26GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-11 23:59:34.663460: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.32GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-11 23:59:34.746282: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.52GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 146.8 +/- 0.0 (jitter = 0.0) 8.220
10 images/sec: 146.8 +/- 0.1 (jitter = 0.1) 7.880
20 images/sec: 146.9 +/- 0.1 (jitter = 0.3) 7.910
30 images/sec: 146.9 +/- 0.0 (jitter = 0.2) 7.821
40 images/sec: 146.8 +/- 0.0 (jitter = 0.2) 8.005
50 images/sec: 146.8 +/- 0.0 (jitter = 0.2) 7.770
60 images/sec: 146.7 +/- 0.0 (jitter = 0.2) 8.116
70 images/sec: 146.7 +/- 0.0 (jitter = 0.3) 7.818
80 images/sec: 146.6 +/- 0.0 (jitter = 0.3) 7.979
90 images/sec: 146.6 +/- 0.0 (jitter = 0.4) 8.094
100 images/sec: 146.5 +/- 0.0 (jitter = 0.4) 8.036
----------------------------------------------------------------
total images/sec: 146.49
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3 --variable_update=parameter_server
2018-12-12 00:03:20.007683: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-12 00:03:20.332246: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-12 00:03:20.332624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.81GiB
2018-12-12 00:03:20.332640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-12 00:03:20.502800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 00:03:20.502826: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-12 00:03:20.502831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-12 00:03:20.502942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7537 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
TensorFlow: 1.12
Model: inception3
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 64 global
64 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
Initializing graph
W1212 00:03:23.451073 140471892030528 tf_logging.py:125] From /home/scarlet/code/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2250: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-12-12 00:03:23.932654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-12 00:03:23.932686: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 00:03:23.932691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-12 00:03:23.932695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-12 00:03:23.932792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7537 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
I1212 00:03:24.400829 140471892030528 tf_logging.py:115] Running local_init_op.
I1212 00:03:24.435635 140471892030528 tf_logging.py:115] Done running local_init_op.
Running warm up
2018-12-12 00:03:27.317654: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.69GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:03:27.330475: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.97GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:03:27.345487: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.25GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:03:27.486848: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.74GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:03:27.500194: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.69GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:03:27.546013: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.03GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:03:27.561704: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.98GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 100.7 +/- 0.0 (jitter = 0.0) 7.262
10 images/sec: 100.4 +/- 0.1 (jitter = 0.2) 7.308
20 images/sec: 100.3 +/- 0.1 (jitter = 0.3) 7.291
30 images/sec: 100.2 +/- 0.1 (jitter = 0.3) 7.423
40 images/sec: 100.0 +/- 0.1 (jitter = 0.4) 7.307
50 images/sec: 100.0 +/- 0.1 (jitter = 0.4) 7.275
60 images/sec: 100.0 +/- 0.0 (jitter = 0.4) 7.316
70 images/sec: 99.9 +/- 0.0 (jitter = 0.5) 7.379
80 images/sec: 99.8 +/- 0.0 (jitter = 0.4) 7.408
90 images/sec: 99.8 +/- 0.0 (jitter = 0.4) 7.313
100 images/sec: 99.7 +/- 0.0 (jitter = 0.5) 7.354
----------------------------------------------------------------
total images/sec: 99.70
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16 --variable_update=parameter_server
2018-12-12 00:05:23.387446: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-12 00:05:23.702596: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-12 00:05:23.702911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.81GiB
2018-12-12 00:05:23.702924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-12 00:05:23.870091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 00:05:23.870116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-12 00:05:23.870121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-12 00:05:23.870237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7537 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
TensorFlow: 1.12
Model: vgg16
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 64 global
64 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
Initializing graph
W1212 00:05:24.247153 140234968175680 tf_logging.py:125] From /home/scarlet/code/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2250: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-12-12 00:05:24.306492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-12 00:05:24.306524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 00:05:24.306529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-12 00:05:24.306533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-12 00:05:24.306630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7537 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
I1212 00:05:24.375437 140234968175680 tf_logging.py:115] Running local_init_op.
I1212 00:05:24.416141 140234968175680 tf_logging.py:115] Done running local_init_op.
Running warm up
2018-12-12 00:05:25.323004: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.45GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:05:25.624297: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.45GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:05:28.841605: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.45GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2018-12-12 00:05:29.057697: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.45GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 95.3 +/- 0.0 (jitter = 0.0) 7.245
10 images/sec: 96.2 +/- 0.3 (jitter = 0.8) 7.282
20 images/sec: 95.9 +/- 0.4 (jitter = 1.4) 7.267
30 images/sec: 96.0 +/- 0.3 (jitter = 0.9) 7.266
40 images/sec: 95.8 +/- 0.3 (jitter = 1.1) 7.289
50 images/sec: 95.8 +/- 0.2 (jitter = 1.2) 7.282
60 images/sec: 95.7 +/- 0.2 (jitter = 1.2) 7.272
70 images/sec: 95.6 +/- 0.2 (jitter = 1.3) 7.258
80 images/sec: 95.7 +/- 0.2 (jitter = 1.2) 7.276
90 images/sec: 95.6 +/- 0.1 (jitter = 1.2) 7.286
100 images/sec: 95.5 +/- 0.1 (jitter = 1.4) 7.264
----------------------------------------------------------------
total images/sec: 95.53
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=8 --model=resnet50 --variable_update=parameter_server
2019-06-08 15:16:45.067180: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-06-08 15:16:45.068214: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x55af02e3a0 executing computations on platform Host. Devices:
2019-06-08 15:16:45.068302: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): <undefined>, <undefined>
2019-06-08 15:16:45.220278: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-06-08 15:16:45.220560: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x55adb2bdc0 executing computations on platform CUDA. Devices:
2019-06-08 15:16:45.220642: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-06-08 15:16:45.221057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.87GiB freeMemory: 2.37GiB
2019-06-08 15:16:45.221150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-08 15:16:50.816566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-08 15:16:50.816650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-08 15:16:50.816688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-08 15:16:50.816885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1648 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
TensorFlow: 1.13
Model: resnet50
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 8 global
8 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
W0608 15:16:50.843825 548390839744 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W0608 15:16:50.933079 548390839744 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W0608 15:16:51.148540 548390839744 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
W0608 15:17:03.345488 548390839744 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
W0608 15:17:04.021918 548390839744 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Initializing graph
W0608 15:17:09.478287 548390839744 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-06-08 15:17:19.255147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-08 15:17:19.255265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-08 15:17:19.255309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-08 15:17:19.255342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-08 15:17:19.255448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1648 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
I0608 15:17:22.634904 548390839744 session_manager.py:491] Running local_init_op.
I0608 15:17:22.855322 548390839744 session_manager.py:493] Done running local_init_op.
Running warm up
2019-06-08 15:17:28.219080: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
2019-06-08 15:18:33.621388: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.17GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:34.520592: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.17GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:34.811993: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.10GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:34.865131: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.26GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:34.988038: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.29GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:35.386779: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.29GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:35.493907: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.25GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:35.705907: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.10GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:36.003183: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.26GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:18:36.109775: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.25GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.510
10 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 7.602
20 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.671
30 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.026
40 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 7.519
50 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 7.523
60 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.667
70 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.359
80 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.034
90 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 8.291
100 images/sec: 4.4 +/- 0.0 (jitter = 0.0) 7.673
----------------------------------------------------------------
total images/sec: 4.35
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=8 --model=inception3 --variable_update=parameter_server
2019-06-08 15:35:36.421819: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-06-08 15:35:36.424663: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x5568ada3b0 executing computations on platform Host. Devices:
2019-06-08 15:35:36.425801: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): <undefined>, <undefined>
2019-06-08 15:35:36.548136: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-06-08 15:35:36.548468: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x55675d7dd0 executing computations on platform CUDA. Devices:
2019-06-08 15:35:36.548526: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-06-08 15:35:36.548864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.87GiB freeMemory: 2.26GiB
2019-06-08 15:35:36.548931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-08 15:35:40.941625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-08 15:35:40.941709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-08 15:35:40.941740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-08 15:35:40.941938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1623 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
TensorFlow: 1.13
Model: inception3
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 8 global
8 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
W0608 15:35:40.963752 547771041216 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W0608 15:35:41.045766 547771041216 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W0608 15:35:41.635923 547771041216 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
W0608 15:35:43.426353 547771041216 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: average_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.average_pooling2d instead.
W0608 15:36:00.615895 547771041216 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Initializing graph
W0608 15:36:09.778275 547771041216 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-06-08 15:36:25.259073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-08 15:36:25.259192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-08 15:36:25.259235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-08 15:36:25.259268: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-08 15:36:25.259382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1623 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
I0608 15:36:29.572513 547771041216 session_manager.py:491] Running local_init_op.
I0608 15:36:29.917439 547771041216 session_manager.py:493] Done running local_init_op.
Running warm up
2019-06-08 15:36:38.498294: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
2019-06-08 15:37:05.486777: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.98GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:08.217127: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.25GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:08.642988: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:09.917505: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.25GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:09.980112: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.08GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:10.036479: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 924.06MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:11.261485: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.01GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:11.357251: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.73GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:11.439711: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.44GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 15:37:11.504237: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 901.50MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.421
10 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.196
20 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.483
30 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.216
40 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.459
50 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.309
60 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.394
70 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.625
80 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.433
90 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.350
100 images/sec: 2.8 +/- 0.0 (jitter = 0.0) 7.536
----------------------------------------------------------------
total images/sec: 2.83
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=1 --model=vgg16 --variable_update=parameter_server
2019-06-08 15:59:49.367576: W tensorflow/core/platform/profile_utils/cpu_utils.cc:98] Failed to find bogomips in /proc/cpuinfo; cannot determine CPU frequency
2019-06-08 15:59:49.368167: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x55afada390 executing computations on platform Host. Devices:
2019-06-08 15:59:49.368227: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): <undefined>, <undefined>
2019-06-08 15:59:49.495954: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:965] ARM64 does not support NUMA - returning NUMA node zero
2019-06-08 15:59:49.496232: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x55ae5d7db0 executing computations on platform CUDA. Devices:
2019-06-08 15:59:49.496316: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): NVIDIA Tegra X1, Compute Capability 5.3
2019-06-08 15:59:49.496770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216
pciBusID: 0000:00:00.0
totalMemory: 3.87GiB freeMemory: 2.24GiB
2019-06-08 15:59:49.496867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-08 15:59:53.468034: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-08 15:59:53.468107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-08 15:59:53.468136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-08 15:59:53.468348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1615 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
TensorFlow: 1.13
Model: vgg16
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 1 global
1 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
W0608 15:59:53.489730 548159309248 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W0608 15:59:53.571482 548159309248 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:129: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W0608 15:59:53.762966 548159309248 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:261: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
W0608 15:59:55.129193 548159309248 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:403: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
W0608 15:59:55.132718 548159309248 deprecation.py:506] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W0608 15:59:55.390173 548159309248 deprecation.py:323] From /home/miku/venv36/lib/python3.6/site-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Initializing graph
W0608 15:59:56.978619 548159309248 deprecation.py:323] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2238: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2019-06-08 15:59:58.865290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-08 15:59:58.865408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-08 15:59:58.865455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-06-08 15:59:58.865489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-06-08 15:59:58.865689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1615 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
I0608 16:00:00.514552 548159309248 session_manager.py:491] Running local_init_op.
I0608 16:00:01.024017 548159309248 session_manager.py:493] Done running local_init_op.
Running warm up
2019-06-08 16:00:02.062699: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10.0 locally
2019-06-08 16:00:27.234406: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:27.662109: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 882.56MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:27.728000: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:28.034288: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.05GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:28.435415: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.04GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:28.798298: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:29.386786: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:29.396037: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 547.21MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:29.552577: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.14GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-06-08 16:00:29.570247: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.07GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
Done warm up
Step Img/sec total_loss
1 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
10 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
20 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
30 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
40 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
50 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
60 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
70 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
80 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
90 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
100 images/sec: 1.0 +/- 0.0 (jitter = 0.0) nan
----------------------------------------------------------------
total images/sec: 1.01
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=resnet50 --variable_update=parameter_server
WARNING: Logging before flag parsing goes to stderr.
W1212 22:42:38.242325 140213769527424 deprecation.py:305] From /home/miku/venv/lib/python3.6/site-packages/tensorflow/python/ops/distributions/distribution.py:265: ReparameterizationType.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
W1212 22:42:38.293964 140213769527424 deprecation.py:305] From /home/miku/venv/lib/python3.6/site-packages/tensorflow/python/ops/distributions/bernoulli.py:169: RegisterKL.__init__ (from tensorflow.python.ops.distributions.kullback_leibler) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
2018-12-12 22:42:41.009296: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-12-12 22:42:41.012107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1530] Found device 0 with properties:
name: Vega [Radeon RX Vega]
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.663
pciBusID 0000:0c:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-12-12 22:42:41.012192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1530] Found device 1 with properties:
name: Ellesmere [Radeon RX 470/480]
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.38
pciBusID 0000:42:00.0
Total memory: 4.00GiB
Free memory: 3.75GiB
2018-12-12 22:42:41.012797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Adding visible gpu devices: 0, 1
2018-12-12 22:42:41.012822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 22:42:41.012828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1057] 0 1
2018-12-12 22:42:41.012834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 0: N N
2018-12-12 22:42:41.012840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 1: N N
2018-12-12 22:42:41.012869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega [Radeon RX Vega], pci bus id: 0000:0c:00.0)
2018-12-12 22:42:41.032433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3540 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480], pci bus id: 0000:42:00.0)
TensorFlow: 1.12
Model: resnet50
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 64 global
64 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
Initializing graph
W1212 22:42:43.524825 140213769527424 deprecation.py:305] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2262: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-12-12 22:42:43.942748: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Adding visible gpu devices: 0, 1
2018-12-12 22:42:43.942834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 22:42:43.942841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1057] 0 1
2018-12-12 22:42:43.942846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 0: N N
2018-12-12 22:42:43.942850: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 1: N N
2018-12-12 22:42:43.943941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega [Radeon RX Vega], pci bus id: 0000:0c:00.0)
2018-12-12 22:42:43.944217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3540 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480], pci bus id: 0000:42:00.0)
I1212 22:42:48.287045 140213769527424 session_manager.py:498] Running local_init_op.
I1212 22:42:48.317908 140213769527424 session_manager.py:500] Done running local_init_op.
Running warm up
2018-12-12 22:42:49.832475: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
...
2018-12-12 22:42:51.483934: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
2018-12-12 22:42:51.510592: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
Done warm up
Step Img/sec total_loss
1 images/sec: 195.5 +/- 0.0 (jitter = 0.0) 8.220
10 images/sec: 195.4 +/- 0.5 (jitter = 1.3) 7.880
20 images/sec: 195.9 +/- 0.5 (jitter = 1.6) 7.910
30 images/sec: 195.9 +/- 0.4 (jitter = 1.6) 7.821
40 images/sec: 195.7 +/- 0.3 (jitter = 1.5) 8.004
50 images/sec: 195.9 +/- 0.3 (jitter = 1.5) 7.768
60 images/sec: 195.8 +/- 0.3 (jitter = 1.4) 8.113
70 images/sec: 195.9 +/- 0.2 (jitter = 1.3) 7.817
80 images/sec: 196.0 +/- 0.2 (jitter = 1.2) 7.976
90 images/sec: 196.0 +/- 0.2 (jitter = 1.1) 8.101
100 images/sec: 196.0 +/- 0.2 (jitter = 1.2) 8.035
----------------------------------------------------------------
total images/sec: 195.95
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=inception3 --variable_update=parameter_server
WARNING: Logging before flag parsing goes to stderr.
W1212 22:48:40.127813 140003118698624 deprecation.py:305] From /home/miku/venv/lib/python3.6/site-packages/tensorflow/python/ops/distributions/distribution.py:265: ReparameterizationType.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
W1212 22:48:40.179881 140003118698624 deprecation.py:305] From /home/miku/venv/lib/python3.6/site-packages/tensorflow/python/ops/distributions/bernoulli.py:169: RegisterKL.__init__ (from tensorflow.python.ops.distributions.kullback_leibler) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
2018-12-12 22:48:42.545716: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-12-12 22:48:42.546148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1530] Found device 0 with properties:
name: Vega [Radeon RX Vega]
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.663
pciBusID 0000:0c:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-12-12 22:48:42.546222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1530] Found device 1 with properties:
name: Ellesmere [Radeon RX 470/480]
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.38
pciBusID 0000:42:00.0
Total memory: 4.00GiB
Free memory: 3.75GiB
2018-12-12 22:48:42.546263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Adding visible gpu devices: 0, 1
2018-12-12 22:48:42.546282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 22:48:42.546288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1057] 0 1
2018-12-12 22:48:42.546294: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 0: N N
2018-12-12 22:48:42.546300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 1: N N
2018-12-12 22:48:42.546337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega [Radeon RX Vega], pci bus id: 0000:0c:00.0)
2018-12-12 22:48:42.564639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3540 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480], pci bus id: 0000:42:00.0)
TensorFlow: 1.12
Model: inception3
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 64 global
64 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
Initializing graph
W1212 22:48:46.370111 140003118698624 deprecation.py:305] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2262: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-12-12 22:48:47.068808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Adding visible gpu devices: 0, 1
2018-12-12 22:48:47.068891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 22:48:47.068898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1057] 0 1
2018-12-12 22:48:47.068905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 0: N N
2018-12-12 22:48:47.068912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 1: N N
2018-12-12 22:48:47.069753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega [Radeon RX Vega], pci bus id: 0000:0c:00.0)
2018-12-12 22:48:47.070006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3540 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480], pci bus id: 0000:42:00.0)
I1212 22:48:50.449055 140003118698624 session_manager.py:498] Running local_init_op.
I1212 22:48:50.493362 140003118698624 session_manager.py:500] Done running local_init_op.
Running warm up
2018-12-12 22:48:52.703752: I tensorflow/core/kernels/conv_grad_input_ops.cc:1023] running auto-tune for Backward-Data
...
2018-12-12 22:48:56.569004: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
2018-12-12 22:48:56.605724: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
Done warm up
Step Img/sec total_loss
1 images/sec: 102.6 +/- 0.0 (jitter = 0.0) 7.282
10 images/sec: 101.9 +/- 0.3 (jitter = 0.4) 7.322
20 images/sec: 101.9 +/- 0.2 (jitter = 0.6) 7.317
30 images/sec: 101.9 +/- 0.1 (jitter = 0.7) 7.401
40 images/sec: 101.9 +/- 0.1 (jitter = 0.6) 7.298
50 images/sec: 101.8 +/- 0.1 (jitter = 0.6) 7.275
60 images/sec: 101.8 +/- 0.1 (jitter = 0.6) 7.366
70 images/sec: 101.8 +/- 0.1 (jitter = 0.6) 7.363
80 images/sec: 101.7 +/- 0.1 (jitter = 0.6) 7.403
90 images/sec: 101.7 +/- 0.1 (jitter = 0.5) 7.330
100 images/sec: 101.7 +/- 0.1 (jitter = 0.5) 7.369
----------------------------------------------------------------
total images/sec: 101.65
----------------------------------------------------------------
$ python tf_cnn_benchmarks.py --num_gpus=1 --batch_size=64 --model=vgg16 --variable_update=parameter_server
WARNING: Logging before flag parsing goes to stderr.
W1212 22:46:25.216527 140055928705152 deprecation.py:305] From /home/miku/venv/lib/python3.6/site-packages/tensorflow/python/ops/distributions/distribution.py:265: ReparameterizationType.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
W1212 22:46:25.270726 140055928705152 deprecation.py:305] From /home/miku/venv/lib/python3.6/site-packages/tensorflow/python/ops/distributions/bernoulli.py:169: RegisterKL.__init__ (from tensorflow.python.ops.distributions.kullback_leibler) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
2018-12-12 22:46:27.691370: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-12-12 22:46:27.691665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1530] Found device 0 with properties:
name: Vega [Radeon RX Vega]
AMDGPU ISA: gfx900
memoryClockRate (GHz) 1.663
pciBusID 0000:0c:00.0
Total memory: 7.98GiB
Free memory: 7.73GiB
2018-12-12 22:46:27.691731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1530] Found device 1 with properties:
name: Ellesmere [Radeon RX 470/480]
AMDGPU ISA: gfx803
memoryClockRate (GHz) 1.38
pciBusID 0000:42:00.0
Total memory: 4.00GiB
Free memory: 3.75GiB
2018-12-12 22:46:27.691757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Adding visible gpu devices: 0, 1
2018-12-12 22:46:27.691772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 22:46:27.691777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1057] 0 1
2018-12-12 22:46:27.691782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 0: N N
2018-12-12 22:46:27.691786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 1: N N
2018-12-12 22:46:27.691818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega [Radeon RX Vega], pci bus id: 0000:0c:00.0)
2018-12-12 22:46:27.710820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3540 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480], pci bus id: 0000:42:00.0)
TensorFlow: 1.12
Model: vgg16
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 64 global
64 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
NUMA bind: False
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
==========
Generating training model
Initializing graph
W1212 22:46:28.166204 140055928705152 deprecation.py:305] From /home/miku/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2262: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-12-12 22:46:28.245635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Adding visible gpu devices: 0, 1
2018-12-12 22:46:28.245706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-12 22:46:28.245713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1057] 0 1
2018-12-12 22:46:28.245721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 0: N N
2018-12-12 22:46:28.245726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1070] 1: N N
2018-12-12 22:46:28.245749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7524 MB memory) -> physical GPU (device: 0, name: Vega [Radeon RX Vega], pci bus id: 0000:0c:00.0)
2018-12-12 22:46:28.245962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1189] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 3540 MB memory) -> physical GPU (device: 1, name: Ellesmere [Radeon RX 470/480], pci bus id: 0000:42:00.0)
I1212 22:46:31.094600 140055928705152 session_manager.py:498] Running local_init_op.
I1212 22:46:31.138288 140055928705152 session_manager.py:500] Done running local_init_op.
Running warm up
2018-12-12 22:46:31.715197: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
2018-12-12 22:46:31.930183: I tensorflow/core/kernels/conv_grad_input_ops.cc:1023] running auto-tune for Backward-Data
...
2018-12-12 22:46:34.526611: I tensorflow/core/kernels/conv_grad_filter_ops.cc:975] running auto-tune for Backward-Filter
Done warm up
Step Img/sec total_loss
1 images/sec: 109.5 +/- 0.0 (jitter = 0.0) 7.239
10 images/sec: 109.4 +/- 0.1 (jitter = 0.2) 7.279
20 images/sec: 109.4 +/- 0.1 (jitter = 0.2) 7.283
30 images/sec: 109.2 +/- 0.1 (jitter = 0.3) 7.278
40 images/sec: 109.1 +/- 0.1 (jitter = 0.4) 7.286
50 images/sec: 109.1 +/- 0.1 (jitter = 0.4) 7.272
60 images/sec: 109.0 +/- 0.1 (jitter = 0.5) 7.286
70 images/sec: 109.0 +/- 0.1 (jitter = 0.5) 7.258
80 images/sec: 109.0 +/- 0.1 (jitter = 0.5) 7.267
90 images/sec: 109.0 +/- 0.0 (jitter = 0.5) 7.267
100 images/sec: 109.0 +/- 0.0 (jitter = 0.4) 7.260
----------------------------------------------------------------
total images/sec: 108.95
----------------------------------------------------------------
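
Summary of the runs above (total images/sec): GTX 1080: resnet50 146.49, inception3 99.70, vgg16 95.53 (batch 64). Vega 64: resnet50 195.95, inception3 101.65, vgg16 108.95 (batch 64). Jetson Nano: resnet50 4.35 and inception3 2.83 (batch 8), vgg16 1.01 (batch 1, loss reported nan).

Rather than copying these numbers by hand, the throughput lines can be scraped from saved logs with a few lines of Python (a quick sketch; the log filenames are hypothetical):

# summarize.py -- pull "total images/sec" out of saved tf_cnn_benchmarks logs
import re
import sys

PATTERN = re.compile(r"total images/sec:\s*([0-9.]+)")

# usage: python summarize.py gtx1080_resnet50.log vega64_resnet50.log ...
for path in sys.argv[1:]:
    with open(path) as f:
        text = f.read()
    for value in PATTERN.findall(text):
        print("%s\t%.2f" % (path, float(value)))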