@byronyi
Created June 27, 2017 19:33
2017-06-28 03:31:29.957231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 0 with properties:
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:02:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2017-06-28 03:31:30.169520: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x5569248497c0 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-06-28 03:31:30.171168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 1 with properties:
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2017-06-28 03:31:30.386221: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x55692484d600 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-06-28 03:31:30.387893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 2 with properties:
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:82:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2017-06-28 03:31:30.613924: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x556924851470 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-06-28 03:31:30.615614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 3 with properties:
name: Tesla K40m
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:83:00.0
Total memory: 11.17GiB
Free memory: 11.10GiB
2017-06-28 03:31:30.616394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 0 and 2
2017-06-28 03:31:30.616417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 0 and 3
2017-06-28 03:31:30.616450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 1 and 2
2017-06-28 03:31:30.616464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 1 and 3
2017-06-28 03:31:30.616477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 2 and 0
2017-06-28 03:31:30.616490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 2 and 1
2017-06-28 03:31:30.617052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 3 and 0
2017-06-28 03:31:30.617071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 3 and 1
2017-06-28 03:31:30.617167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:959] DMA: 0 1 2 3
2017-06-28 03:31:30.617177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 0: Y Y N N
2017-06-28 03:31:30.617185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 1: Y Y N N
2017-06-28 03:31:30.617191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 2: N N Y Y
2017-06-28 03:31:30.617198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 3: N N Y Y
2017-06-28 03:31:30.617210: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:02:00.0)
2017-06-28 03:31:30.617217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40m, pci bus id: 0000:03:00.0)
2017-06-28 03:31:30.617224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K40m, pci bus id: 0000:82:00.0)
2017-06-28 03:31:30.617230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K40m, pci bus id: 0000:83:00.0)
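The DMA table printed during device setup records which GPU ordinals can copy directly to each other. The Python sketch below parses that table into groups of mutually peer-accessible GPUs. It assumes the five table lines were copied without their timestamps and source prefixes. For this machine it yields GPUs 0 and 1 versus GPUs 2 and 3, which lines up with the PCI bus IDs (02/03 versus 82/83) and with the "Peer access not supported" messages. The log does not say why the GPUs split this way; a separate PCIe root complex per pair is only a plausible explanation. `parse_dma_matrix` and `peer_groups` are hypothetical helper names.

```python
# DMA table copied from the log above (timestamps and "gpu_device.cc:..." prefix removed).
DMA_LINES = [
    "DMA: 0 1 2 3",
    "0: Y Y N N",
    "1: Y Y N N",
    "2: N N Y Y",
    "3: N N Y Y",
]

def parse_dma_matrix(lines):
    """Return {(src, dst): bool} from TensorFlow's printed DMA table."""
    ordinals = [int(x) for x in lines[0].split(":", 1)[1].split()]
    access = {}
    for row in lines[1:]:
        src, flags = row.split(":", 1)
        for dst, flag in zip(ordinals, flags.split()):
            access[(int(src), dst)] = (flag == "Y")
    return access

def peer_groups(access):
    """Greedily group GPUs whose members can all reach each other in both directions."""
    groups = []
    for gpu in sorted({src for src, _ in access}):
        for group in groups:
            if all(access[(gpu, other)] and access[(other, gpu)] for other in group):
                group.append(gpu)
                break
        else:
            groups.append([gpu])  # no compatible group found, start a new one
    return groups

print(peer_groups(parse_dma_matrix(DMA_LINES)))  # [[0, 1], [2, 3]]
```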
2017-06-28 03:31:31.540680: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> 10.40.2.200:5000, 1 -> 10.40.2.201:5000}
2017-06-28 03:31:31.540785: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:5001, 1 -> 10.40.2.201:5001}
2017-06-28 03:31:31.541821: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> 10.40.2.200:5000, 1 -> 10.40.2.201:5000}
2017-06-28 03:31:31.541855: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:5001, 1 -> 10.40.2.201:5001}
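The GrpcChannelCache lines describe the cluster this process joined: two parameter-server tasks on port 5000 and two worker tasks on port 5001. The worker task 0 entry reads `localhost` because this log was written by that task. The Python sketch below reads those lines back into a `{job: [address by task index]}` dict, the same shape that TensorFlow 1.x's `tf.train.ClusterSpec` accepts. `parse_cluster` and the two regular expressions are hypothetical helpers, and they assume the message text is exactly as printed here.

```python
import re

# Message text copied from the GrpcChannelCache lines above (prefix removed).
CHANNEL_LINES = [
    "Initialize GrpcChannelCache for job ps -> {0 -> 10.40.2.200:5000, 1 -> 10.40.2.201:5000}",
    "Initialize GrpcChannelCache for job worker -> {0 -> localhost:5001, 1 -> 10.40.2.201:5001}",
]

LINE_RE = re.compile(r"for job (\w+) -> \{(.*)\}")   # job name and the {...} body
ENTRY_RE = re.compile(r"(\d+) -> ([^,\s]+)")          # "task_index -> host:port"

def parse_cluster(lines):
    """Rebuild {job: [address for task 0, task 1, ...]} from GrpcChannelCache lines."""
    cluster = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a channel-cache line
        job, body = m.groups()
        tasks = {int(i): addr for i, addr in ENTRY_RE.findall(body)}
        cluster[job] = [tasks[i] for i in sorted(tasks)]
    return cluster

print(parse_cluster(CHANNEL_LINES))
```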
2017-06-28 03:31:31.543878: I tensorflow/contrib/verbs/rdma.cc:99] Start RdmaAdapter: mlx4_0
2017-06-28 03:31:31.547130: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:5001
2017-06-28 03:31:31.547216: I tensorflow/contrib/verbs/rdma_mgr.cc:56] connecting to remote node /job:worker/replica:0/task:1
2017-06-28 03:31:36.171610: I tensorflow/contrib/verbs/rdma.cc:523] channel already connected
2017-06-28 03:31:36.171686: I tensorflow/contrib/verbs/rdma_mgr.cc:56] connecting to remote node /job:ps/replica:0/task:1
2017-06-28 03:31:36.172594: I tensorflow/contrib/verbs/rdma.cc:523] channel already connected
2017-06-28 03:31:36.172619: I tensorflow/contrib/verbs/rdma_mgr.cc:56] connecting to remote node /job:ps/replica:0/task:0
2017-06-28 03:31:36.173297: I tensorflow/contrib/verbs/rdma.cc:523] channel already connected
2017-06-28 03:31:37.381752: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:37.381816: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:37.381923: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:37.381946: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:37.381963: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:37.382037: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:37.382103: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:37.382136: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:37.382181: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:37.382195: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:37.975572: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:37.975844: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:37.975862: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:37.975878: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:37.976058: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:37.976315: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:37.976335: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
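Each RDMA line is logged from this process's point of view, either as `sent` (rdma.cc:225) or `recv` (rdma.cc:143). Reducing the trace to (direction, message type) pairs makes the exchange easier to follow. In the excerpt just above, the remote side asks for a buffer (BUFFER_REQUEST), this process acknowledges it and answers with BUFFER_RESPONSE, the remote side writes the tensor (TENSOR_WRITE), and this process reports the buffer idle (BUFFER_IDLE), with ACKs in between. That sequence is the order observed in this log, not a protocol specification. The Python sketch below assumes the `sent|recv RDMA message: RDMA_MESSAGE_*` wording stays exactly as printed. `rdma_events` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches e.g. "... rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK"
MSG_RE = re.compile(r"\b(sent|recv) RDMA message: (RDMA_MESSAGE_\w+)")

def rdma_events(lines):
    """Yield (direction, message_type) for each RDMA message line; other lines are skipped."""
    for line in lines:
        m = MSG_RE.search(line)
        if m:
            yield m.group(1), m.group(2)

# Log lines 03:31:37.975572 through 03:31:37.976335, with timestamps trimmed.
SAMPLE = [
    "rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST",
    "rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK",
    "rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE",
    "rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK",
    "rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE",
    "rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE",
    "rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK",
]

for direction, msg in rdma_events(SAMPLE):
    print(direction, msg[len("RDMA_MESSAGE_"):])  # e.g. "recv BUFFER_REQUEST"

counts = Counter(rdma_events(SAMPLE))  # per-(direction, type) totals
```

Running the same filter over the whole trace, rather than this excerpt, is one way to check whether every TENSOR_REQUEST is eventually matched by a TENSOR_WRITE.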
2017-06-28 03:31:38.447760: I tensorflow/core/distributed_runtime/master_session.cc:995] Start master session 091f9f6dccba154a with config:
intra_op_parallelism_threads: 1
gpu_options {
force_gpu_compatible: true
}
allow_soft_placement: true
2017-06-28 03:31:38.505346: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:38.505386: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:38.505497: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:38.505514: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:38.996041: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:38.996425: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:38.996484: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:38.996508: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:38.996659: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:38.996944: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:38.997027: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.019796: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:39.020098: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.020113: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:39.020126: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.020234: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:39.020340: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:39.020361: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.021822: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:39.022073: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.022093: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:39.022110: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.022289: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:39.022437: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:39.022453: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.231171: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:39.231503: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.231533: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:39.231554: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.231667: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:39.232078: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:39.232160: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.500205: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:39.500512: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.500535: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:39.500561: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.500636: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:39.500953: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:39.501022: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.517940: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:39.517972: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.518010: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:39.518025: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.518207: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:39.518249: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.518285: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_18_report_uninitialized_variables/boolean_mask/concat;0:0;92681776328865098
2017-06-28 03:31:39.518676: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:39.518711: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.518767: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:39.518797: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_18_report_uninitialized_variables/boolean_mask/concat;0:0;92681776328865098
2017-06-28 03:31:39.518809: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.518868: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:39.518909: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:39.518931: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.519997: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:39.520036: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_98_report_uninitialized_variables/LogicalNot;0:0;92681776328865098
2017-06-28 03:31:39.520037: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.520205: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:39.520236: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.520240: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_103_report_uninitialized_variables/boolean_mask/Squeeze;0:0;92681776328865098
2017-06-28 03:31:39.522054: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:39.522242: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.522261: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:39.522277: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.522455: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:39.522699: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:39.522718: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.782965: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:39.783182: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.783223: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:39.783248: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.783280: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_98_report_uninitialized_variables/LogicalNot;0:0;92681776328865098
2017-06-28 03:31:39.783427: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:39.783543: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:39.783564: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.783631: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:39.783846: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.783858: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:39.783929: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.783975: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:39.784100: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:39.784170: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.784353: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:39.784436: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.784504: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:39.784524: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:39.784532: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_103_report_uninitialized_variables/boolean_mask/Squeeze;0:0;92681776328865098
2017-06-28 03:31:39.784671: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:39.784744: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:39.784763: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:42.890552: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:42.890842: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:42.890863: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:42.890877: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:42.890945: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:42.891232: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:42.891268: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.036788: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:45.036821: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.036839: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:45.036899: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:45.036924: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.036929: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_103_report_uninitialized_variables/boolean_mask/Squeeze;0:0;88118325191660910
2017-06-28 03:31:45.037074: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.037096: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:45.037243: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:45.037268: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.037273: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_98_report_uninitialized_variables/LogicalNot;0:0;88118325191660910
2017-06-28 03:31:45.037283: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.037532: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:45.037552: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.037558: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_18_report_uninitialized_variables/boolean_mask/concat;0:0;88118325191660910
2017-06-28 03:31:45.037632: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:45.037655: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:45.037690: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.037703: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:45.037717: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:2;3489199e002e563e;/job:ps/replica:0/task:0/cpu:0;edge_95_report_uninitialized_variables/IsVariableInitialized_37;0:0;88118325191660910
2017-06-28 03:31:45.037720: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.037798: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:45.037821: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.037833: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:1;3bdf2083ed6f6884;/job:ps/replica:0/task:0/cpu:0;edge_94_report_uninitialized_variables/IsVariableInitialized_36;0:0;88118325191660910
2017-06-28 03:31:45.037690: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:3;7766f650b9fdd2f5;/job:ps/replica:0/task:0/cpu:0;edge_96_report_uninitialized_variables/IsVariableInitialized_38;0:0;88118325191660910
2017-06-28 03:31:45.038054: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:45.038086: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.038092: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_93_report_uninitialized_variables/IsVariableInitialized_35;0:0;88118325191660910
2017-06-28 03:31:45.038094: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:45.038160: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:45.038177: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/cpu:0;6997ae70545c2c49;/job:ps/replica:0/task:0/cpu:0;edge_92_report_uninitialized_variables/IsVariableInitialized_34;0:0;88118325191660910
2017-06-28 03:31:45.038177: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.038276: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.038314: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:45.038528: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST
2017-06-28 03:31:45.038557: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.038564: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/cpu:0;6997ae70545c2c49;/job:ps/replica:0/task:0/cpu:0;edge_91_report_uninitialized_variables/IsVariableInitialized_33;0:0;88118325191660910
2017-06-28 03:31:45.038572: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.038580: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:45.038752: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:45.038776: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.038788: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.038795: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:45.038949: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:45.038981: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.038990: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.038999: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:45.039048: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:1;3bdf2083ed6f6884;/job:ps/replica:0/task:0/cpu:0;edge_94_report_uninitialized_variables/IsVariableInitialized_36;0:0;88118325191660910
2017-06-28 03:31:45.039132: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:45.039153: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.039163: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:45.039174: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.039180: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST
2017-06-28 03:31:45.039231: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_93_report_uninitialized_variables/IsVariableInitialized_35;0:0;88118325191660910
2017-06-28 03:31:45.039276: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:45.039295: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:3;7766f650b9fdd2f5;/job:ps/replica:0/task:0/cpu:0;edge_96_report_uninitialized_variables/IsVariableInitialized_38;0:0;88118325191660910
2017-06-28 03:31:45.039296: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.039356: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:45.039367: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.039373: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:45.039491: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:45.039513: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.039577: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/cpu:0;6997ae70545c2c49;/job:ps/replica:0/task:0/cpu:0;edge_92_report_uninitialized_variables/IsVariableInitialized_34;0:0;88118325191660910
2017-06-28 03:31:45.039712: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:45.039733: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:45.039744: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.039756: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:2;3489199e002e563e;/job:ps/replica:0/task:0/cpu:0;edge_95_report_uninitialized_variables/IsVariableInitialized_37;0:0;88118325191660910
2017-06-28 03:31:45.039757: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:45.039823: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:45.039834: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.039980: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE
2017-06-28 03:31:45.040004: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.040025: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/cpu:0;6997ae70545c2c49;/job:ps/replica:0/task:0/cpu:0;edge_91_report_uninitialized_variables/IsVariableInitialized_33;0:0;88118325191660910
2017-06-28 03:31:45.040051: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:45.040075: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:45.040085: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.040359: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:45.040385: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:45.040398: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.040410: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:45.040423: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.040663: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:45.040852: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE
2017-06-28 03:31:45.041006: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:45.041030: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.041041: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:45.041108: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE
2017-06-28 03:31:45.041129: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK
2017-06-28 03:31:46.091305: F tensorflow/contrib/verbs/rdma.cc:683] Check failed: status.ok() RecvLocalAsync was not ok, key/job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_103_report_uninitialized_variables/boolean_mask/Squeeze;0:0;88118325191660910 error message: Step 88118325191660910
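The fatal check prints the rendezvous key whose `RecvLocalAsync` call failed. The "try to send tensor" lines print keys in the same form. The Python sketch below splits such a key into its fields. It assumes the layout is `src_device;src_incarnation;dst_device;tensor_name;frame_id:iter_id` followed by one extra field that the verbs transport appends as a step id. The step-id reading is an inference: the trailing number matches the "Step 88118325191660910" in the error message, and it differs from the 92681776328865098 seen in the earlier exchanges. Read this way, the failing key refers to `edge_103_report_uninitialized_variables/boolean_mask/Squeeze`, produced on worker task 0's gpu:0 and destined for the CPU of ps task 0. `RendezvousKey` and `parse_key` are hypothetical names.

```python
from typing import NamedTuple

class RendezvousKey(NamedTuple):
    src_device: str
    src_incarnation: str  # hex string, as printed
    dst_device: str
    tensor_name: str
    frame_iter: str       # "frame_id:iter_id"
    step_id: int          # assumed: appended by the verbs transport

def parse_key(key: str) -> RendezvousKey:
    """Split a key of the form seen in the 'try to send tensor' and fatal-check lines."""
    parts = key.split(";")
    if len(parts) != 6:
        raise ValueError(f"expected 6 ';'-separated fields, got {len(parts)}")
    src, inc, dst, name, frame_iter, step = parts
    return RendezvousKey(src, inc, dst, name, frame_iter, int(step))

# Key copied from the fatal "Check failed" line (text after "key").
failed = ("/job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;"
          "/job:ps/replica:0/task:0/cpu:0;"
          "edge_103_report_uninitialized_variables/boolean_mask/Squeeze;"
          "0:0;88118325191660910")

k = parse_key(failed)
print(k.tensor_name)  # edge_103_report_uninitialized_variables/boolean_mask/Squeeze
print(k.step_id)      # 88118325191660910
```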
TensorFlow: 1.2
Model: vgg16
Mode: training
Batch size: 256 global
            64 per device
Devices: ['/job:worker/task:0/gpu:0', '/job:worker/task:0/gpu:1', '/job:worker/task:0/gpu:2', '/job:worker/task:0/gpu:3']
Data format: NCHW
Optimizer: sgd
Variables: parameter_server
Sync: True
==========
Generating model
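The two batch-size figures in the summary are consistent with the per-device batch multiplied by the four GPUs listed under Devices for this worker. The arithmetic is shown below with the values copied from the summary. The summary does not say how the second worker task is counted, so the sketch does not try to account for it.

```python
# Values copied from the benchmark summary above.
devices = ['/job:worker/task:0/gpu:0', '/job:worker/task:0/gpu:1',
           '/job:worker/task:0/gpu:2', '/job:worker/task:0/gpu:3']
per_device_batch = 64

global_batch = per_device_batch * len(devices)  # 64 * 4
print(global_batch)  # 256
```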