Created
June 27, 2017 19:33
2017-06-28 03:31:29.957231: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 0 with properties: | |
name: Tesla K40m | |
major: 3 minor: 5 memoryClockRate (GHz) 0.745 | |
pciBusID 0000:02:00.0 | |
Total memory: 11.17GiB | |
Free memory: 11.10GiB | |
2017-06-28 03:31:30.169520: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x5569248497c0 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that. | |
2017-06-28 03:31:30.171168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 1 with properties: | |
name: Tesla K40m | |
major: 3 minor: 5 memoryClockRate (GHz) 0.745 | |
pciBusID 0000:03:00.0 | |
Total memory: 11.17GiB | |
Free memory: 11.10GiB | |
2017-06-28 03:31:30.386221: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x55692484d600 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that. | |
2017-06-28 03:31:30.387893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 2 with properties: | |
name: Tesla K40m | |
major: 3 minor: 5 memoryClockRate (GHz) 0.745 | |
pciBusID 0000:82:00.0 | |
Total memory: 11.17GiB | |
Free memory: 11.10GiB | |
2017-06-28 03:31:30.613924: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x556924851470 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that. | |
2017-06-28 03:31:30.615614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 3 with properties: | |
name: Tesla K40m | |
major: 3 minor: 5 memoryClockRate (GHz) 0.745 | |
pciBusID 0000:83:00.0 | |
Total memory: 11.17GiB | |
Free memory: 11.10GiB | |
2017-06-28 03:31:30.616394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 0 and 2 | |
2017-06-28 03:31:30.616417: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 0 and 3 | |
2017-06-28 03:31:30.616450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 1 and 2 | |
2017-06-28 03:31:30.616464: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 1 and 3 | |
2017-06-28 03:31:30.616477: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 2 and 0 | |
2017-06-28 03:31:30.616490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 2 and 1 | |
2017-06-28 03:31:30.617052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 3 and 0 | |
2017-06-28 03:31:30.617071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 3 and 1 | |
2017-06-28 03:31:30.617167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:959] DMA: 0 1 2 3 | |
2017-06-28 03:31:30.617177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 0: Y Y N N | |
2017-06-28 03:31:30.617185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 1: Y Y N N | |
2017-06-28 03:31:30.617191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 2: N N Y Y | |
2017-06-28 03:31:30.617198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 3: N N Y Y | |
2017-06-28 03:31:30.617210: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40m, pci bus id: 0000:02:00.0) | |
2017-06-28 03:31:30.617217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40m, pci bus id: 0000:03:00.0) | |
2017-06-28 03:31:30.617224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K40m, pci bus id: 0000:82:00.0) | |
2017-06-28 03:31:30.617230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K40m, pci bus id: 0000:83:00.0) | |
2017-06-28 03:31:31.540680: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> 10.40.2.200:5000, 1 -> 10.40.2.201:5000} | |
2017-06-28 03:31:31.540785: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:5001, 1 -> 10.40.2.201:5001} | |
2017-06-28 03:31:31.541821: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job ps -> {0 -> 10.40.2.200:5000, 1 -> 10.40.2.201:5000} | |
2017-06-28 03:31:31.541855: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job worker -> {0 -> localhost:5001, 1 -> 10.40.2.201:5001} | |
2017-06-28 03:31:31.543878: I tensorflow/contrib/verbs/rdma.cc:99] Start RdmaAdapter: mlx4_0 | |
2017-06-28 03:31:31.547130: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:316] Started server with target: grpc://localhost:5001 | |
2017-06-28 03:31:31.547216: I tensorflow/contrib/verbs/rdma_mgr.cc:56] connecting to remote node /job:worker/replica:0/task:1 | |
2017-06-28 03:31:36.171610: I tensorflow/contrib/verbs/rdma.cc:523] channel already connected | |
2017-06-28 03:31:36.171686: I tensorflow/contrib/verbs/rdma_mgr.cc:56] connecting to remote node /job:ps/replica:0/task:1 | |
2017-06-28 03:31:36.172594: I tensorflow/contrib/verbs/rdma.cc:523] channel already connected | |
2017-06-28 03:31:36.172619: I tensorflow/contrib/verbs/rdma_mgr.cc:56] connecting to remote node /job:ps/replica:0/task:0 | |
2017-06-28 03:31:36.173297: I tensorflow/contrib/verbs/rdma.cc:523] channel already connected | |
2017-06-28 03:31:37.381752: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:37.381816: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:37.381923: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:37.381946: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:37.381963: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:37.382037: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:37.382103: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:37.382136: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:37.382181: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:37.382195: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:37.975572: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:37.975844: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:37.975862: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:37.975878: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:37.976058: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:37.976315: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:37.976335: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:38.447760: I tensorflow/core/distributed_runtime/master_session.cc:995] Start master session 091f9f6dccba154a with config: | |
intra_op_parallelism_threads: 1 | |
gpu_options { | |
force_gpu_compatible: true | |
} | |
allow_soft_placement: true | |
2017-06-28 03:31:38.505346: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:38.505386: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:38.505497: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:38.505514: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:38.996041: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:38.996425: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:38.996484: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:38.996508: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:38.996659: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:38.996944: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:38.997027: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.019796: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:39.020098: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.020113: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:39.020126: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.020234: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:39.020340: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:39.020361: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.021822: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:39.022073: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.022093: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:39.022110: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.022289: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:39.022437: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:39.022453: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.231171: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:39.231503: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.231533: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:39.231554: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.231667: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:39.232078: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:39.232160: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.500205: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:39.500512: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.500535: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:39.500561: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.500636: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:39.500953: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:39.501022: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.517940: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:39.517972: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.518010: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:39.518025: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.518207: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:39.518249: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.518285: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_18_report_uninitialized_variables/boolean_mask/concat;0:0;92681776328865098 | |
2017-06-28 03:31:39.518676: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:39.518711: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.518767: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:39.518797: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_18_report_uninitialized_variables/boolean_mask/concat;0:0;92681776328865098 | |
2017-06-28 03:31:39.518809: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.518868: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:39.518909: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:39.518931: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.519997: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:39.520036: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_98_report_uninitialized_variables/LogicalNot;0:0;92681776328865098 | |
2017-06-28 03:31:39.520037: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.520205: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:39.520236: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.520240: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_103_report_uninitialized_variables/boolean_mask/Squeeze;0:0;92681776328865098 | |
2017-06-28 03:31:39.522054: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:39.522242: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.522261: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:39.522277: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.522455: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:39.522699: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:39.522718: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.782965: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:39.783182: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.783223: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:39.783248: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.783280: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_98_report_uninitialized_variables/LogicalNot;0:0;92681776328865098 | |
2017-06-28 03:31:39.783427: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:39.783543: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:39.783564: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.783631: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:39.783846: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.783858: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:39.783929: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.783975: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:39.784100: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:39.784170: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.784353: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:39.784436: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.784504: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:39.784524: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:39.784532: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_103_report_uninitialized_variables/boolean_mask/Squeeze;0:0;92681776328865098 | |
2017-06-28 03:31:39.784671: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:39.784744: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:39.784763: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:42.890552: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:42.890842: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:42.890863: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:42.890877: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:42.890945: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:42.891232: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:42.891268: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.036788: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:45.036821: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.036839: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:45.036899: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:45.036924: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.036929: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_103_report_uninitialized_variables/boolean_mask/Squeeze;0:0;88118325191660910 | |
2017-06-28 03:31:45.037074: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.037096: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:45.037243: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:45.037268: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.037273: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_98_report_uninitialized_variables/LogicalNot;0:0;88118325191660910 | |
2017-06-28 03:31:45.037283: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.037532: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:45.037552: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.037558: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_18_report_uninitialized_variables/boolean_mask/concat;0:0;88118325191660910 | |
2017-06-28 03:31:45.037632: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:45.037655: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:45.037690: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.037703: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:45.037717: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:2;3489199e002e563e;/job:ps/replica:0/task:0/cpu:0;edge_95_report_uninitialized_variables/IsVariableInitialized_37;0:0;88118325191660910 | |
2017-06-28 03:31:45.037720: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.037798: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:45.037821: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.037833: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:1;3bdf2083ed6f6884;/job:ps/replica:0/task:0/cpu:0;edge_94_report_uninitialized_variables/IsVariableInitialized_36;0:0;88118325191660910 | |
2017-06-28 03:31:45.037690: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:3;7766f650b9fdd2f5;/job:ps/replica:0/task:0/cpu:0;edge_96_report_uninitialized_variables/IsVariableInitialized_38;0:0;88118325191660910 | |
2017-06-28 03:31:45.038054: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:45.038086: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.038092: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_93_report_uninitialized_variables/IsVariableInitialized_35;0:0;88118325191660910 | |
2017-06-28 03:31:45.038094: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:45.038160: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:45.038177: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/cpu:0;6997ae70545c2c49;/job:ps/replica:0/task:0/cpu:0;edge_92_report_uninitialized_variables/IsVariableInitialized_34;0:0;88118325191660910 | |
2017-06-28 03:31:45.038177: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.038276: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.038314: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:45.038528: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_REQUEST | |
2017-06-28 03:31:45.038557: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.038564: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/cpu:0;6997ae70545c2c49;/job:ps/replica:0/task:0/cpu:0;edge_91_report_uninitialized_variables/IsVariableInitialized_33;0:0;88118325191660910 | |
2017-06-28 03:31:45.038572: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.038580: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:45.038752: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:45.038776: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.038788: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.038795: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:45.038949: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:45.038981: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.038990: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.038999: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:45.039048: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:1;3bdf2083ed6f6884;/job:ps/replica:0/task:0/cpu:0;edge_94_report_uninitialized_variables/IsVariableInitialized_36;0:0;88118325191660910 | |
2017-06-28 03:31:45.039132: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:45.039153: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.039163: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:45.039174: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.039180: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_REQUEST | |
2017-06-28 03:31:45.039231: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_93_report_uninitialized_variables/IsVariableInitialized_35;0:0;88118325191660910 | |
2017-06-28 03:31:45.039276: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:45.039295: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:3;7766f650b9fdd2f5;/job:ps/replica:0/task:0/cpu:0;edge_96_report_uninitialized_variables/IsVariableInitialized_38;0:0;88118325191660910 | |
2017-06-28 03:31:45.039296: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.039356: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:45.039367: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.039373: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:45.039491: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:45.039513: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.039577: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/cpu:0;6997ae70545c2c49;/job:ps/replica:0/task:0/cpu:0;edge_92_report_uninitialized_variables/IsVariableInitialized_34;0:0;88118325191660910 | |
2017-06-28 03:31:45.039712: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:45.039733: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:45.039744: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.039756: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/gpu:2;3489199e002e563e;/job:ps/replica:0/task:0/cpu:0;edge_95_report_uninitialized_variables/IsVariableInitialized_37;0:0;88118325191660910 | |
2017-06-28 03:31:45.039757: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:45.039823: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:45.039834: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.039980: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_RESPONSE | |
2017-06-28 03:31:45.040004: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.040025: I tensorflow/contrib/verbs/rdma.cc:671] try to send tensor: /job:worker/replica:0/task:0/cpu:0;6997ae70545c2c49;/job:ps/replica:0/task:0/cpu:0;edge_91_report_uninitialized_variables/IsVariableInitialized_33;0:0;88118325191660910 | |
2017-06-28 03:31:45.040051: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:45.040075: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:45.040085: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.040359: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:45.040385: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:45.040398: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.040410: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:45.040423: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.040663: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:45.040852: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_TENSOR_WRITE | |
2017-06-28 03:31:45.041006: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:45.041030: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.041041: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:45.041108: I tensorflow/contrib/verbs/rdma.cc:143] recv RDMA message: RDMA_MESSAGE_BUFFER_IDLE | |
2017-06-28 03:31:45.041129: I tensorflow/contrib/verbs/rdma.cc:225] sent RDMA message: RDMA_MESSAGE_ACK | |
2017-06-28 03:31:46.091305: F tensorflow/contrib/verbs/rdma.cc:683] Check failed: status.ok() RecvLocalAsync was not ok, key/job:worker/replica:0/task:0/gpu:0;32e08a6a58e99259;/job:ps/replica:0/task:0/cpu:0;edge_103_report_uninitialized_variables/boolean_mask/Squeeze;0:0;88118325191660910 error message: Step 88118325191660910 | |
TensorFlow: 1.2 | |
Model: vgg16 | |
Mode: training | |
Batch size: 256 global | |
64 per device | |
Devices: ['/job:worker/task:0/gpu:0', '/job:worker/task:0/gpu:1', '/job:worker/task:0/gpu:2', '/job:worker/task:0/gpu:3'] | |
Data format: NCHW | |
Optimizer: sgd | |
Variables: parameter_server | |
Sync: True | |
========== | |
Generating model |