Created June 24, 2022 01:11
Alpa runlog, invalid memory access
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] % python3 tests/test_install.py | |
.compile_pipeshard_executable::trace: 0.97 s | |
compile_pipeshard_executable::jaxpr operations: 0.00 s | |
compile_pipeshard_executable::stage construction: 0.00 s | |
compile_pipeshard_executable::apply grad: 0.00 s | |
compile_pipeshard_executable::shard stages: 1.69 s | |
compile_pipeshard_executable::launch meshes: 0.72 s | |
compile_pipeshard_executable::driver executable: 29.27 s | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 8, Info: allocate zero for recv | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0 | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 8, Info: allocate zero for recv | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0 | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 8, Info: allocate zero for recv | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0 | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 8, Info: allocate zero for recv | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0 | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 9, Info: allocate zero for recv | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 10, Info: allocate zero for recv | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 5, Info: | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 9, Info: allocate zero for recv | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 10, Info: allocate zero for recv | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 6, Info: | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 9, Info: allocate zero for recv | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 10, Info: allocate zero for recv | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 7, Info: | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 9, Info: allocate zero for recv | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 10, Info: allocate zero for recv | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 8, Info: | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 1, Info: | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 3, Info: | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 2, Info: | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 4, Info: | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0 | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 11, Info: allocate zero for recv | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 17, Info: | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1 | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0 | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 11, Info: allocate zero for recv | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 19, Info: | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1 | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0 | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 11, Info: allocate zero for recv | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 18, Info: | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1 | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 0 | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 11, Info: allocate zero for recv | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 20, Info: | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1 | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2 | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2 | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2 | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2 | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 14, Info: | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 16, Info: | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 13, Info: | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 15, Info: | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 12, Info: allocate zero for recv | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 5, Info: | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 4, Info: | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3 | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 2, Info: | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3 | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 1, Info: | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3 | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 12, Info: allocate zero for recv | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 6, Info: | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 12, Info: allocate zero for recv | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 8, Info: | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 3, Info: | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3 | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1 | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2 | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1 | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2 | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 12, Info: allocate zero for recv | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 7, Info: | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1 | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2 | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 1 | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 2 | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 16, Info: | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 13, Info: | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5352, ip=172.31.46.146) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 7, Info: stage 5 | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 14, Info: | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5714, ip=172.31.33.12) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 7, Info: stage 5 | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: SEND, Task uuid: 15, Info: | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=13998) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 7, Info: stage 5 | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5957, ip=172.31.43.221) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 7, Info: stage 5 | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 13, Info: allocate zero for recv | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 20, Info: | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3 | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 13, Info: allocate zero for recv | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 18, Info: | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3 | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 13, Info: allocate zero for recv | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 17, Info: | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3 | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 13, Info: allocate zero for recv | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RECV, Task uuid: 19, Info: | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 3 | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6375, ip=172.31.43.70) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 6, Info: stage 4 | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6139, ip=172.31.45.241) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 6, Info: stage 4 | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6134, ip=172.31.39.47) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 6, Info: stage 4 | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=5563, ip=172.31.39.172) memory_allocated: -0.000 GB max_memory_allocated: -0.000 GB next instruction: Opcode: RUN, Task uuid: 6, Info: stage 4 | |
. | |
---------------------------------------------------------------------- | |
Ran 2 tests in 79.713s | |
OK | |
python3 tests/test_install.py 29.20s user 31.10s system 71% cpu 1:24.01 total | |
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] % ls | |
GENVER LICENSE README.md VERSION alpa alpa.egg-info benchmark build build_jaxlib compute-cost-2022-06-22-23-22-30.npy docker docs examples format.sh playground setup.py tests third_party | |
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] % | |
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] % vim alpa/global_env.py | |
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] % ls | |
GENVER LICENSE README.md VERSION alpa alpa.egg-info benchmark build build_jaxlib compute-cost-2022-06-22-23-22-30.npy docker docs examples format.sh playground setup.py tests third_party | |
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] % | |
(alpa-torch) cjr@ip-172-31-33-99 ~/nfs/alpa-torch-software/alpa (git)-[main] % python3 tests/minimal_reproduce.py | |
2022-06-24 00:32:44.112 INFO worker - init: Connecting to existing Ray cluster at address: 172.31.33.99:6379 | |
2022-06-24 00:32:44.864 INFO xla_bridge - backends: Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: | |
2022-06-24 00:32:49.011 INFO xla_bridge - backends: Unable to initialize backend 'tpu': INVALID_ARGUMENT: TpuPlatform is not available. | |
/home/cjr/miniconda3/envs/alpa-torch/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py:1866: UserWarning: Explicitly requested dtype <class 'numpy.float64'> requested in asarray is not available, and will be truncated to dtype float32. To enable more dtypes, set the jax_enable_x64 configuration option or the JAX_ENABLE_X64 shell environment variable. See https://github.com/google/jax#current-gotchas for more. | |
lax_internal._check_user_dtype_supported(dtype, "asarray") | |
/home/cjr/miniconda3/envs/alpa-torch/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py:1866: UserWarning: Explicitly requested dtype <class 'numpy.float64'> requested in asarray is not available, and will be truncated to dtype float32. To enable more dtypes, set the jax_enable_x64 configuration option or the JAX_ENABLE_X64 shell environment variable. See https://github.com/google/jax#current-gotchas for more. | |
lax_internal._check_user_dtype_supported(dtype, "asarray") | |
compile_pipeshard_executable::trace: 4.47 s | |
compile_pipeshard_executable::jaxpr operations: 0.18 s | |
compile_pipeshard_executable::stage construction: 0.00 s | |
compile_pipeshard_executable::apply grad: 0.04 s | |
compile_pipeshard_executable::shard stages: 2.49 s | |
compile_pipeshard_executable::launch meshes: 0.73 s | |
compile_pipeshard_executable::driver executable: 332.60 s | |
compile_pipeshard_executable::jaxpr operations: 0.04 s | |
compile_pipeshard_executable::stage construction: 0.00 s | |
compile_pipeshard_executable::apply grad: 0.00 s | |
compile_pipeshard_executable::shard stages: 12.07 s | |
compile_pipeshard_executable::launch meshes: 0.00 s | |
(MeshHostWorker pid=6256, ip=172.31.46.146) memory_allocated: 0.000 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 812, Info: stage 7 | |
(MeshHostWorker pid=16300) memory_allocated: 0.000 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 810, Info: stage 5 | |
(MeshHostWorker pid=6548, ip=172.31.33.12) memory_allocated: 0.000 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 811, Info: stage 6 | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 0.000 GB max_memory_allocated: 1.040 GB next instruction: Opcode: RUN, Task uuid: 808, Info: stage 3 | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 0.000 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 809, Info: stage 4 | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 0.000 GB max_memory_allocated: 1.821 GB next instruction: Opcode: RUN, Task uuid: 805, Info: stage 0 | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 0.000 GB max_memory_allocated: 0.458 GB next instruction: Opcode: RUN, Task uuid: 806, Info: stage 1 | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 0.000 GB max_memory_allocated: 0.631 GB next instruction: Opcode: RUN, Task uuid: 807, Info: stage 2 | |
(MeshHostWorker pid=6548, ip=172.31.33.12) memory_allocated: 0.053 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 31, Info: allocate zero for recv | |
(MeshHostWorker pid=6548, ip=172.31.33.12) memory_allocated: 0.106 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 43, Info: allocate zero for recv | |
(MeshHostWorker pid=6548, ip=172.31.33.12) memory_allocated: 0.118 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RECV, Task uuid: 359, Info: | |
(MeshHostWorker pid=16300) memory_allocated: 0.053 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 30, Info: allocate zero for recv | |
(MeshHostWorker pid=16300) memory_allocated: 0.106 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 39, Info: allocate zero for recv | |
(MeshHostWorker pid=16300) memory_allocated: 0.118 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RECV, Task uuid: 353, Info: | |
(MeshHostWorker pid=6256, ip=172.31.46.146) memory_allocated: 0.843 GB max_memory_allocated: 0.843 GB next instruction: Opcode: RUN, Task uuid: 32, Info: allocate zero for recv | |
(MeshHostWorker pid=6256, ip=172.31.46.146) memory_allocated: 0.904 GB max_memory_allocated: 0.904 GB next instruction: Opcode: RUN, Task uuid: 48, Info: allocate zero for recv | |
(MeshHostWorker pid=6256, ip=172.31.46.146) memory_allocated: 0.916 GB max_memory_allocated: 0.916 GB next instruction: Opcode: RECV, Task uuid: 365, Info: | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.188 GB max_memory_allocated: 3.188 GB next instruction: Opcode: RUN, Task uuid: 29, Info: allocate zero for recv | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.251 GB max_memory_allocated: 3.251 GB next instruction: Opcode: RUN, Task uuid: 36, Info: allocate zero for recv | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.261 GB max_memory_allocated: 3.261 GB next instruction: Opcode: RECV, Task uuid: 335, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 3.907 GB max_memory_allocated: 3.907 GB next instruction: Opcode: RUN, Task uuid: 28, Info: allocate zero for recv | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 3.908 GB max_memory_allocated: 3.908 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 3 | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.104 GB max_memory_allocated: 4.104 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.055 GB max_memory_allocated: 4.104 GB next instruction: Opcode: SEND, Task uuid: 334, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.055 GB max_memory_allocated: 4.104 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.055 GB max_memory_allocated: 4.104 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 3 | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.250 GB max_memory_allocated: 4.250 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.202 GB max_memory_allocated: 4.250 GB next instruction: Opcode: SEND, Task uuid: 334, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.201 GB max_memory_allocated: 4.250 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.201 GB max_memory_allocated: 4.250 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 3 | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.261 GB max_memory_allocated: 3.261 GB next instruction: Opcode: RECV, Task uuid: 113, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.397 GB max_memory_allocated: 4.397 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.348 GB max_memory_allocated: 4.397 GB next instruction: Opcode: SEND, Task uuid: 334, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.348 GB max_memory_allocated: 4.397 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.348 GB max_memory_allocated: 4.397 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 3 | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.543 GB max_memory_allocated: 4.543 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.495 GB max_memory_allocated: 4.543 GB next instruction: Opcode: SEND, Task uuid: 334, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.494 GB max_memory_allocated: 4.543 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.494 GB max_memory_allocated: 4.543 GB next instruction: Opcode: RUN, Task uuid: 4, Info: stage 3 | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.690 GB max_memory_allocated: 4.690 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.641 GB max_memory_allocated: 4.690 GB next instruction: Opcode: RUN, Task uuid: 57, Info: allocate zero for recv | |
(MeshHostWorker pid=6353, ip=172.31.39.172) memory_allocated: 4.913 GB max_memory_allocated: 4.913 GB next instruction: Opcode: RECV, Task uuid: 347, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.356 GB max_memory_allocated: 3.356 GB next instruction: Opcode: RUN, Task uuid: 25, Info: allocate zero for recv | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.453 GB max_memory_allocated: 3.453 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0 | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.128 GB max_memory_allocated: 3.128 GB next instruction: Opcode: RUN, Task uuid: 27, Info: allocate zero for recv | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.128 GB max_memory_allocated: 3.128 GB next instruction: Opcode: RUN, Task uuid: 33, Info: allocate zero for recv | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.130 GB max_memory_allocated: 3.130 GB next instruction: Opcode: RECV, Task uuid: 116, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.587 GB max_memory_allocated: 3.587 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.538 GB max_memory_allocated: 3.587 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0 | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.671 GB max_memory_allocated: 3.671 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.622 GB max_memory_allocated: 3.671 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0 | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.755 GB max_memory_allocated: 3.755 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.706 GB max_memory_allocated: 3.755 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0 | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.840 GB max_memory_allocated: 3.840 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.791 GB max_memory_allocated: 3.840 GB next instruction: Opcode: SEND, Task uuid: 112, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.791 GB max_memory_allocated: 3.840 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.791 GB max_memory_allocated: 3.840 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0 | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.261 GB max_memory_allocated: 3.261 GB next instruction: Opcode: RECV, Task uuid: 221, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.924 GB max_memory_allocated: 3.924 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.875 GB max_memory_allocated: 3.924 GB next instruction: Opcode: SEND, Task uuid: 112, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.873 GB max_memory_allocated: 3.924 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.873 GB max_memory_allocated: 3.924 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0 | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.006 GB max_memory_allocated: 4.006 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.957 GB max_memory_allocated: 4.006 GB next instruction: Opcode: SEND, Task uuid: 112, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.955 GB max_memory_allocated: 4.006 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 3.955 GB max_memory_allocated: 4.006 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0 | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.088 GB max_memory_allocated: 4.088 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.039 GB max_memory_allocated: 4.088 GB next instruction: Opcode: SEND, Task uuid: 112, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.037 GB max_memory_allocated: 4.088 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.037 GB max_memory_allocated: 4.088 GB next instruction: Opcode: RUN, Task uuid: 1, Info: stage 0 | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.170 GB max_memory_allocated: 4.170 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.121 GB max_memory_allocated: 4.170 GB next instruction: Opcode: SEND, Task uuid: 112, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.119 GB max_memory_allocated: 4.170 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7047, ip=172.31.45.241) memory_allocated: 4.119 GB max_memory_allocated: 4.170 GB next instruction: Opcode: SEND, Task uuid: 1, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.001 GB max_memory_allocated: 25.001 GB next instruction: Opcode: RUN, Task uuid: 26, Info: allocate zero for recv | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.003 GB max_memory_allocated: 25.003 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1 | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.418 GB max_memory_allocated: 25.418 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.027 GB max_memory_allocated: 25.418 GB next instruction: Opcode: SEND, Task uuid: 115, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.130 GB max_memory_allocated: 3.130 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 2 | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.027 GB max_memory_allocated: 25.418 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.027 GB max_memory_allocated: 25.418 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1 | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.443 GB max_memory_allocated: 25.443 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.052 GB max_memory_allocated: 25.443 GB next instruction: Opcode: SEND, Task uuid: 115, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.049 GB max_memory_allocated: 25.443 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.049 GB max_memory_allocated: 25.443 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1 | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.465 GB max_memory_allocated: 25.465 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.074 GB max_memory_allocated: 25.465 GB next instruction: Opcode: SEND, Task uuid: 115, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.072 GB max_memory_allocated: 25.465 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.072 GB max_memory_allocated: 25.465 GB next instruction: Opcode: SEND, Task uuid: 220, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.069 GB max_memory_allocated: 25.465 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.069 GB max_memory_allocated: 25.465 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1 | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.484 GB max_memory_allocated: 25.484 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.094 GB max_memory_allocated: 25.484 GB next instruction: Opcode: SEND, Task uuid: 115, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.091 GB max_memory_allocated: 25.484 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.091 GB max_memory_allocated: 25.484 GB next instruction: Opcode: SEND, Task uuid: 220, Info: | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.261 GB max_memory_allocated: 3.262 GB next instruction: Opcode: RECV, Task uuid: 296, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.089 GB max_memory_allocated: 25.484 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.089 GB max_memory_allocated: 25.484 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1 | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.504 GB max_memory_allocated: 25.504 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.114 GB max_memory_allocated: 25.504 GB next instruction: Opcode: SEND, Task uuid: 115, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.111 GB max_memory_allocated: 25.504 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.111 GB max_memory_allocated: 25.504 GB next instruction: Opcode: SEND, Task uuid: 220, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.109 GB max_memory_allocated: 25.504 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.109 GB max_memory_allocated: 25.504 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1 | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.524 GB max_memory_allocated: 25.524 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.133 GB max_memory_allocated: 25.524 GB next instruction: Opcode: SEND, Task uuid: 115, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.295 GB max_memory_allocated: 3.295 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.244 GB max_memory_allocated: 3.295 GB next instruction: Opcode: RUN, Task uuid: 34, Info: allocate zero for recv | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.246 GB max_memory_allocated: 3.295 GB next instruction: Opcode: RECV, Task uuid: 116, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.246 GB max_memory_allocated: 3.295 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 2 | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.411 GB max_memory_allocated: 3.411 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.360 GB max_memory_allocated: 3.411 GB next instruction: Opcode: RUN, Task uuid: 35, Info: allocate zero for recv | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.362 GB max_memory_allocated: 3.411 GB next instruction: Opcode: RECV, Task uuid: 116, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.131 GB max_memory_allocated: 25.524 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.131 GB max_memory_allocated: 25.524 GB next instruction: Opcode: SEND, Task uuid: 220, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.362 GB max_memory_allocated: 3.411 GB next instruction: Opcode: SEND, Task uuid: 295, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.128 GB max_memory_allocated: 25.524 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.128 GB max_memory_allocated: 25.524 GB next instruction: Opcode: RUN, Task uuid: 2, Info: stage 1 | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.544 GB max_memory_allocated: 25.544 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.153 GB max_memory_allocated: 25.544 GB next instruction: Opcode: SEND, Task uuid: 220, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.151 GB max_memory_allocated: 25.544 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7349, ip=172.31.43.70) memory_allocated: 25.151 GB max_memory_allocated: 25.544 GB next instruction: Opcode: SEND, Task uuid: 118, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.362 GB max_memory_allocated: 3.411 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.362 GB max_memory_allocated: 3.411 GB next instruction: Opcode: SEND, Task uuid: 298, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.360 GB max_memory_allocated: 3.411 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.360 GB max_memory_allocated: 3.411 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 2 | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.261 GB max_memory_allocated: 3.262 GB next instruction: Opcode: RECV, Task uuid: 299, Info: | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.261 GB max_memory_allocated: 3.262 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 4 | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.525 GB max_memory_allocated: 3.525 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.473 GB max_memory_allocated: 3.525 GB next instruction: Opcode: RUN, Task uuid: 37, Info: allocate zero for recv | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.476 GB max_memory_allocated: 3.525 GB next instruction: Opcode: RECV, Task uuid: 116, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.476 GB max_memory_allocated: 3.525 GB next instruction: Opcode: SEND, Task uuid: 295, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.473 GB max_memory_allocated: 3.525 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.473 GB max_memory_allocated: 3.525 GB next instruction: Opcode: SEND, Task uuid: 298, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.471 GB max_memory_allocated: 3.525 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.471 GB max_memory_allocated: 3.525 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 2 | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.636 GB max_memory_allocated: 3.636 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.585 GB max_memory_allocated: 3.636 GB next instruction: Opcode: RUN, Task uuid: 40, Info: allocate zero for recv | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.587 GB max_memory_allocated: 3.636 GB next instruction: Opcode: RECV, Task uuid: 116, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.587 GB max_memory_allocated: 3.636 GB next instruction: Opcode: SEND, Task uuid: 295, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.585 GB max_memory_allocated: 3.636 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.585 GB max_memory_allocated: 3.636 GB next instruction: Opcode: SEND, Task uuid: 298, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.582 GB max_memory_allocated: 3.636 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.582 GB max_memory_allocated: 3.636 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 2 | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.747 GB max_memory_allocated: 3.747 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.696 GB max_memory_allocated: 3.747 GB next instruction: Opcode: RUN, Task uuid: 44, Info: allocate zero for recv | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.698 GB max_memory_allocated: 3.747 GB next instruction: Opcode: RECV, Task uuid: 116, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.698 GB max_memory_allocated: 3.747 GB next instruction: Opcode: SEND, Task uuid: 295, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.696 GB max_memory_allocated: 3.747 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.696 GB max_memory_allocated: 3.747 GB next instruction: Opcode: SEND, Task uuid: 298, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.693 GB max_memory_allocated: 3.747 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.693 GB max_memory_allocated: 3.747 GB next instruction: Opcode: RUN, Task uuid: 3, Info: stage 2 | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.858 GB max_memory_allocated: 3.858 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=7234, ip=172.31.39.47) memory_allocated: 3.807 GB max_memory_allocated: 3.858 GB next instruction: Opcode: SEND, Task uuid: 238, Info: | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.595 GB max_memory_allocated: 3.646 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.586 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RUN, Task uuid: 38, Info: allocate zero for recv | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RECV, Task uuid: 335, Info: | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RECV, Task uuid: 113, Info: | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RECV, Task uuid: 221, Info: | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RECV, Task uuid: 296, Info: | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RECV, Task uuid: 299, Info: | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: SEND, Task uuid: 352, Info: | |
(MeshHostWorker pid=16300) memory_allocated: 0.118 GB max_memory_allocated: 0.228 GB next instruction: Opcode: RUN, Task uuid: 6, Info: stage 5 | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.596 GB max_memory_allocated: 3.646 GB next instruction: Opcode: RUN, Task uuid: 5, Info: stage 4 | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.376395: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.381601: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.930 GB max_memory_allocated: 3.981 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.386703: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.391778: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.396789: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.401734: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.406692: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.411683: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.416699: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.421691: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.426743: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.432283: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.437392: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&)
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.442363: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.447344: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.452296: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.457279: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.462275: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.467347: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.472334: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.477384: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.482338: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.487382: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
2022-06-24 00:58:15,521 ERROR worker.py:94 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::MeshHostWorker.run_executable() (pid=6857, ip=172.31.43.221, repr=<alpa.device_mesh.MeshHostWorker object at 0x7fb040f65eb0>) | |
File "/nfs/cjr/alpa-torch-software/alpa/alpa/device_mesh.py", line 267, in run_executable | |
self.executables[uuid].execute_on_worker(*args, **kwargs) | |
File "/nfs/cjr/alpa-torch-software/alpa/alpa/pipeline_parallel/pipeshard_executable.py", line 465, in execute_on_worker | |
self.worker.run_executable(instruction.task_uuid, | |
File "/nfs/cjr/alpa-torch-software/alpa/alpa/device_mesh.py", line 267, in run_executable | |
self.executables[uuid].execute_on_worker(*args, **kwargs) | |
File "/nfs/cjr/alpa-torch-software/alpa/alpa/mesh_executable.py", line 1178, in execute_on_worker | |
self.allocate_zero_buffers.execute_sharded_on_local_devices([])) | |
RuntimeError: INVALID_ARGUMENT: stream is uninitialized or in an error state: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well). | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.492387: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.497348: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) memory_allocated: 3.920 GB max_memory_allocated: 3.981 GB next instruction: Opcode: RUN, Task uuid: 41, Info: allocate zero for recv | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.502319: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.507374: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:1055] could not synchronize on CUDA context: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered :: *** Begin stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyObject_MakeTpCall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyObject_Call | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_FastCallDict | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string), ray::Status (*)(ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string)>::_M_invoke(std::_Any_data const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, 
std::allocator<ray::rpc::ObjectReference> > const&, std::vector<ray::ObjectID, std::allocator<ray::ObjectID> > const&, std::string const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*)> >::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, std::string const&, std::shared_ptr<ray::FunctionDescriptorInterface> const&, ray::TaskID, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerDirectTaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorker::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) ray::core::CoreWorkerProcess::RunTaskExecutionLoop() | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyFunction_Vectorcall | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalFrameDefault | |
(MeshHostWorker pid=6857, ip=172.31.43.221) _PyEval_EvalCodeWithName | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyEval_EvalCode | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) PyRun_SimpleFileExFlags | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_RunMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) Py_BytesMain | |
(MeshHostWorker pid=6857, ip=172.31.43.221) __libc_start_main | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) *** End stack trace *** | |
(MeshHostWorker pid=6857, ip=172.31.43.221) | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.507811: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.507879: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.507951: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.508028: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.508112: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.508187: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.508266: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state | |
(MeshHostWorker pid=6857, ip=172.31.43.221) 2022-06-24 00:58:15.508336: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2143] Execution of replica 0 failed: INVALID_ARGUMENT: stream is uninitialized or in an error state | |
(MeshHostWorker pid=16300) memory_allocated: 0.443 GB max_memory_allocated: 0.481 GB next instruction: Opcode: FREE, Task uuid: None, Info: | |
(MeshHostWorker pid=16300) memory_allocated: 0.430 GB max_memory_allocated: 0.481 GB next instruction: Opcode: RUN, Task uuid: 42, Info: allocate zero for recv | |
(MeshHostWorker pid=16300) memory_allocated: 0.443 GB max_memory_allocated: 0.481 GB next instruction: Opcode: RECV, Task uuid: 353, Info: | |