Created
March 1, 2022 01:18
towards ray issue https://github.com/ray-project/ray/issues/19834
[2022-03-01 00:56:43,436 I 2347 2347] io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2022-03-01 00:56:43,446 I 2347 2347] store_runner.cc:30: Allowing the Plasma store to use up to 76.9778GB of memory.
[2022-03-01 00:56:43,446 I 2347 2347] store_runner.cc:46: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled
[2022-03-01 00:56:43,451 I 2347 2370] dlmalloc.cc:146: create_and_mmap_buffer(76977799176, /dev/shm/plasmaXXXXXX)
[2022-03-01 00:56:44,453 I 2347 2347] grpc_server.cc:112: ObjectManager server started, listening on port 8076.
[2022-03-01 00:56:44,510 I 2347 2347] node_manager.cc:302: Initializing NodeManager with ID <node-id-1>
[2022-03-01 00:56:44,511 I 2347 2347] grpc_server.cc:112: NodeManager server started, listening on port 45613.
[2022-03-01 00:56:44,519 I 2347 2398] agent_manager.cc:85: Monitor agent process with pid 2397, register timeout 30000ms.
[2022-03-01 00:56:44,522 I 2347 2347] raylet.cc:103: Raylet of id, <node-id-1> started. Raylet consists of node_manager and object_manager. node_manager address: <worker-ip>:45613 object_manager address: <worker-ip>:8076 hostname: <worker-ip>
[2022-03-01 00:56:44,526 I 2347 2347] accessor.cc:560: Received notification for node id = <node-id-2>, IsAlive = 1
[2022-03-01 00:56:44,526 I 2347 2347] accessor.cc:560: Received notification for node id = <node-id-1>, IsAlive = 1
[2022-03-01 00:56:48,683 I 2347 2347] agent_manager.cc:36: HandleRegisterAgent, ip: <worker-ip>, port: 60949, pid: 2397
[2022-03-01 01:06:44,577 I 2347 2347] node_manager.cc:620: Sending Python GC request to 0 local workers to clean up Python cyclic references.
[2022-03-01 01:07:30,834 W 2347 2398] agent_manager.cc:101: Agent process with pid 2397 exit, return value 0. ip <worker-ip>. pid 2397
[2022-03-01 01:07:34,100 W 2347 2364] metric_exporter.cc:206: Export metrics to agent failed: IOError: . This won't affect Ray, but you can lose metrics from the cluster.
[2022-03-01 01:07:44,110 W 2347 2364] metric_exporter.cc:206: Export metrics to agent failed: IOError: . This won't affect Ray, but you can lose metrics from the cluster.
[2022-03-01 01:07:54,120 W 2347 2364] metric_exporter.cc:206: Export metrics to agent failed: IOError: . This won't affect Ray, but you can lose metrics from the cluster.
[2022-03-01 01:08:04,130 W 2347 2364] metric_exporter.cc:206: Export metrics to agent failed: IOError: . This won't affect Ray, but you can lose metrics from the cluster.
[2022-03-01 01:08:14,140 W 2347 2364] metric_exporter.cc:206: Export metrics to agent failed: IOError: . This won't affect Ray, but you can lose metrics from the cluster.
[2022-03-01 01:08:24,149 W 2347 2364] metric_exporter.cc:206: Export metrics to agent failed: IOError: . This won't affect Ray, but you can lose metrics from the cluster.
[2022-03-01 01:08:34,160 W 2347 2364] metric_exporter.cc:206: Export metrics to agent failed: IOError: . This won't affect Ray, but you can lose metrics from the cluster.
[2022-03-01 01:08:44,171 W 2347 2364] metric_exporter.cc:206: Export metrics to agent failed: IOError: . This won't affect Ray, but you can lose metrics from the cluster.
[2022-03-01 01:08:54,182 W 2347 2364] metric_exporter.cc:206: Export metrics to agent failed: IOError: . This won't affect Ray, but you can lose metrics from the cluster.
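
The repeated `metric_exporter.cc:206` warnings above are easier to see when the log is tallied by severity and source file. The following is a minimal sketch, not part of the original gist. The regular expression is inferred only from the lines shown here, and reading the two numbers after the severity letter as process and thread IDs is an assumption. `parse_line` and `summarize` are hypothetical helper names.

```python
import re
from collections import Counter

# Pattern inferred from the log lines in this gist:
#   [<timestamp> <level letter> <number> <number>] <file>:<line>: <message>
# Treating the two numbers as pid and tid is an assumption.
LINE_RE = re.compile(
    r"^\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]) (?P<pid>\d+) (?P<tid>\d+)\] "
    r"(?P<source>[\w.]+):(?P<line>\d+): (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count log records per (level, source file, line number)."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[(record["level"], record["source"], record["line"])] += 1
    return counts
```

Lines that do not match the pattern, such as stack trace frames, are skipped rather than counted.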
[2022-02-28 21:26:37,408 I 2094 2094] io_service_pool.cc:35: IOServicePool is running with 1 io_service.
[2022-02-28 21:26:37,409 I 2094 2094] store_runner.cc:30: Allowing the Plasma store to use up to 77.0424GB of memory.
[2022-02-28 21:26:37,409 I 2094 2094] store_runner.cc:46: Starting object store with directory /dev/shm, fallback /tmp/ray, and huge page support disabled
[2022-02-28 21:26:37,409 I 2094 2111] dlmalloc.cc:146: create_and_mmap_buffer(77042417672, /dev/shm/plasmaXXXXXX)
[2022-02-28 21:26:37,453 C 2094 2094] grpc_server.cc:102: Check failed: server_ Failed to start the grpc server. The specified port is 8076. This means that Ray's core components will not be able to function correctly. If the server startup error message is `Address already in use`, it indicates the server fails to start because the port is already used by other processes (such as --node-manager-port, --object-manager-port, --gcs-server-port, and ports between --min-worker-port, --max-worker-port). Try running lsof -i :8076 to check if there are other processes listening to the port.
*** StackTrace Information ***
ray::SpdLogMessage::Flush()
ray::RayLog::~RayLog()
ray::rpc::GrpcServer::Run()
ray::ObjectManager::ObjectManager()
ray::raylet::NodeManager::NodeManager()
ray::raylet::Raylet::Raylet()
main::{lambda()#1}::operator()()
std::_Function_handler<>::_M_invoke()
std::_Function_handler<>::_M_invoke()
std::_Function_handler<>::_M_invoke()
ray::rpc::ClientCallImpl<>::OnReplyReceived()
std::_Function_handler<>::_M_invoke()
boost::asio::detail::completion_handler<>::do_complete()
boost::asio::detail::scheduler::do_run_one()
boost::asio::detail::scheduler::run()
boost::asio::io_context::run()
main
__libc_start_main
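
The fatal message above suggests running `lsof -i :8076` to see whether another process holds the port. As a complementary check before starting the raylet, one could try to bind the port directly. The following is a minimal sketch, not part of the original gist. `port_is_free` is a hypothetical helper, and it assumes a plain TCP bind is a fair proxy for what the gRPC server needs. The server's actual socket options may differ, so treat the result as a hint, not a guarantee.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to (host, port).

    Assumption: a plain bind without extra socket options approximates
    what the server would do. The socket is closed again immediately,
    so another process can still take the port afterwards.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use", or a permissions error.
            return False
        return True


if __name__ == "__main__":
    # 8076 is the object manager port reported in the log above.
    state = "free" if port_is_free(8076) else "in use or unavailable"
    print(f"port 8076 is {state}")
```

A `False` result only says the bind failed. `lsof -i :8076` is still needed to identify which process owns the port.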