Last active February 7, 2022 16:58
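from pprint import pprint
import ray

ray.init()  # assumption: default connection; the original ray.init arguments, if any, are not shown
@ray.remote  # required so that task.remote() below is defined
def task():
    import time
    time.sleep(1)  # placeholder body (assumption); the original task body is not shown
results = ray.get([task.remote() for _ in range(200)])  # submit 200 tasks and block until all finish
pprint(results)  # presumably the reason pprint is imported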
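As a rough aid for reading the autoscaler log that follows: a Ray task requests 1 CPU unless `num_cpus` is set, so 200 concurrent tasks amount to a demand of 200 CPUs. The sketch below estimates how many workers of a single node type would cover that demand, using CPU counts copied from the "Updating the resources of node type" log lines. The helper name `nodes_needed` is hypothetical, and the calculation ignores memory, the head node, and any worker limits in the cluster config, so it illustrates the arithmetic rather than what the autoscaler actually computes.

```python
import math

# CPU counts per node type, copied from the "Updating the resources of node type" log lines.
NODE_TYPE_CPUS = {
    "wkr-7cpu14g-spot": 7,
    "wkr-15cpu30g-spot": 15,
    "wkr-30cpu60g-spot": 30,
}


def nodes_needed(num_tasks: int, cpus_per_node: int, cpus_per_task: int = 1) -> int:
    """Hypothetical helper: nodes of one type needed to run all tasks at once."""
    tasks_per_node = cpus_per_node // cpus_per_task
    if tasks_per_node == 0:
        raise ValueError("a single task does not fit on this node type")
    return math.ceil(num_tasks / tasks_per_node)


for name, cpus in NODE_TYPE_CPUS.items():
    print(f"{name}: {nodes_needed(200, cpus)} node(s)")
```

Under these assumptions, the 200-task demand maps to 29, 14, or 7 workers for the 7-, 15-, and 30-CPU types respectively.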
Not enough permissions to watch for resources: changes (creation/deletion/updates) will not be noticed; the resources are only refreshed on operator restarts.
py38-cu112,karpenter:2022-02-07 08:03:07,288 DEBUG -- Updating the resources of node type head to include {'CPU': 0, 'GPU': 0, 'memory': 5261334937}.
py38-cu112,karpenter:2022-02-07 08:03:07,289 DEBUG -- Updating the resources of node type rayHeadType to include {'CPU': 1, 'GPU': 0, 'memory': 375809638}.
py38-cu112,karpenter:2022-02-07 08:03:07,289 DEBUG -- Updating the resources of node type rayWorkerType to include {'CPU': 1, 'GPU': 0, 'memory': 375809638}.
py38-cu112,karpenter:2022-02-07 08:03:07,289 DEBUG -- Updating the resources of node type wkr-15cpu30g-ondemand to include {'CPU': 15, 'GPU': 0, 'memory': 22548578304}.
py38-cu112,karpenter:2022-02-07 08:03:07,289 DEBUG -- Updating the resources of node type wkr-15cpu30g-spot to include {'CPU': 15, 'GPU': 0, 'memory': 22548578304}.
py38-cu112,karpenter:2022-02-07 08:03:07,290 DEBUG -- Updating the resources of node type wkr-30cpu250g-spot to include {'CPU': 30, 'GPU': 0, 'memory': 187904819200}.
py38-cu112,karpenter:2022-02-07 08:03:07,290 DEBUG -- Updating the resources of node type wkr-30cpu60g-spot to include {'CPU': 30, 'GPU': 0, 'memory': 45097156608}.
py38-cu112,karpenter:2022-02-07 08:03:07,290 DEBUG -- Updating the resources of node type wkr-7cpu14g-spot to include {'CPU': 7, 'GPU': 0, 'memory': 10522669875}.
py38-cu112,karpenter:2022-02-07 08:03:07,290 DEBUG -- Updating the resources of node type wkr-p2-16gpu to include {'CPU': 63, 'GPU': 16, 'memory': 538159402188, 'accelerator_type:p2': 1}.
py38-cu112,karpenter:2022-02-07 08:03:07,290 DEBUG -- Updating the resources of node type wkr-p2-8gpu to include {'CPU': 7, 'GPU': 8, 'memory': 354764298649, 'accelerator_type:p2': 1}.
py38-cu112,karpenter:2022-02-07 08:03:07,290 DEBUG -- Updating the resources of node type wkr-p3-1gpu to include {'CPU': 7, 'GPU': 1, 'memory': 42090679500, 'accelerator_type:p3': 1}.
py38-cu112,karpenter:2022-02-07 08:03:07,291 DEBUG -- Updating the resources of node type wkr-p3-4gpu to include {'CPU': 31, 'GPU': 4, 'memory': 171369195110, 'accelerator_type:p3': 1}.
py38-cu112,karpenter:2022-02-07 08:03:07,291 DEBUG -- Updating the resources of node type wkr-p3-8gpu to include {'CPU': 63, 'GPU': 8, 'memory': 354764298649, 'accelerator_type:p3': 1}.
py38-cu112,karpenter:2022-02-07 08:03:07,291 DEBUG -- Updating the resources of node type wkr-p3dn-8gpu to include {'CPU': 95, 'GPU': 8, 'memory': 565217696153, 'accelerator_type:p3dn': 1}.
py38-cu112,karpenter:2022-02-07 08:03:07,291 DEBUG -- Updating the resources of node type wkr-p4d-8gpu to include {'CPU': 95, 'GPU': 8, 'memory': 829787681587, 'accelerator_type:p4d': 1}.
py38-cu112,karpenter:2022-02-07 08:03:07,291 DEBUG -- Updating the resources of node type worker-p2-1gpu to include {'CPU': 3, 'GPU': 1, 'memory': 42090679500, 'accelerator_type:p2': 1}.
py38-cu112,karpenter:2022-02-07 08:03:07,374 INFO -- KubernetesNodeProvider: service 'py38-cu112-ray-head' not found, attempting to create it
py38-cu112,karpenter:2022-02-07 08:03:07,409 INFO -- KubernetesNodeProvider: successfully created service 'py38-cu112-ray-head'
py38-cu112,karpenter:2022-02-07 08:03:07,437 INFO -- KubernetesNodeProvider: calling create_namespaced_pod (count=1).
py38-cu112,karpenter:2022-02-07 08:03:07,564 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server (BadRequest): pod py38-cu112-head-2nxkg does not have a host assigned
py38-cu112,karpenter:2022-02-07 08:03:12,999 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:03:18,151 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:03:23,306 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:03:28,462 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:03:33,636 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:03:38,788 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:03:43,981 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:03:49,182 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:03:54,382 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:03:59,524 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:04:04,695 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:04:09,956 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:04:15,135 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:04:20,305 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:04:25,499 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:04:30,660 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:04:35,827 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:04:41,045 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:04:46,255 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:04:51,420 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:04:56,578 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:05:01,731 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:05:06,959 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
2022-02-07 08:03:07,201 INFO -- Cluster: py38-cu112
2022-02-07 08:03:07,284 INFO -- Checking Kubernetes environment settings
2022-02-07 08:03:07,437 INFO -- No head node found. Launching a new cluster. Confirm [y/N]: y [automatic, due to --yes]
2022-02-07 08:03:07,437 INFO -- Acquiring an up-to-date head node
2022-02-07 08:03:07,480 INFO -- Launched a new head node
2022-02-07 08:03:07,481 INFO -- Fetching the new head node
2022-02-07 08:03:07,499 INFO -- <1/1> Setting up head node
2022-02-07 08:03:07,544 INFO -- New status: waiting-for-ssh
2022-02-07 08:03:07,547 INFO -- [1/7] Waiting for SSH to become available
2022-02-07 08:03:07,547 INFO -- Running `uptime` as a test.
2022-02-07 08:03:07,977 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:03:13,124 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:03:18,284 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:03:23,443 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:03:28,596 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:03:33,761 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:03:38,954 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:03:44,165 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:03:49,361 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:03:54,499 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:03:59,668 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:04:04,911 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:04:10,110 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:04:15,281 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:04:20,476 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:04:25,636 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:04:30,797 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:04:36,012 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:04:41,221 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:04:46,400 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:04:51,546 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:04:56,706 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:05:01,938 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
py38-cu112,karpenter:2022-02-07 08:05:12,140 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:05:17,309 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:05:22,474 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:05:27,648 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:05:32,832 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:05:38,002 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:05:43,253 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:05:48,426 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:05:53,590 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:05:58,771 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:06:03,929 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:06:09,114 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:06:14,283 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:06:19,449 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:06:24,624 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:06:29,847 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:06:35,027 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:06:40,205 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:06:45,406 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:06:50,562 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:06:55,742 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:07:00,942 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:07:06,121 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:07:11,281 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:07:16,452 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:07:21,638 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:07:26,805 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:07:31,987 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
2022-02-07 08:05:07,118 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:05:12,284 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:05:17,441 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:05:22,626 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:05:27,806 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:05:32,980 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:05:38,223 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:05:43,404 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:05:48,549 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:05:53,746 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:05:58,893 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:06:04,090 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:06:09,254 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:06:14,422 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:06:19,597 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:06:24,809 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:06:30,004 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:06:35,178 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:06:40,379 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:06:45,535 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:06:50,717 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:06:55,904 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:07:01,090 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:07:06,252 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:07:11,418 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:07:16,615 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:07:21,773 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:07:26,960 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
py38-cu112,karpenter:2022-02-07 08:07:37,163 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:07:42,304 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:07:47,459 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:07:52,656 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:07:57,861 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:08:03,025 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:08:08,209 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:08:13,395 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:08:18,601 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:08:23,750 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:08:28,960 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:08:34,120 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:08:39,328 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:08:44,526 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:08:49,707 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:08:54,880 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:09:00,041 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:09:05,208 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:09:10,586 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:09:15,781 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:09:21,011 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("ray-node")
py38-cu112,karpenter:2022-02-07 08:09:26,172 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-02-07 08:07:32,138 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
 08:09:26 up 6 min, 0 users, load average: 3.63, 2.79, 1.29
py38-cu112,karpenter:2022-02-07 08:09:27,039 DEBUG -- Node tags: {'': 'py38-cu112-ray-head', 'ray-cluster-name': 'py38-cu112', 'ray-launch-config': '5dcbc061dc79f38f8914ca1c8b0689c81b0b91dd', 'ray-node-name': 'ray-py38-cu112-head', 'ray-node-status': 'waiting-for-ssh', 'ray-node-type': 'head', 'ray-node-uuid': '61139f98-01d3-4beb-8ea6-3396a3ab4090', 'ray-user-node-type': 'head'}
py38-cu112,karpenter:2022-02-07 08:09:27,232 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":0,"GPU":0,"memory":5261334937}'"'"';ray stop)'
Unable to use a TTY - input is not a terminal or the right kind of file
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-02-07 08:09:30,061 INFO -- Did not find any active Ray processes.
py38-cu112,karpenter:2022-02-07 08:09:30,234 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (export RAY_OVERRIDE_RESOURCES='"'"'{"CPU":0,"GPU":0,"memory":5261334937}'"'"';ulimit -n 65536; ray start --head --no-monitor --dashboard-host'
Unable to use a TTY - input is not a terminal or the right kind of file
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-02-07 08:09:34,147 INFO -- View the Ray dashboard at
2022-02-07 08:09:31,581 INFO -- Local node IP:
2022-02-07 08:09:35,479 SUCC -- --------------------
2022-02-07 08:09:35,479 SUCC -- Ray runtime started.
2022-02-07 08:09:35,479 SUCC -- --------------------
2022-02-07 08:09:35,479 INFO -- Next steps
2022-02-07 08:09:35,479 INFO -- To connect to this Ray runtime from another node, run
2022-02-07 08:09:35,479 INFO --  ray start --address='' --redis-password='5241590000000000'
2022-02-07 08:09:35,479 INFO -- Alternatively, use the following Python code:
2022-02-07 08:09:35,479 INFO -- import ray
2022-02-07 08:09:35,479 INFO -- ray.init(address='auto', _redis_password='5241590000000000')
2022-02-07 08:09:35,479 INFO -- To connect to this Ray runtime from outside of the cluster, for example to
2022-02-07 08:09:35,479 INFO -- connect to a remote cluster from your laptop directly, use the following
2022-02-07 08:09:35,479 INFO -- Python code:
2022-02-07 08:09:35,479 INFO -- import ray
2022-02-07 08:09:35,480 INFO -- ray.init(address='ray://<head_node_ip_address>:10001')
2022-02-07 08:09:35,480 INFO -- If connection fails, check your firewall settings and network configuration.
2022-02-07 08:09:35,480 INFO -- To terminate the Ray runtime, run
2022-02-07 08:09:35,480 INFO --  ray stop
2022-02-07 08:07:37,273 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:07:42,431 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:07:47,626 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:07:52,835 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:07:58,003 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:08:03,182 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:08:08,376 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:08:13,566 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:08:18,726 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:08:23,926 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:08:29,098 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:08:34,274 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:08:39,481 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:08:44,673 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:08:49,853 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:08:55,017 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:09:00,182 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:09:05,559 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:09:10,745 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:09:15,978 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:09:21,141 INFO -- SSH still not available (Exit Status 1): kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)', retrying in 5 seconds.
2022-02-07 08:09:27,002 SUCC -- Success.
2022-02-07 08:09:27,002 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Got remote shell [LogTimer=379455ms]
2022-02-07 08:09:27,040 INFO -- Updating cluster configuration. [hash=4416c6d3887de7ad85256198044e24be2562a916]
2022-02-07 08:09:27,145 INFO -- New status: syncing-files
2022-02-07 08:09:27,145 INFO -- [2/7] Processing file mounts
2022-02-07 08:09:27,145 INFO -- [3/7] No worker file mounts to sync
2022-02-07 08:09:27,230 INFO -- New status: setting-up
2022-02-07 08:09:27,230 INFO -- [4/7] No initialization commands to run.
2022-02-07 08:09:27,231 INFO -- [5/7] Initalizing command runner
2022-02-07 08:09:27,232 INFO -- [6/7] No setup commands to run.
2022-02-07 08:09:27,232 INFO -- [7/7] Starting the Ray runtime
2022-02-07 08:09:35,673 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Ray start commands succeeded [LogTimer=8441ms]
2022-02-07 08:09:35,673 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Applied config 4416c6d3887de7ad85256198044e24be2562a916 [LogTimer=388173ms]
2022-02-07 08:09:35,744 INFO -- New status: up-to-date
2022-02-07 08:09:35,755 INFO -- Useful commands
2022-02-07 08:09:35,755 INFO -- Monitor autoscaling with
2022-02-07 08:09:35,755 INFO --  ray exec /home/ray/ray_cluster_configs/karpenter/py38-cu112_config.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
py38-cu112,karpenter:2022-02-07 08:09:36,365 INFO -- Monitor: Started
py38-cu112,karpenter:2022-02-07 08:09:36,368 DEBUG -- internal_kv_del b'__autoscaling_error' False None
py38-cu112,karpenter:2022-02-07 08:09:36,832 INFO -- StandardAutoscaler: {'auth': {}, 'available_node_types': {'head': {'max_workers': 0, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-head-', 'labels': {}, 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 0, 'memory': '7G'}, 'requests': {'cpu': 0, 'memory': '7G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'on-demand'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 0, 'GPU': 0, 'memory': 5261334937}}, 'rayHeadType': {'max_workers': 0, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-ray-head-type-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap 
: TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 1, 'memory': '512Mi'}, 'requests': {'cpu': 1, 'memory': '512Mi'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 1, 'GPU': 0, 'memory': 375809638}}, 'rayWorkerType': {'max_workers': 0, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-ray-worker-type-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 1, 
'memory': '512Mi'}, 'requests': {'cpu': 1, 'memory': '512Mi'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 1, 'GPU': 0, 'memory': 375809638}}, 'wkr-15cpu30g-ondemand': {'max_workers': 1, 'min_workers': 1, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-15cpu30g--ondemand-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 15, 'memory': '30G'}, 'requests': {'cpu': 15, 'memory': '30G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'on-demand'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 
'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 15, 'GPU': 0, 'memory': 22548578304}}, 'wkr-15cpu30g-spot': {'max_workers': 100, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-15cpu30g--spot-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 15, 'memory': '30G'}, 'requests': {'cpu': 15, 'memory': '30G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'spot'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 15, 'GPU': 0, 'memory': 22548578304}}, 'wkr-30cpu250g-spot': {'max_workers': 1, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-30cpu250g--spot-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 
'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 30, 'memory': '250G'}, 'requests': {'cpu': 30, 'memory': '250G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'spot'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 30, 'GPU': 0, 'memory': 187904819200}}, 'wkr-30cpu60g-spot': {'max_workers': 50, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-30cpu60g--spot-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, 
{'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 30, 'memory': '60G'}, 'requests': {'cpu': 30, 'memory': '60G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'spot'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 30, 'GPU': 0, 'memory': 45097156608}}, 'wkr-7cpu14g-spot': {'max_workers': 100, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-7cpu14g--spot-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 7, 'memory': '14G'}, 'requests': {'cpu': 7, 'memory': '14G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'spot'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 
'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 7, 'GPU': 0, 'memory': 10522669875}}, 'wkr-p2-16gpu': {'max_workers': 4, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-p2-16gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 63, 'memory': '716G', '': 16}, 'requests': {'cpu': 63, 'memory': '716G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p2'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 63, 'GPU': 16, 'accelerator_type:p2': 1, 'memory': 538159402188}}, 'wkr-p2-8gpu': {'max_workers': 8, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 
'py38-cu112-wkr-p2-8gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 7, 'memory': '472G', '': 8}, 'requests': {'cpu': 7, 'memory': '472G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p2'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 7, 'GPU': 8, 'accelerator_type:p2': 1, 'memory': 354764298649}}, 'wkr-p3-1gpu': {'max_workers': 32, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-p3-1gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 
'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 7, 'memory': '56G', '': 1}, 'requests': {'cpu': 7, 'memory': '56G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p3'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 7, 'GPU': 1, 'accelerator_type:p3': 1, 'memory': 42090679500}}, 'wkr-p3-4gpu': {'max_workers': 8, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-p3-4gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 
'TCP'}], 'resources': {'limits': {'cpu': 31, 'memory': '228G', '': 4}, 'requests': {'cpu': 31, 'memory': '228G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p3'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 31, 'GPU': 4, 'accelerator_type:p3': 1, 'memory': 171369195110}}, 'wkr-p3-8gpu': {'max_workers': 4, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-p3-8gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 63, 'memory': '472G', '': 8}, 'requests': {'cpu': 63, 'memory': '472G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p3'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 
43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 63, 'GPU': 8, 'accelerator_type:p3': 1, 'memory': 354764298649}}, 'wkr-p3dn-8gpu': {'max_workers': 4, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-p-3dn-8gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 95, 'memory': '752G', '': 8}, 'requests': {'cpu': 95, 'memory': '752G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p3dn'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 
95, 'GPU': 8, 'accelerator_type:p3dn': 1, 'memory': 565217696153}}, 'wkr-p4d-8gpu': {'max_workers': 4, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-p-4d-8gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 95, 'memory': '1104G', '': 8}, 'requests': {'cpu': 95, 'memory': '1104G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p4d'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 95, 'GPU': 8, 'accelerator_type:p4d': 1, 'memory': 829787681587}}, 'worker-p2-1gpu': {'max_workers': 32, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-worker-p2-1gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 
'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 3, 'memory': '56G', '': 1}, 'requests': {'cpu': 3, 'memory': '56G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p2'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 3, 'GPU': 1, 'accelerator_type:p2': 1, 'memory': 42090679500}}}, 'cluster_name': 'py38-cu112', 'cluster_synced_files': [], 'file_mounts': {}, 'file_mounts_sync_continuously': False, 'head_node': {}, 'head_node_type': 'head', 'head_setup_commands': [], 'head_start_ray_commands': ['ray stop', 'ulimit -n 65536; ray start --head --no-monitor --dashboard-host'], 'idle_timeout_minutes': 5, 'initialization_commands': [], 'max_workers': 348, 'provider': {'_operator': True, 'namespace': 'karpenter', 'services': [{'apiVersion': 'v1', 'kind': 'Service', 'metadata': {'name': 'py38-cu112-ray-head', 'namespace': 'karpenter', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 
'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'ports': [{'name': 'client', 'port': 10001, 'protocol': 'TCP', 'targetPort': 10001}, {'name': 'dashboard', 'port': 8265, 'protocol': 'TCP', 'targetPort': 8265}, {'name': 'ray-serve', 'port': 8000, 'protocol': 'TCP', 'targetPort': 8000}], 'selector': {'': 'py38-cu112-ray-head'}}}], 'type': 'kubernetes', 'use_internal_ips': True}, 'setup_commands': [], 'upscaling_speed': 9999, 'worker_nodes': {}, 'worker_setup_commands': [], 'worker_start_ray_commands': ['ray stop', 'ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379']}
2022-02-07 08:09:35,755 INFO -- Connect to a terminal on the cluster head:
2022-02-07 08:09:35,755 INFO --  ray attach /home/ray/ray_cluster_configs/karpenter/py38-cu112_config.yaml
2022-02-07 08:09:35,755 INFO -- Get a remote shell to the cluster manually:
2022-02-07 08:09:35,755 INFO -- kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash
py38-cu112,karpenter:2022-02-07 08:09:37,271 DEBUG -- Updating the resources of node type head to include {'CPU': 0, 'GPU': 0, 'memory': 5261334937}.
py38-cu112,karpenter:2022-02-07 08:09:37,271 DEBUG -- Updating the resources of node type rayHeadType to include {'CPU': 1, 'GPU': 0, 'memory': 375809638}.
py38-cu112,karpenter:2022-02-07 08:09:37,271 DEBUG -- Updating the resources of node type rayWorkerType to include {'CPU': 1, 'GPU': 0, 'memory': 375809638}.
py38-cu112,karpenter:2022-02-07 08:09:37,271 DEBUG -- Updating the resources of node type wkr-15cpu30g-ondemand to include {'CPU': 15, 'GPU': 0, 'memory': 22548578304}.
py38-cu112,karpenter:2022-02-07 08:09:37,271 DEBUG -- Updating the resources of node type wkr-15cpu30g-spot to include {'CPU': 15, 'GPU': 0, 'memory': 22548578304}.
py38-cu112,karpenter:2022-02-07 08:09:37,272 DEBUG -- Updating the resources of node type wkr-30cpu250g-spot to include {'CPU': 30, 'GPU': 0, 'memory': 187904819200}.
py38-cu112,karpenter:2022-02-07 08:09:37,272 DEBUG -- Updating the resources of node type wkr-30cpu60g-spot to include {'CPU': 30, 'GPU': 0, 'memory': 45097156608}.
py38-cu112,karpenter:2022-02-07 08:09:37,272 DEBUG -- Updating the resources of node type wkr-7cpu14g-spot to include {'CPU': 7, 'GPU': 0, 'memory': 10522669875}.
py38-cu112,karpenter:2022-02-07 08:09:37,272 DEBUG -- Updating the resources of node type wkr-p2-16gpu to include {'CPU': 63, 'GPU': 16, 'memory': 538159402188, 'accelerator_type:p2': 1}.
py38-cu112,karpenter:2022-02-07 08:09:37,272 DEBUG -- Updating the resources of node type wkr-p2-8gpu to include {'CPU': 7, 'GPU': 8, 'memory': 354764298649, 'accelerator_type:p2': 1}.
py38-cu112,karpenter:2022-02-07 08:09:37,272 DEBUG -- Updating the resources of node type wkr-p3-1gpu to include {'CPU': 7, 'GPU': 1, 'memory': 42090679500, 'accelerator_type:p3': 1}.
py38-cu112,karpenter:2022-02-07 08:09:37,272 DEBUG -- Updating the resources of node type wkr-p3-4gpu to include {'CPU': 31, 'GPU': 4, 'memory': 171369195110, 'accelerator_type:p3': 1}.
py38-cu112,karpenter:2022-02-07 08:09:37,272 DEBUG -- Updating the resources of node type wkr-p3-8gpu to include {'CPU': 63, 'GPU': 8, 'memory': 354764298649, 'accelerator_type:p3': 1}.
py38-cu112,karpenter:2022-02-07 08:09:37,273 DEBUG -- Updating the resources of node type wkr-p3dn-8gpu to include {'CPU': 95, 'GPU': 8, 'memory': 565217696153, 'accelerator_type:p3dn': 1}.
py38-cu112,karpenter:2022-02-07 08:09:37,273 DEBUG -- Updating the resources of node type wkr-p4d-8gpu to include {'CPU': 95, 'GPU': 8, 'memory': 829787681587, 'accelerator_type:p4d': 1}.
py38-cu112,karpenter:2022-02-07 08:09:37,273 DEBUG -- Updating the resources of node type worker-p2-1gpu to include {'CPU': 3, 'GPU': 1, 'memory': 42090679500, 'accelerator_type:p2': 1}.
py38-cu112,karpenter:2022-02-07 08:09:37,341 INFO -- KubernetesNodeProvider: updating existing service 'py38-cu112-ray-head'
py38-cu112,karpenter:2022-02-07 08:09:37,482 INFO -- NodeUpdater: py38-cu112-head-2nxkg: Running kubectl -n karpenter exec -it py38-cu112-head-2nxkg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
08:09:38 up 6 min, 0 users, load average: 3.22, 2.73, 1.29
py38-cu112,karpenter:2022-02-07 08:09:38,045 DEBUG -- Node tags: {'': 'py38-cu112-ray-head', 'ray-cluster-name': 'py38-cu112', 'ray-file-mounts-contents': 'da39a3ee5e6b4b0d3255bfef95601890afd80709', 'ray-launch-config': '5dcbc061dc79f38f8914ca1c8b0689c81b0b91dd', 'ray-node-name': 'ray-py38-cu112-head', 'ray-node-status': 'waiting-for-ssh', 'ray-node-type': 'head', 'ray-node-uuid': '61139f98-01d3-4beb-8ea6-3396a3ab4090', 'ray-runtime-config': '4416c6d3887de7ad85256198044e24be2562a916', 'ray-user-node-type': 'head'}
py38-cu112,karpenter:2022-02-07 08:09:38,682 INFO -- Monitor: Started
py38-cu112,karpenter:2022-02-07 08:09:38,683 DEBUG -- internal_kv_del b'__autoscaling_error' False None
py38-cu112,karpenter:2022-02-07 08:09:39,048 INFO -- StandardAutoscaler: {'auth': {}, 'available_node_types': {'head': {'max_workers': 0, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-head-', 'labels': {}, 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 0, 'memory': '7G'}, 'requests': {'cpu': 0, 'memory': '7G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'on-demand'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 0, 'GPU': 0, 'memory': 5261334937}}, 'rayHeadType': {'max_workers': 0, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-ray-head-type-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap 
: TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 1, 'memory': '512Mi'}, 'requests': {'cpu': 1, 'memory': '512Mi'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 1, 'GPU': 0, 'memory': 375809638}}, 'rayWorkerType': {'max_workers': 0, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-ray-worker-type-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 1, 
'memory': '512Mi'}, 'requests': {'cpu': 1, 'memory': '512Mi'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 1, 'GPU': 0, 'memory': 375809638}}, 'wkr-15cpu30g-ondemand': {'max_workers': 1, 'min_workers': 1, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-15cpu30g--ondemand-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 15, 'memory': '30G'}, 'requests': {'cpu': 15, 'memory': '30G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'on-demand'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 
'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 15, 'GPU': 0, 'memory': 22548578304}}, 'wkr-15cpu30g-spot': {'max_workers': 100, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-15cpu30g--spot-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 15, 'memory': '30G'}, 'requests': {'cpu': 15, 'memory': '30G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'spot'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 15, 'GPU': 0, 'memory': 22548578304}}, 'wkr-30cpu250g-spot': {'max_workers': 1, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-30cpu250g--spot-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 
'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 30, 'memory': '250G'}, 'requests': {'cpu': 30, 'memory': '250G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'spot'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 30, 'GPU': 0, 'memory': 187904819200}}, 'wkr-30cpu60g-spot': {'max_workers': 50, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-30cpu60g--spot-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, 
{'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 30, 'memory': '60G'}, 'requests': {'cpu': 30, 'memory': '60G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'spot'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 30, 'GPU': 0, 'memory': 45097156608}}, 'wkr-7cpu14g-spot': {'max_workers': 100, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-7cpu14g--spot-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 7, 'memory': '14G'}, 'requests': {'cpu': 7, 'memory': '14G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'spot'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 
'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 7, 'GPU': 0, 'memory': 10522669875}}, 'wkr-p2-16gpu': {'max_workers': 4, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-p2-16gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 63, 'memory': '716G', '': 16}, 'requests': {'cpu': 63, 'memory': '716G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p2'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 63, 'GPU': 16, 'accelerator_type:p2': 1, 'memory': 538159402188}}, 'wkr-p2-8gpu': {'max_workers': 8, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 
'py38-cu112-wkr-p2-8gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 7, 'memory': '472G', '': 8}, 'requests': {'cpu': 7, 'memory': '472G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p2'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 7, 'GPU': 8, 'accelerator_type:p2': 1, 'memory': 354764298649}}, 'wkr-p3-1gpu': {'max_workers': 32, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-p3-1gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 
'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 7, 'memory': '56G', '': 1}, 'requests': {'cpu': 7, 'memory': '56G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p3'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 7, 'GPU': 1, 'accelerator_type:p3': 1, 'memory': 42090679500}}, 'wkr-p3-4gpu': {'max_workers': 8, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-p3-4gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 
'TCP'}], 'resources': {'limits': {'cpu': 31, 'memory': '228G', '': 4}, 'requests': {'cpu': 31, 'memory': '228G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p3'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 31, 'GPU': 4, 'accelerator_type:p3': 1, 'memory': 171369195110}}, 'wkr-p3-8gpu': {'max_workers': 4, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-p3-8gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 63, 'memory': '472G', '': 8}, 'requests': {'cpu': 63, 'memory': '472G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p3'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 
43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 63, 'GPU': 8, 'accelerator_type:p3': 1, 'memory': 354764298649}}, 'wkr-p3dn-8gpu': {'max_workers': 4, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-p-3dn-8gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 95, 'memory': '752G', '': 8}, 'requests': {'cpu': 95, 'memory': '752G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p3dn'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 
95, 'GPU': 8, 'accelerator_type:p3dn': 1, 'memory': 565217696153}}, 'wkr-p4d-8gpu': {'max_workers': 4, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-wkr-p-4d-8gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 95, 'memory': '1104G', '': 8}, 'requests': {'cpu': 95, 'memory': '1104G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p4d'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 95, 'GPU': 8, 'accelerator_type:p4d': 1, 'memory': 829787681587}}, 'worker-p2-1gpu': {'max_workers': 32, 'min_workers': 0, 'node_config': {'apiVersion': 'v1', 'kind': 'Pod', 'metadata': {'generateName': 'py38-cu112-worker-p2-1gpu-', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 'kind': 'RayCluster', 'name': 
'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'containers': [{'args': ['trap : TERM INT; sleep infinity & wait;'], 'command': ['/bin/bash', '-c', '--'], 'env': [{'name': 'RAY_gcs_server_rpc_server_thread_num', 'value': '1'}, {'name': 'RAY_PROFILING', 'value': '1'}], 'image': 'rayproject/ray-ml:1.10.0-py38-cu112', 'imagePullPolicy': 'Always', 'name': 'ray-node', 'ports': [{'containerPort': 6379, 'protocol': 'TCP'}, {'containerPort': 10001, 'protocol': 'TCP'}, {'containerPort': 8265, 'protocol': 'TCP'}, {'containerPort': 8000, 'protocol': 'TCP'}], 'resources': {'limits': {'cpu': 3, 'memory': '56G', '': 1}, 'requests': {'cpu': 3, 'memory': '56G'}}, 'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'dshm'}, {'mountPath': '/shared', 'name': 'fsx-shared-b'}, {'mountPath': '/db', 'name': 'fsx-speech-db-b'}]}], 'nodeSelector': {'': 'p2'}, 'restartPolicy': 'Never', 'terminationGracePeriodSeconds': 43200, 'tolerations': [{'effect': 'NoSchedule', 'key': '', 'operator': 'Equal', 'value': 'true'}], 'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'dshm'}, {'name': 'fsx-shared-b', 'persistentVolumeClaim': {'claimName': 'fsx-shared-b'}}, {'name': 'fsx-speech-db-b', 'persistentVolumeClaim': {'claimName': 'fsx-speech-db-b'}}]}}, 'resources': {'CPU': 3, 'GPU': 1, 'accelerator_type:p2': 1, 'memory': 42090679500}}}, 'cluster_name': 'py38-cu112', 'cluster_synced_files': [], 'file_mounts': {}, 'file_mounts_sync_continuously': False, 'head_node': {}, 'head_node_type': 'head', 'head_setup_commands': [], 'head_start_ray_commands': ['ray stop', 'ulimit -n 65536; ray start --head --no-monitor --dashboard-host'], 'idle_timeout_minutes': 5, 'initialization_commands': [], 'max_workers': 348, 'provider': {'_operator': True, 'namespace': 'karpenter', 'services': [{'apiVersion': 'v1', 'kind': 'Service', 'metadata': {'name': 'py38-cu112-ray-head', 'namespace': 'karpenter', 'ownerReferences': [{'apiVersion': '', 'blockOwnerDeletion': True, 'controller': True, 
'kind': 'RayCluster', 'name': 'py38-cu112', 'uid': '68636a35-fb5b-4b77-ba2b-e77bbdbabddf'}]}, 'spec': {'ports': [{'name': 'client', 'port': 10001, 'protocol': 'TCP', 'targetPort': 10001}, {'name': 'dashboard', 'port': 8265, 'protocol': 'TCP', 'targetPort': 8265}, {'name': 'ray-serve', 'port': 8000, 'protocol': 'TCP', 'targetPort': 8000}], 'selector': {'': 'py38-cu112-ray-head'}}}], 'type': 'kubernetes', 'use_internal_ips': True}, 'setup_commands': [], 'upscaling_speed': 9999, 'worker_nodes': {}, 'worker_setup_commands': [], 'worker_start_ray_commands': ['ray stop', 'ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379']}
py38-cu112,karpenter:2022-02-07 08:09:39,050 INFO -- Logging raw resource message pulled from GCS.
py38-cu112,karpenter:2022-02-07 08:09:39,051 INFO -- batch {
node_id: "\215\257\374\262H\272\316\332\004\306\350\0005w\266\201\ra;\354\3736L5\240\321E\032"
resources_available {
key: "memory"
value: 5261334937.0
resources_available {
key: "node:"
value: 1.0
resources_available {
key: "object_store_memory"
value: 2046035558.0
resources_available_changed: true
resources_total {
key: "memory"
value: 5261334937.0
resources_total {
key: "node:"
value: 1.0
resources_total {
key: "object_store_memory"
value: 2046035558.0
resource_load_by_shape {
node_manager_address: ""
placement_group_load {
py38-cu112,karpenter:2022-02-07 08:09:39,051 INFO -- Done logging raw resource message.
py38-cu112,karpenter:2022-02-07 08:09:39,051 DEBUG -- internal_kv_get b'autoscaler_resource_request' None
py38-cu112,karpenter:2022-02-07 08:09:39,520 INFO --
... (launched 200 tasks) ...
======== Autoscaler status: 2022-02-07 08:50:54.613180 ========
Node status
1 head
1 wkr-15cpu30g-ondemand
(no pending nodes)
Recent failures:
(no failures)
0.0/15.0 CPU
0.00/25.900 GiB memory
0.00/10.263 GiB object_store_memory
(no resource demands)
py38-cu112,karpenter:2022-02-07 08:50:54,649 DEBUG -- internal_kv_put b'__autoscaling_status_legacy' b"Cluster status: 1 nodes\n - MostDelayedHeartbeats: {'': 0.5076098442077637, '': 0.507556676864624}\n - NodeIdleSeconds: Min=1195 Mean=1195 Max=1195\n - ResourceUsage: 0.0/15.0 CPU, 0.0 GiB/25.9 GiB memory, 0.0 GiB/10.26 GiB object_store_memory\n - TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0\nWorker node types:\n - wkr-15cpu30g-ondemand: 1" True None
py38-cu112,karpenter:2022-02-07 08:50:54,651 DEBUG -- Cluster status: 1 nodes
- MostDelayedHeartbeats: {'': 0.5076098442077637, '': 0.507556676864624}
- NodeIdleSeconds: Min=1195 Mean=1195 Max=1195
- ResourceUsage: 0.0/15.0 CPU, 0.0 GiB/25.9 GiB memory, 0.0 GiB/10.26 GiB object_store_memory
- TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0
Worker node types:
- wkr-15cpu30g-ondemand: 1
py38-cu112,karpenter:2022-02-07 08:50:54,793 DEBUG -- py38-cu112-wkr-15cpu30g--ondemand-vxdvq is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:50:54,861 DEBUG -- py38-cu112-wkr-15cpu30g--ondemand-vxdvq is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:50:55,062 DEBUG -- Cluster resources: [{'node:': 1.0, 'object_store_memory': 2046035558.0, 'memory': 5261334937.0}, {'CPU': 15.0, 'node:': 1.0, 'object_store_memory': 8973884620.0, 'memory': 22548578304.0}]
py38-cu112,karpenter:2022-02-07 08:50:55,062 DEBUG -- Node counts: defaultdict(<class 'int'>, {'head': 1, 'wkr-15cpu30g-ondemand': 1})
py38-cu112,karpenter:2022-02-07 08:50:55,062 DEBUG -- Placement group demands: []
py38-cu112,karpenter:2022-02-07 08:50:55,063 DEBUG -- Resource demands: []
py38-cu112,karpenter:2022-02-07 08:50:55,063 DEBUG -- Unfulfilled demands: []
py38-cu112,karpenter:2022-02-07 08:50:55,063 DEBUG -- Final unfulfilled: []
py38-cu112,karpenter:2022-02-07 08:50:55,208 DEBUG -- Node requests: {}
py38-cu112,karpenter:2022-02-07 08:50:55,271 DEBUG -- internal_kv_put b'__autoscaling_status' b'{"load_metrics_report": {"usage": {"object_store_memory": [0.0, 11019920178.0], "memory": [0.0, 27809913241.0], "node:": [0.0, 1.0], "node:": [0.0, 1.0], "CPU": [0.0, 15.0]}, "resource_demand": [], "pg_demand": [], "request_demand": [], "node_types": [[{"memory": 5261334937.0, "node:": 1.0, "object_store_memory": 2046035558.0}, 1], [{"CPU": 15.0, "object_store_memory": 8973884620.0, "node:": 1.0, "memory": 22548578304.0}, 1]], "head_ip": null}, "time": 1644252654.1068785, "monitor_pid": 857, "autoscaler_report": {"active_nodes": {"head": 1, "wkr-15cpu30g-ondemand": 1}, "pending_nodes": [], "pending_launches": {}, "failed_nodes": []}}' True None
py38-cu112,karpenter:2022-02-07 08:51:00,278 INFO -- Logging raw resource message pulled from GCS.
py38-cu112,karpenter:2022-02-07 08:51:00,278 INFO -- batch {
node_id: "t\210\224\325\036\271B\311\227_\220x\326\327\246\371a\276\200alox\037&\326 \023"
resources_available {
key: "memory"
value: 22548578304.0
resources_available {
key: "node:"
value: 1.0
resources_available {
key: "object_store_memory"
value: 8973884620.0
resources_available_changed: true
resources_total {
key: "CPU"
value: 15.0
resources_total {
key: "memory"
value: 22548578304.0
resources_total {
key: "node:"
value: 1.0
resources_total {
key: "object_store_memory"
value: 8973884620.0
resource_load {
key: "CPU"
value: 1.0
resource_load_by_shape {
resource_demands {
shape {
key: "CPU"
value: 1.0
num_ready_requests_queued: 1
node_manager_address: ""
batch {
node_id: "\215\257\374\262H\272\316\332\004\306\350\0005w\266\201\ra;\354\3736L5\240\321E\032"
resources_available {
key: "memory"
value: 5261334937.0
resources_available {
key: "node:"
value: 1.0
resources_available {
key: "object_store_memory"
value: 2046034932.0
resources_available_changed: true
resources_total {
key: "memory"
value: 5261334937.0
resources_total {
key: "node:"
value: 1.0
resources_total {
key: "object_store_memory"
value: 2046035558.0
resource_load_by_shape {
node_manager_address: ""
resource_load_by_shape {
resource_demands {
shape {
key: "CPU"
value: 1.0
num_ready_requests_queued: 1
placement_group_load {
py38-cu112,karpenter:2022-02-07 08:51:00,279 INFO -- Done logging raw resource message.
py38-cu112,karpenter:2022-02-07 08:51:00,280 DEBUG -- internal_kv_get b'autoscaler_resource_request' None
py38-cu112,karpenter:2022-02-07 08:51:00,821 INFO --
======== Autoscaler status: 2022-02-07 08:51:00.821530 ========
Node status
1 head
1 wkr-15cpu30g-ondemand
(no pending nodes)
Recent failures:
(no failures)
15.0/15.0 CPU
0.00/25.900 GiB memory
0.00/10.263 GiB object_store_memory
{'CPU': 1.0}: 1+ pending tasks/actors
py38-cu112,karpenter:2022-02-07 08:51:00,856 DEBUG -- internal_kv_put b'__autoscaling_status_legacy' b"Cluster status: 1 nodes\n - MostDelayedHeartbeats: {'': 0.5426044464111328, '': 0.541795015335083}\n - NodeIdleSeconds: Min=0 Mean=0 Max=0\n - ResourceUsage: 15.0/15.0 CPU, 0.0 GiB/25.9 GiB memory, 0.0 GiB/10.26 GiB object_store_memory\n - TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0\nWorker node types:\n - wkr-15cpu30g-ondemand: 1" True None
py38-cu112,karpenter:2022-02-07 08:51:00,857 DEBUG -- Cluster status: 1 nodes
- MostDelayedHeartbeats: {'': 0.5426044464111328, '': 0.541795015335083}
- NodeIdleSeconds: Min=0 Mean=0 Max=0
- ResourceUsage: 15.0/15.0 CPU, 0.0 GiB/25.9 GiB memory, 0.0 GiB/10.26 GiB object_store_memory
- TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0
Worker node types:
- wkr-15cpu30g-ondemand: 1
py38-cu112,karpenter:2022-02-07 08:51:00,971 DEBUG -- py38-cu112-wkr-15cpu30g--ondemand-vxdvq is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:51:01,047 DEBUG -- py38-cu112-wkr-15cpu30g--ondemand-vxdvq is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:51:01,240 DEBUG -- Cluster resources: [{'node:': 1.0, 'object_store_memory': 2046034932.0, 'memory': 5261334937.0}, {'object_store_memory': 8973884620.0, 'node:': 1.0, 'memory': 22548578304.0, 'CPU': 0.0}]
py38-cu112,karpenter:2022-02-07 08:51:01,240 DEBUG -- Node counts: defaultdict(<class 'int'>, {'head': 1, 'wkr-15cpu30g-ondemand': 1})
py38-cu112,karpenter:2022-02-07 08:51:01,240 DEBUG -- Placement group demands: []
py38-cu112,karpenter:2022-02-07 08:51:01,240 DEBUG -- Resource demands: [{'CPU': 1.0}]
py38-cu112,karpenter:2022-02-07 08:51:01,240 DEBUG -- Unfulfilled demands: [{'CPU': 1.0}]
py38-cu112,karpenter:2022-02-07 08:51:01,241 DEBUG -- Final unfulfilled: []
py38-cu112,karpenter:2022-02-07 08:51:01,312 DEBUG -- Node requests: {'wkr-7cpu14g-spot': 1}
py38-cu112,karpenter:2022-02-07 08:51:01,312 INFO -- StandardAutoscaler: Queue 1 new nodes for launch
py38-cu112,karpenter:2022-02-07 08:51:01,316 INFO -- NodeLauncher0: Got 1 nodes to launch.
py38-cu112,karpenter:2022-02-07 08:51:01,316 INFO -- NodeLauncher0: Launching 1 nodes, type wkr-7cpu14g-spot.
py38-cu112,karpenter:2022-02-07 08:51:01,317 INFO -- KubernetesNodeProvider: calling create_namespaced_pod (count=1).
py38-cu112,karpenter:2022-02-07 08:51:01,393 INFO -- :event_summary:Adding 1 nodes of type wkr-7cpu14g-spot.
py38-cu112,karpenter:2022-02-07 08:51:01,394 DEBUG -- internal_kv_put b'__autoscaling_status' b'{"load_metrics_report": {"usage": {"object_store_memory": [626.0, 11019920178.0], "memory": [0.0, 27809913241.0], "node:": [0.0, 1.0], "node:": [0.0, 1.0], "CPU": [15.0, 15.0]}, "resource_demand": [[{"CPU": 1.0}, 1]], "pg_demand": [], "request_demand": [], "node_types": [[{"memory": 5261334937.0, "node:": 1.0, "object_store_memory": 2046035558.0}, 1], [{"CPU": 15.0, "object_store_memory": 8973884620.0, "node:": 1.0, "memory": 22548578304.0}, 1]], "head_ip": null}, "time": 1644252660.2818246, "monitor_pid": 857, "autoscaler_report": {"active_nodes": {"head": 1, "wkr-15cpu30g-ondemand": 1}, "pending_nodes": [], "pending_launches": {"wkr-7cpu14g-spot": 1}, "failed_nodes": []}}' True None
py38-cu112,karpenter:2022-02-07 08:51:06,412 INFO -- Logging raw resource message pulled from GCS.
py38-cu112,karpenter:2022-02-07 08:51:06,412 INFO -- batch {
node_id: "t\210\224\325\036\271B\311\227_\220x\326\327\246\371a\276\200alox\037&\326 \023"
resources_available {
key: "memory"
value: 22548578304.0
resources_available {
key: "node:"
value: 1.0
resources_available {
key: "object_store_memory"
value: 8973884620.0
resources_available_changed: true
resources_total {
key: "CPU"
value: 15.0
resources_total {
key: "memory"
value: 22548578304.0
resources_total {
key: "node:"
value: 1.0
resources_total {
key: "object_store_memory"
value: 8973884620.0
resource_load {
key: "CPU"
value: 1.0
resource_load_by_shape {
resource_demands {
shape {
key: "CPU"
value: 1.0
num_ready_requests_queued: 1
node_manager_address: ""
batch {
node_id: "\215\257\374\262H\272\316\332\004\306\350\0005w\266\201\ra;\354\3736L5\240\321E\032"
resources_available {
key: "memory"
value: 5261334937.0
resources_available {
key: "node:"
value: 1.0
resources_available {
key: "object_store_memory"
value: 2046034932.0
resources_available_changed: true
resources_total {
key: "memory"
value: 5261334937.0
resources_total {
key: "node:"
value: 1.0
resources_total {
key: "object_store_memory"
value: 2046035558.0
resource_load_by_shape {
node_manager_address: ""
resource_load_by_shape {
resource_demands {
shape {
key: "CPU"
value: 1.0
num_ready_requests_queued: 1
placement_group_load {
py38-cu112,karpenter:2022-02-07 08:51:06,413 INFO -- Done logging raw resource message.
py38-cu112,karpenter:2022-02-07 08:51:06,413 DEBUG -- internal_kv_get b'autoscaler_resource_request' None
py38-cu112,karpenter:2022-02-07 08:51:07,020 INFO --
======== Autoscaler status: 2022-02-07 08:51:07.020419 ========
Node status
1 head
1 wkr-15cpu30g-ondemand
None: wkr-7cpu14g-spot, uninitialized
Recent failures:
(no failures)
15.0/15.0 CPU
0.00/25.900 GiB memory
0.00/10.263 GiB object_store_memory
{'CPU': 1.0}: 1+ pending tasks/actors
py38-cu112,karpenter:2022-02-07 08:51:07,096 DEBUG -- internal_kv_put b'__autoscaling_status_legacy' b"Cluster status: 2 nodes\n - MostDelayedHeartbeats: {'': 0.607450008392334, '': 0.6073474884033203}\n - NodeIdleSeconds: Min=0 Mean=0 Max=0\n - ResourceUsage: 15.0/15.0 CPU, 0.0 GiB/25.9 GiB memory, 0.0 GiB/10.26 GiB object_store_memory\n - TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0\nWorker node types:\n - wkr-15cpu30g-ondemand: 1\n - wkr-7cpu14g-spot: 1" True None
py38-cu112,karpenter:2022-02-07 08:51:07,097 DEBUG -- Cluster status: 2 nodes
- MostDelayedHeartbeats: {'': 0.607450008392334, '': 0.6073474884033203}
- NodeIdleSeconds: Min=0 Mean=0 Max=0
- ResourceUsage: 15.0/15.0 CPU, 0.0 GiB/25.9 GiB memory, 0.0 GiB/10.26 GiB object_store_memory
- TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0
Worker node types:
- wkr-15cpu30g-ondemand: 1
- wkr-7cpu14g-spot: 1
py38-cu112,karpenter:2022-02-07 08:51:07,271 DEBUG -- py38-cu112-wkr-15cpu30g--ondemand-vxdvq is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:51:07,320 DEBUG -- py38-cu112-wkr-7cpu14g--spot-xzdjg is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:51:07,360 DEBUG -- py38-cu112-wkr-7cpu14g--spot-xzdjg: Starting new thread runner.
py38-cu112,karpenter:2022-02-07 08:51:07,360 INFO -- Creating new (spawn_updater) updater thread for node py38-cu112-wkr-7cpu14g--spot-xzdjg.
py38-cu112,karpenter:2022-02-07 08:51:07,437 DEBUG -- py38-cu112-wkr-15cpu30g--ondemand-vxdvq is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:51:07,499 INFO -- NodeUpdater: py38-cu112-wkr-7cpu14g--spot-xzdjg: Running kubectl -n karpenter exec -it py38-cu112-wkr-7cpu14g--spot-xzdjg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
py38-cu112,karpenter:2022-02-07 08:51:07,711 DEBUG -- Cluster resources: [{'object_store_memory': 2046034932.0, 'memory': 5261334937.0, 'node:': 1.0}, {'object_store_memory': 8973884620.0, 'memory': 22548578304.0, 'node:': 1.0, 'CPU': 0.0}, {'CPU': 7, 'GPU': 0, 'memory': 10522669875}]
py38-cu112,karpenter:2022-02-07 08:51:07,711 DEBUG -- Node counts: defaultdict(<class 'int'>, {'head': 1, 'wkr-15cpu30g-ondemand': 1, 'wkr-7cpu14g-spot': 1})
py38-cu112,karpenter:2022-02-07 08:51:07,711 DEBUG -- Placement group demands: []
py38-cu112,karpenter:2022-02-07 08:51:07,711 DEBUG -- Resource demands: [{'CPU': 1.0}]
py38-cu112,karpenter:2022-02-07 08:51:07,711 DEBUG -- Unfulfilled demands: []
py38-cu112,karpenter:2022-02-07 08:51:07,711 DEBUG -- Final unfulfilled: []
py38-cu112,karpenter:2022-02-07 08:51:07,811 DEBUG -- Node requests: {}
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:51:07,931 DEBUG -- internal_kv_put b'__autoscaling_status' b'{"load_metrics_report": {"usage": {"memory": [0.0, 27809913241.0], "object_store_memory": [626.0, 11019920178.0], "node:": [0.0, 1.0], "node:": [0.0, 1.0], "CPU": [15.0, 15.0]}, "resource_demand": [[{"CPU": 1.0}, 1]], "pg_demand": [], "request_demand": [], "node_types": [[{"memory": 5261334937.0, "node:": 1.0, "object_store_memory": 2046035558.0}, 1], [{"CPU": 15.0, "object_store_memory": 8973884620.0, "node:": 1.0, "memory": 22548578304.0}, 1]], "head_ip": null}, "time": 1644252666.414767, "monitor_pid": 857, "autoscaler_report": {"active_nodes": {"head": 1, "wkr-15cpu30g-ondemand": 1}, "pending_nodes": [[null, "wkr-7cpu14g-spot", "waiting-for-ssh"]], "pending_launches": {}, "failed_nodes": []}}' True None
py38-cu112,karpenter:2022-02-07 08:51:12,931 INFO -- NodeUpdater: py38-cu112-wkr-7cpu14g--spot-xzdjg: Running kubectl -n karpenter exec -it py38-cu112-wkr-7cpu14g--spot-xzdjg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
py38-cu112,karpenter:2022-02-07 08:51:12,940 INFO -- Logging raw resource message pulled from GCS.
py38-cu112,karpenter:2022-02-07 08:51:12,940 INFO -- batch {
node_id: "t\210\224\325\036\271B\311\227_\220x\326\327\246\371a\276\200alox\037&\326 \023"
resources_available {
key: "memory"
value: 22548578304.0
resources_available {
key: "node:"
value: 1.0
resources_available {
key: "object_store_memory"
value: 8973884620.0
resources_available_changed: true
resources_total {
key: "CPU"
value: 15.0
resources_total {
key: "memory"
value: 22548578304.0
resources_total {
key: "node:"
value: 1.0
resources_total {
key: "object_store_memory"
value: 8973884620.0
resource_load {
key: "CPU"
value: 1.0
resource_load_by_shape {
resource_demands {
shape {
key: "CPU"
value: 1.0
num_ready_requests_queued: 1
node_manager_address: ""
batch {
node_id: "\215\257\374\262H\272\316\332\004\306\350\0005w\266\201\ra;\354\3736L5\240\321E\032"
resources_available {
key: "memory"
value: 5261334937.0
resources_available {
key: "node:"
value: 1.0
resources_available {
key: "object_store_memory"
value: 2046034932.0
resources_available_changed: true
resources_total {
key: "memory"
value: 5261334937.0
resources_total {
key: "node:"
value: 1.0
resources_total {
key: "object_store_memory"
value: 2046035558.0
resource_load_by_shape {
node_manager_address: ""
resource_load_by_shape {
resource_demands {
shape {
key: "CPU"
value: 1.0
num_ready_requests_queued: 1
placement_group_load {
py38-cu112,karpenter:2022-02-07 08:51:12,940 INFO -- Done logging raw resource message.
py38-cu112,karpenter:2022-02-07 08:51:12,942 DEBUG -- internal_kv_get b'autoscaler_resource_request' None
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:51:13,849 INFO --
======== Autoscaler status: 2022-02-07 08:51:13.848920 ========
Node status
1 head
1 wkr-15cpu30g-ondemand
None: wkr-7cpu14g-spot, waiting-for-ssh
Recent failures:
(no failures)
15.0/15.0 CPU
0.00/25.900 GiB memory
0.00/10.263 GiB object_store_memory
{'CPU': 1.0}: 1+ pending tasks/actors
py38-cu112,karpenter:2022-02-07 08:51:13,920 DEBUG -- internal_kv_put b'__autoscaling_status_legacy' b"Cluster status: 2 nodes (1 updating)\n - MostDelayedHeartbeats: {'': 0.907721757888794, '': 0.9073045253753662}\n - NodeIdleSeconds: Min=0 Mean=0 Max=0\n - ResourceUsage: 15.0/15.0 CPU, 0.0 GiB/25.9 GiB memory, 0.0 GiB/10.26 GiB object_store_memory\n - TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0\nWorker node types:\n - wkr-15cpu30g-ondemand: 1\n - wkr-7cpu14g-spot: 1" True None
py38-cu112,karpenter:2022-02-07 08:51:13,921 DEBUG -- Cluster status: 2 nodes (1 updating)
- MostDelayedHeartbeats: {'': 0.907721757888794, '': 0.9073045253753662}
- NodeIdleSeconds: Min=0 Mean=0 Max=0
- ResourceUsage: 15.0/15.0 CPU, 0.0 GiB/25.9 GiB memory, 0.0 GiB/10.26 GiB object_store_memory
- TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0
Worker node types:
- wkr-15cpu30g-ondemand: 1
- wkr-7cpu14g-spot: 1
py38-cu112,karpenter:2022-02-07 08:51:14,099 DEBUG -- py38-cu112-wkr-15cpu30g--ondemand-vxdvq is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:51:14,166 DEBUG -- py38-cu112-wkr-15cpu30g--ondemand-vxdvq is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:51:14,398 DEBUG -- Cluster resources: [{'memory': 5261334937.0, 'object_store_memory': 2046034932.0, 'node:': 1.0}, {'memory': 22548578304.0, 'object_store_memory': 8973884620.0, 'node:': 1.0, 'CPU': 0.0}, {'CPU': 7, 'GPU': 0, 'memory': 10522669875}]
py38-cu112,karpenter:2022-02-07 08:51:14,398 DEBUG -- Node counts: defaultdict(<class 'int'>, {'head': 1, 'wkr-15cpu30g-ondemand': 1, 'wkr-7cpu14g-spot': 1})
py38-cu112,karpenter:2022-02-07 08:51:14,398 DEBUG -- Placement group demands: []
py38-cu112,karpenter:2022-02-07 08:51:14,398 DEBUG -- Resource demands: [{'CPU': 1.0}]
py38-cu112,karpenter:2022-02-07 08:51:14,399 DEBUG -- Unfulfilled demands: []
py38-cu112,karpenter:2022-02-07 08:51:14,399 DEBUG -- Final unfulfilled: []
py38-cu112,karpenter:2022-02-07 08:51:14,561 DEBUG -- Node requests: {}
py38-cu112,karpenter:2022-02-07 08:51:14,664 DEBUG -- internal_kv_put b'__autoscaling_status' b'{"load_metrics_report": {"usage": {"memory": [0.0, 27809913241.0], "object_store_memory": [626.0, 11019920178.0], "node:": [0.0, 1.0], "node:": [0.0, 1.0], "CPU": [15.0, 15.0]}, "resource_demand": [[{"CPU": 1.0}, 1]], "pg_demand": [], "request_demand": [], "node_types": [[{"memory": 5261334937.0, "node:": 1.0, "object_store_memory": 2046035558.0}, 1], [{"CPU": 15.0, "object_store_memory": 8973884620.0, "node:": 1.0, "memory": 22548578304.0}, 1]], "head_ip": null}, "time": 1644252672.9459927, "monitor_pid": 857, "autoscaler_report": {"active_nodes": {"head": 1, "wkr-15cpu30g-ondemand": 1}, "pending_nodes": [[null, "wkr-7cpu14g-spot", "waiting-for-ssh"]], "pending_launches": {}, "failed_nodes": []}}' True None
py38-cu112,karpenter:2022-02-07 08:51:18,119 INFO -- NodeUpdater: py38-cu112-wkr-7cpu14g--spot-xzdjg: Running kubectl -n karpenter exec -it py38-cu112-wkr-7cpu14g--spot-xzdjg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:51:19,672 INFO -- Logging raw resource message pulled from GCS.
py38-cu112,karpenter:2022-02-07 08:51:19,672 INFO -- batch {
node_id: "t\210\224\325\036\271B\311\227_\220x\326\327\246\371a\276\200alox\037&\326 \023"
resources_available {
key: "memory"
value: 22548578304.0
resources_available {
key: "node:"
value: 1.0
resources_available {
key: "object_store_memory"
value: 8973884620.0
resources_available_changed: true
resources_total {
key: "CPU"
value: 15.0
resources_total {
key: "memory"
value: 22548578304.0
resources_total {
key: "node:"
value: 1.0
resources_total {
key: "object_store_memory"
value: 8973884620.0
resource_load {
key: "CPU"
value: 1.0
resource_load_by_shape {
resource_demands {
shape {
key: "CPU"
value: 1.0
num_ready_requests_queued: 1
node_manager_address: ""
batch {
node_id: "\215\257\374\262H\272\316\332\004\306\350\0005w\266\201\ra;\354\3736L5\240\321E\032"
resources_available {
key: "memory"
value: 5261334937.0
resources_available {
key: "node:"
value: 1.0
resources_available {
key: "object_store_memory"
value: 2046034932.0
resources_available_changed: true
resources_total {
key: "memory"
value: 5261334937.0
resources_total {
key: "node:"
value: 1.0
resources_total {
key: "object_store_memory"
value: 2046035558.0
resource_load_by_shape {
node_manager_address: ""
resource_load_by_shape {
resource_demands {
shape {
key: "CPU"
value: 1.0
num_ready_requests_queued: 1
placement_group_load {
py38-cu112,karpenter:2022-02-07 08:51:19,672 INFO -- Done logging raw resource message.
py38-cu112,karpenter:2022-02-07 08:51:19,673 DEBUG -- internal_kv_get b'autoscaler_resource_request' None
py38-cu112,karpenter:2022-02-07 08:51:20,448 INFO --
======== Autoscaler status: 2022-02-07 08:51:20.448769 ========
Node status
1 head
1 wkr-15cpu30g-ondemand
None: wkr-7cpu14g-spot, waiting-for-ssh
Recent failures:
(no failures)
15.0/15.0 CPU
0.00/25.900 GiB memory
0.00/10.263 GiB object_store_memory
{'CPU': 1.0}: 1+ pending tasks/actors
py38-cu112,karpenter:2022-02-07 08:51:20,510 DEBUG -- internal_kv_put b'__autoscaling_status_legacy' b"Cluster status: 2 nodes (1 updating)\n - MostDelayedHeartbeats: {'': 0.7758562564849854, '': 0.7755496501922607}\n - NodeIdleSeconds: Min=0 Mean=0 Max=0\n - ResourceUsage: 15.0/15.0 CPU, 0.0 GiB/25.9 GiB memory, 0.0 GiB/10.26 GiB object_store_memory\n - TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0\nWorker node types:\n - wkr-15cpu30g-ondemand: 1\n - wkr-7cpu14g-spot: 1" True None
py38-cu112,karpenter:2022-02-07 08:51:20,511 DEBUG -- Cluster status: 2 nodes (1 updating)
- MostDelayedHeartbeats: {'': 0.7758562564849854, '': 0.7755496501922607}
- NodeIdleSeconds: Min=0 Mean=0 Max=0
- ResourceUsage: 15.0/15.0 CPU, 0.0 GiB/25.9 GiB memory, 0.0 GiB/10.26 GiB object_store_memory
- TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0
Worker node types:
- wkr-15cpu30g-ondemand: 1
- wkr-7cpu14g-spot: 1
py38-cu112,karpenter:2022-02-07 08:51:20,661 DEBUG -- py38-cu112-wkr-15cpu30g--ondemand-vxdvq is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:51:20,716 DEBUG -- py38-cu112-wkr-15cpu30g--ondemand-vxdvq is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:51:20,966 DEBUG -- Cluster resources: [{'memory': 5261334937.0, 'object_store_memory': 2046034932.0, 'node:': 1.0}, {'memory': 22548578304.0, 'node:': 1.0, 'object_store_memory': 8973884620.0, 'CPU': 0.0}, {'CPU': 7, 'GPU': 0, 'memory': 10522669875}]
py38-cu112,karpenter:2022-02-07 08:51:20,966 DEBUG -- Node counts: defaultdict(<class 'int'>, {'head': 1, 'wkr-15cpu30g-ondemand': 1, 'wkr-7cpu14g-spot': 1})
py38-cu112,karpenter:2022-02-07 08:51:20,966 DEBUG -- Placement group demands: []
py38-cu112,karpenter:2022-02-07 08:51:20,967 DEBUG -- Resource demands: [{'CPU': 1.0}]
py38-cu112,karpenter:2022-02-07 08:51:20,967 DEBUG -- Unfulfilled demands: []
py38-cu112,karpenter:2022-02-07 08:51:20,967 DEBUG -- Final unfulfilled: []
py38-cu112,karpenter:2022-02-07 08:51:21,060 DEBUG -- Node requests: {}
py38-cu112,karpenter:2022-02-07 08:51:21,147 DEBUG -- internal_kv_put b'__autoscaling_status' b'{"load_metrics_report": {"usage": {"memory": [0.0, 27809913241.0], "object_store_memory": [626.0, 11019920178.0], "node:": [0.0, 1.0], "CPU": [15.0, 15.0], "node:": [0.0, 1.0]}, "resource_demand": [[{"CPU": 1.0}, 1]], "pg_demand": [], "request_demand": [], "node_types": [[{"memory": 5261334937.0, "node:": 1.0, "object_store_memory": 2046035558.0}, 1], [{"CPU": 15.0, "object_store_memory": 8973884620.0, "node:": 1.0, "memory": 22548578304.0}, 1]], "head_ip": null}, "time": 1644252679.6797361, "monitor_pid": 857, "autoscaler_report": {"active_nodes": {"head": 1, "wkr-15cpu30g-ondemand": 1}, "pending_nodes": [[null, "wkr-7cpu14g-spot", "waiting-for-ssh"]], "pending_launches": {}, "failed_nodes": []}}' True None
py38-cu112,karpenter:2022-02-07 08:51:23,284 INFO -- NodeUpdater: py38-cu112-wkr-7cpu14g--spot-xzdjg: Running kubectl -n karpenter exec -it py38-cu112-wkr-7cpu14g--spot-xzdjg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
py38-cu112,karpenter:2022-02-07 08:51:26,156 INFO -- Logging raw resource message pulled from GCS.
py38-cu112,karpenter:2022-02-07 08:51:26,156 INFO -- batch {
node_id: "t\210\224\325\036\271B\311\227_\220x\326\327\246\371a\276\200alox\037&\326 \023"
resources_available {
key: "memory"
value: 22548578304.0
resources_available {
key: "node:"
value: 1.0
resources_available {
key: "object_store_memory"
value: 8973884620.0
resources_available_changed: true
resources_total {
key: "CPU"
value: 15.0
resources_total {
key: "memory"
value: 22548578304.0
resources_total {
key: "node:"
value: 1.0
resources_total {
key: "object_store_memory"
value: 8973884620.0
resource_load {
key: "CPU"
value: 1.0
resource_load_by_shape {
resource_demands {
shape {
key: "CPU"
value: 1.0
num_ready_requests_queued: 1
node_manager_address: ""
batch {
node_id: "\215\257\374\262H\272\316\332\004\306\350\0005w\266\201\ra;\354\3736L5\240\321E\032"
resources_available {
key: "memory"
value: 5261334937.0
resources_available {
key: "node:"
value: 1.0
resources_available {
key: "object_store_memory"
value: 2046034932.0
resources_available_changed: true
resources_total {
key: "memory"
value: 5261334937.0
resources_total {
key: "node:"
value: 1.0
resources_total {
key: "object_store_memory"
value: 2046035558.0
resource_load_by_shape {
node_manager_address: ""
resource_load_by_shape {
resource_demands {
shape {
key: "CPU"
value: 1.0
num_ready_requests_queued: 1
placement_group_load {
py38-cu112,karpenter:2022-02-07 08:51:26,156 INFO -- Done logging raw resource message.
py38-cu112,karpenter:2022-02-07 08:51:26,157 DEBUG -- internal_kv_get b'autoscaler_resource_request' None
py38-cu112,karpenter:2022-02-07 08:51:26,651 INFO --
======== Autoscaler status: 2022-02-07 08:51:26.651839 ========
Node status
1 head
1 wkr-15cpu30g-ondemand
None: wkr-7cpu14g-spot, waiting-for-ssh
Recent failures:
(no failures)
15.0/15.0 CPU
0.00/25.900 GiB memory
0.00/10.263 GiB object_store_memory
{'CPU': 1.0}: 1+ pending tasks/actors
py38-cu112,karpenter:2022-02-07 08:51:26,728 DEBUG -- internal_kv_put b'__autoscaling_status_legacy' b"Cluster status: 2 nodes (1 updating)\n - MostDelayedHeartbeats: {'': 0.49486541748046875, '': 0.49477195739746094}\n - NodeIdleSeconds: Min=0 Mean=0 Max=0\n - ResourceUsage: 15.0/15.0 CPU, 0.0 GiB/25.9 GiB memory, 0.0 GiB/10.26 GiB object_store_memory\n - TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0\nWorker node types:\n - wkr-15cpu30g-ondemand: 1\n - wkr-7cpu14g-spot: 1" True None
py38-cu112,karpenter:2022-02-07 08:51:26,729 DEBUG -- Cluster status: 2 nodes (1 updating)
- MostDelayedHeartbeats: {'': 0.49486541748046875, '': 0.49477195739746094}
- NodeIdleSeconds: Min=0 Mean=0 Max=0
- ResourceUsage: 15.0/15.0 CPU, 0.0 GiB/25.9 GiB memory, 0.0 GiB/10.26 GiB object_store_memory
- TimeSinceLastHeartbeat: Min=0 Mean=0 Max=0
Worker node types:
- wkr-15cpu30g-ondemand: 1
- wkr-7cpu14g-spot: 1
py38-cu112,karpenter:2022-02-07 08:51:26,910 DEBUG -- py38-cu112-wkr-15cpu30g--ondemand-vxdvq is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:51:26,969 DEBUG -- py38-cu112-wkr-15cpu30g--ondemand-vxdvq is not being updated and passes config check (can_update=True).
py38-cu112,karpenter:2022-02-07 08:51:27,191 DEBUG -- Cluster resources: [{'node:': 1.0, 'memory': 5261334937.0, 'object_store_memory': 2046034932.0}, {'object_store_memory': 8973884620.0, 'memory': 22548578304.0, 'node:': 1.0, 'CPU': 0.0}, {'CPU': 7, 'GPU': 0, 'memory': 10522669875}]
py38-cu112,karpenter:2022-02-07 08:51:27,191 DEBUG -- Node counts: defaultdict(<class 'int'>, {'head': 1, 'wkr-15cpu30g-ondemand': 1, 'wkr-7cpu14g-spot': 1})
py38-cu112,karpenter:2022-02-07 08:51:27,191 DEBUG -- Placement group demands: []
py38-cu112,karpenter:2022-02-07 08:51:27,192 DEBUG -- Resource demands: [{'CPU': 1.0}]
py38-cu112,karpenter:2022-02-07 08:51:27,192 DEBUG -- Unfulfilled demands: []
py38-cu112,karpenter:2022-02-07 08:51:27,192 DEBUG -- Final unfulfilled: []
py38-cu112,karpenter:2022-02-07 08:51:27,334 DEBUG -- Node requests: {}
py38-cu112,karpenter:2022-02-07 08:51:27,460 DEBUG -- internal_kv_put b'__autoscaling_status' b'{"load_metrics_report": {"usage": {"node:": [0.0, 1.0], "object_store_memory": [626.0, 11019920178.0], "memory": [0.0, 27809913241.0], "node:": [0.0, 1.0], "CPU": [15.0, 15.0]}, "resource_demand": [[{"CPU": 1.0}, 1]], "pg_demand": [], "request_demand": [], "node_types": [[{"memory": 5261334937.0, "node:": 1.0, "object_store_memory": 2046035558.0}, 1], [{"CPU": 15.0, "object_store_memory": 8973884620.0, "node:": 1.0, "memory": 22548578304.0}, 1]], "head_ip": null}, "time": 1644252686.1586945, "monitor_pid": 857, "autoscaler_report": {"active_nodes": {"head": 1, "wkr-15cpu30g-ondemand": 1}, "pending_nodes": [[null, "wkr-7cpu14g-spot", "waiting-for-ssh"]], "pending_launches": {}, "failed_nodes": []}}' True None
py38-cu112,karpenter:2022-02-07 08:51:28,455 INFO -- NodeUpdater: py38-cu112-wkr-7cpu14g--spot-xzdjg: Running kubectl -n karpenter exec -it py38-cu112-wkr-7cpu14g--spot-xzdjg -- bash --login -c -i 'true && source ~/.bashrc && export OMP_NUM_THREADS=1 PYTHONWARNINGS=ignore && (uptime)'
Unable to use a TTY - input is not a terminal or the right kind of file
Error from server: no preferred addresses found; known addresses: []
Name: ray-operator-5776ff876d-5xqcz
Namespace: karpenter
Priority: 0
Start Time: Mon, 07 Feb 2022 10:01:55 -0600
Annotations: eks.privileged
Status: Running
Controlled By: ReplicaSet/ray-operator-5776ff876d
Container ID: docker://201a6272612f771c4669e8b9da76964a9d9fe3a5de29e4c05c9a6eb9ea809e14
Image: rayproject/ray:6235b6
Image ID: docker-pullable://rayproject/ray@sha256:e788f73e8a585426acb186bfb64b4d85a083e19a47e3305ae1dc036b6c32ed05
Port: <none>
Host Port: <none>
State: Running
Started: Mon, 07 Feb 2022 10:03:02 -0600
Ready: True
Restart Count: 0
cpu: 1
memory: 2Gi
cpu: 1
ephemeral-storage: 1Gi
memory: 1Gi
/var/run/secrets/ from kube-api-access-7wvdp (ro)
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: op=Exists for 300s op=Exists for 300s
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 55m default-scheduler Successfully assigned karpenter/ray-operator-5776ff876d-5xqcz to
Normal Pulling 55m kubelet Pulling image "rayproject/ray:6235b6"
Normal Pulled 54m kubelet Successfully pulled image "rayproject/ray:6235b6" in 50.154638498s
Normal Created 54m kubelet Created container ray
Normal Started 54m kubelet Started container ray