SkyRay: Seamlessly Extending KubeRay to Multi-Cluster Multi-Cloud Operation - Anne Holler, Elotl

Introduction to SkyRay and Sky Computing

SkyRay is an extension of KubeRay that takes it from single-cluster operation to multi-cluster, multi-cloud operation (00:00:16).
The idea of SkyRay is based on the concept of Sky Computing, which requires a commodity cloud compute layer that makes it as easy to use multiple clusters as it is to use one (00:02:42).
SkyRay works with a policy-driven Kubernetes fleet manager, which presents a Kubernetes API to the user and interoperates with KubeRay (00:03:04).
The fleet manager schedules KubeRay deployments on workload clusters according to a policy, and KubeRay handles the deployments on each cluster (00:05:43).

The Nova fleet manager is used in SkyRay; it supports policies such as spread/duplicate, specified cluster, priority, and available capacity (00:05:51).
SkyRay can be used to achieve various policy objectives, such as training and serving workloads, with examples available in an open-source repository (00:08:27).
SkyRay allows users to run jobs on a static cluster, and if the job doesn't fit, it can be scheduled on a dynamic cluster with on-demand resources to handle the job, using the available capacity policy (00:10:11).
For experimental jobs, users can set up a cluster with a specified cluster placement policy, which allows for easy rescheduling and future scheduling on a different cluster if needed (00:11:55).
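The static-then-dynamic fallback described for the available capacity policy can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the fleet manager's implementation or API: the `Cluster` class, its fields, and `place_by_available_capacity` are hypothetical names, and GPU count stands in for whatever resources a real scheduler would compare.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    # Hypothetical model of a workload cluster, for illustration only.
    name: str
    free_gpus: int          # GPUs currently unallocated
    dynamic: bool = False   # True if the cluster can add nodes on demand
    max_gpus: int = 0       # upper bound a dynamic cluster may grow to


def place_by_available_capacity(job_gpus: int, clusters: List[Cluster]) -> Optional[str]:
    """Return the name of a cluster that can hold the job, or None.

    Static clusters with enough free GPUs are preferred; a dynamic
    cluster is considered only when no static cluster fits.
    """
    for c in clusters:
        if not c.dynamic and c.free_gpus >= job_gpus:
            return c.name
    for c in clusters:
        if c.dynamic and c.max_gpus >= job_gpus:
            return c.name
    return None


clusters = [
    Cluster("static-a", free_gpus=4),
    Cluster("dynamic-b", free_gpus=0, dynamic=True, max_gpus=16),
]
print(place_by_available_capacity(2, clusters))   # fits on the static cluster
print(place_by_available_capacity(8, clusters))   # falls back to the dynamic cluster
```

In this sketch a job that fits nowhere returns `None`; how a real fleet manager treats an unplaceable job is not covered in the talk summary.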

SkyRay also supports serving for production and development, allowing users to create static clusters for production and dynamic clusters for development, with different GPU instances and auto-scaling policies (00:13:30).
A priority policy can be used to schedule workloads based on cloud provider, allowing users to prioritize certain cloud providers over others (00:15:36).
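As a rough sketch of how a provider-based priority policy might resolve, the snippet below walks an ordered list of providers and picks the first one that currently has a usable cluster. The function name, the provider strings, and the `available` mapping are assumptions made for the example; the actual policy format used by the fleet manager is not shown in the talk summary.

```python
from typing import Dict, List, Optional


def place_by_priority(priority: List[str], available: Dict[str, bool]) -> Optional[str]:
    """Pick the highest-priority cloud provider that is currently usable.

    `priority` lists providers from most to least preferred.
    `available` maps a provider name to whether it has a usable cluster;
    providers missing from the mapping are treated as unavailable.
    """
    for provider in priority:
        if available.get(provider, False):
            return provider
    return None


# Hypothetical preference order: try provider-a first, then provider-b.
order = ["provider-a", "provider-b", "provider-c"]
print(place_by_priority(order, {"provider-a": False, "provider-b": True, "provider-c": True}))
```

Keeping the preference order as data rather than code means it can be changed without touching the workload definition, which is the general appeal of policy-driven placement.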

SkyRay's just-in-time capability in standby mode allows clusters to scale to zero when idle, reducing costs, and can be used with CU to handle Ray jobs and services (00:16:13).
SkyRay can facilitate Kubernetes upgrades with no downtime to AI workloads by spreading duplicate workloads across labeled clusters and cloning clusters with the new Kubernetes version (00:18:47).
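Spreading duplicate workloads across labeled clusters comes down to label matching: every cluster whose labels satisfy a selector receives a copy of the workload. The sketch below illustrates that idea with plain dictionaries. The label keys, cluster names, and function names are hypothetical, and the equality-only matching is a simplification of what Kubernetes label selectors support.

```python
from typing import Dict, List


def matches(labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    """True if every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def duplicate_targets(clusters: Dict[str, Dict[str, str]], selector: Dict[str, str]) -> List[str]:
    """Return the sorted names of all clusters that should receive a copy."""
    return sorted(name for name, labels in clusters.items() if matches(labels, selector))


# Hypothetical fleet: two clusters carry the label used for the rollout.
fleet = {
    "cluster-old": {"app": "ray-serve", "k8s": "old"},
    "cluster-new": {"app": "ray-serve", "k8s": "new"},
    "cluster-batch": {"app": "ray-train"},
}
print(duplicate_targets(fleet, {"app": "ray-serve"}))
```

Labeling a freshly cloned cluster with the same `app` value is enough to make it a target in this model, which mirrors how the upgrade flow brings the new cluster into service alongside the old one.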

SkyRay extends KubeRay for multi-cluster, multi-cloud operation, allowing for seamless deployment and management of clusters across different cloud providers (00:19:33).
The delete-recreate version of just-in-time clusters involves labeling a cluster, duplicating the workload, deploying the Ray service, and ensuring it serves before switching the load balancer and deleting the old cluster (00:19:56).
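The ordering in the delete-recreate flow matters: the old cluster should be removed only after the new one is confirmed to be serving. A minimal sketch of that sequencing follows. It only produces a list of step descriptions; `plan_cutover` and the `is_serving` callback are hypothetical, and a real rollout would call the fleet manager, KubeRay, and the load balancer instead of building strings.

```python
from typing import Callable, List


def plan_cutover(old: str, new: str, is_serving: Callable[[str], bool]) -> List[str]:
    """Return the ordered steps for moving a Ray service from `old` to `new`.

    The load balancer switch and the deletion of the old cluster are only
    planned when the health check reports that the new cluster is serving.
    """
    steps = [
        f"label {new}",
        f"duplicate workload to {new}",
        f"deploy Ray service on {new}",
    ]
    if not is_serving(new):
        # Keep the old cluster in place so traffic is never left unserved.
        steps.append(f"abort: {new} is not serving, keep {old}")
        return steps
    steps.append(f"switch load balancer to {new}")
    steps.append(f"delete {old}")
    return steps


for step in plan_cutover("cluster-old", "cluster-new", is_serving=lambda name: True):
    print(step)
```

Gating the last two steps on the health check is what keeps the upgrade free of downtime in this model: at every point at least one cluster that is known to serve remains behind the load balancer.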

A compound AI example is demonstrated, featuring a large language model (LLM) plus retrieval-augmented generation (RAG), with one cluster dedicated to serving the LLM and another to ingestion, using GPU and CPU resources respectively (00:20:46).
The clusters are managed using labels and policies, allowing for efficient scaling and resource allocation, and can be easily deployed and managed with the right policies in place (00:22:07).

SkyRay aims to reduce launch time, increase efficiency, manage costs, enhance robustness, and facilitate cluster maintenance, and is available as an open-source solution for users to try (00:22:51).