Jack Francis jackfrancis

Here's what Cluster Autoscaler will do in your scenario, end‑to‑end. The behavior is entirely driven by the fact that the Kubernetes Node objects still exist (kubelet stops, but the API server retains them) and the Azure cloud provider does not surface VM power state to CA.

How CA classifies the 33 stopped nodes

On every loop iteration, CA sorts each node into a bucket in clusterstate.go (updateReadinessStats). The buckets are mutually exclusive:

@jackfrancis
jackfrancis / ray-azure-agent.md
Created March 5, 2026 19:48
Ray Azure Provider plan

Agent Workflow: Reinforce azure support

Auto-generated by providerize on 2026-02-24 22:56 UTC
Repository: https://github.com/ray-project/ray

Context

This repository has cloud provider integrations for: aws, gcp, azure.

Current ranking (best → worst):

jackfrancis / kuberay-multicluster.md
Created March 4, 2026 19:33
KubeRay Multicluster analysis

KubeRay Federation Proposal — Analysis & Comparative Landscape

Proposal Summary

Issue #4561 by @yuchen-ecnu proposes adding federation capability to KubeRay so that a single logical RayCluster can span multiple Kubernetes clusters. The core motivation:

  • Fragmented GPUs: Organizations procure GPUs across multiple cloud vendors/AZs. Today these are isolated into separate K8s clusters, preventing a unified Ray cluster.
  • Operational pain: Users must split datasets, deploy multiple small RayClusters, and manually manage them — causing long-tail performance issues and complexity.
  • Virtual Kubelet limitations: The common workaround (aggregating via Virtual Kubelet) creates control-plane scalability bottlenecks, especially at scale (e.g., 10K→400K+ cores in an hour).
jackfrancis / gist:9ecc5964c1b4f2af10894d7dfac9368c
Created January 28, 2026 16:24
jack-francis-talks-kubecon-eu-2026
Here are my two talks:
The first one will be a joint session w/ GKE (and fellow SIG Autoscaling TL) describing the OSS CA and our plans to make significant changes in 2026.
The tl;dr is that GOOG will be investing significantly in the OSS CA to correct a decade of forking and blackbox dev.
This is directly due to investments that we have made in the SIG to modernize the OSS surface area, release process, E2E tests, etc.
- https://maintainersummiteu2026.sched.com/event/2EWf1/cluster-autoscaler-evolution-kuba-tuznik-google-jack-francis-microsoft
jackfrancis / tilt-settings.json
Created February 18, 2022 01:10
tilt-settings
{
  "kustomize_substitutions": {
    "AZURE_SUBSCRIPTION_ID": "<sub id here>",
    "AZURE_TENANT_ID": "<tenant id here>",
    "AZURE_CLIENT_SECRET": "<sp pw here>",
    "AZURE_CLIENT_ID": "<sp id here>",
    "AZURE_ENVIRONMENT": "AzurePublicCloud",
    "AZURE_SSH_PUBLIC_KEY_B64": "<ssh public key here>"
  },
  "worker-templates": {
jackfrancis / aks-large-cluster.sh
Last active January 18, 2023 18:42
Build large AKS cluster
#!/bin/bash
if [ -z "$RESOURCE_GROUP" ]; then
    echo "must provide a RESOURCE_GROUP env var"
    exit 1;
fi
if [ -z "$REGION" ]; then
    echo "must provide a REGION env var"
    exit 1;
jackfrancis / vmss-health-check.sh
Last active September 13, 2021 16:54
vmss-health-check.sh
#!/bin/bash
if [ -z "$RESOURCE_GROUP" ]; then
    echo "must provide a RESOURCE_GROUP env var"
    exit 1;
fi
# Continually look for non-Succeeded VMSS instances
while true; do
    NUM_VMSS=0
jackfrancis / init.sh
Last active September 27, 2021 20:21
macOS init
#!/bin/bash
if [ -z "$GITHUB_USERNAME" ]; then
    echo "Must set the GITHUB_USERNAME variable"
    exit 1
fi
if [ -z "$GITHUB_REPOS" ]; then
    # You may set up a bash string array of github repositories for which your GITHUB_USERNAME has a fork, e.g.:
    export GITHUB_REPOS="https://github.com/kubernetes-sigs/cluster-api-provider-azure.git https://github.com/Azure/aks-engine.git \
https://github.com/weaveworks/kured.git https://github.com/kubernetes-sigs/cluster-api.git https://github.com/kubernetes/perf-tests.git \
Mar 10 02:12:15 k8s-master-30342830-0 kubelet[16916]: I0310 02:12:15.416617 16916 flags.go:59] FLAG: --add-dir-header="false"
Mar 10 02:12:15 k8s-master-30342830-0 kubelet[16916]: I0310 02:12:15.417643 16916 flags.go:59] FLAG: --address="0.0.0.0"
Mar 10 02:12:15 k8s-master-30342830-0 kubelet[16916]: I0310 02:12:15.417660 16916 flags.go:59] FLAG: --allowed-unsafe-sysctls="[]"
Mar 10 02:12:15 k8s-master-30342830-0 kubelet[16916]: I0310 02:12:15.417676 16916 flags.go:59] FLAG: --alsologtostderr="false"
Mar 10 02:12:15 k8s-master-30342830-0 kubelet[16916]: I0310 02:12:15.417683 16916 flags.go:59] FLAG: --anonymous-auth="false"
Mar 10 02:12:15 k8s-master-30342830-0 kubelet[16916]: I0310 02:12:15.417691 16916 flags.go:59] FLAG: --application-metrics-count-limit="100"
Mar 10 02:12:15 k8s-master-30342830-0 kubelet[16916]: I0310 02:12:15.417698 16916 flags.go:59] FLAG: --authentication-token-webhook="true"
Mar 10 02:12:15 k8s-master-30342830-0 kubelet[16916]: I0310 02:12:15.417704 16916 flags.go:59] FL
/*
Copyright 2017 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software