| theme | title | info | transition | mdc |
|---|---|---|---|---|
| default | Multi-Cluster Kubernetes: Problems & Solutions | OCM, Sveltos, and the road to a standard API<br/>Guilhem Lettron — SRE France Meetup | slide-left | true |
Guilhem Lettron — SRE France Meetup
You already have one cluster. Why would you want many?
- Blast radius — isolate failure domains (prod / staging / per-team)
- Regulatory / data sovereignty — data must stay in a given region
- Edge & hybrid — workloads close to users or on-prem constraints
- Scaling limits — etcd / API server throughput ceiling
- Organizational — different teams, different upgrade cadences
Multi-cluster is not a choice. It's an inevitability.
You provision clusters with Cluster API, Terraform, cloud consoles…
But then:
- No single inventory of all clusters
- No standard way to describe cluster properties (version, region, labels)
- No health status aggregation
- Each tool has its own registration mechanism
"We have 17 clusters… I think. Let me check three different dashboards."
You need the same stack on every cluster:
- CNI, CSI, cert-manager, monitoring, policies…
- But with per-cluster variations (cloud provider, sizing, feature flags)
- Drift happens: someone `kubectl apply`s in prod
- Ordering matters: CRDs before controllers, Istio before apps
Manual Helm installs x N clusters = chaos
In many setups, managed clusters:
- Are behind NAT / firewalls (edge, on-prem)
- Have no direct inbound connectivity from the hub
- Require mTLS, short-lived tokens, certificate rotation
"Just expose the kubeconfig" is not an option in production.
flowchart LR
dev[You] -->|git push| repo["Git Repo"]
repo -->|pull| flux["Flux / ArgoCD"]
flux -->|apply| c1[Cluster 1]
flux -->|apply| c2[Cluster 2]
flux -->|apply| c3[Cluster ...]
flux -->|apply| cn[Cluster N]
Problem solved. Thank you, good night.
flowchart LR
fresh["Fresh Cluster<br/>(nothing running)"] -.-x flux["Flux / ArgoCD<br/>???"]
flux -->|pull| repo[Git Repo]
style fresh fill:#ff6b6b,color:#fff
style flux fill:#ff6b6b,color:#fff,stroke-dasharray: 5 5
- Flux / ArgoCD must already be running on the cluster to reconcile
- Chicken-and-egg: you need a tool to deploy the tool
- `flux bootstrap` / `argocd install` is... imperative
- You still need a push-based mechanism for day-0
Your "pull-only" workflow starts with a push.
graph TB
repo[Git Repo] -->|"1 branch<br/>1 folder"| boom["All clusters<br/>at once"]
repo -->|"N branches"| branches["branch-cluster-1<br/>branch-cluster-2<br/>branch-cluster-3<br/>...branch-cluster-47"]
repo -->|"N folders"| folders["clusters/prod-eu-1/<br/>clusters/prod-eu-2/<br/>clusters/prod-us-1/<br/>...clusters/staging-ap-3/"]
style boom fill:#ff6b6b,color:#fff
style branches fill:#f5a623,color:#fff
style folders fill:#f5a623,color:#fff
- 1 repo + 1 branch = one merge hits every cluster simultaneously
- N branches / N folders = you're back to managing things one by one
- Progressive rollout? Write custom CI pipelines to promote between folders
- 30 clusters x 15 addons = 450 files to maintain — that's not automation
flowchart LR
dev[Developer] -->|git push| repo[Git Repo]
repo -->|pull| agent1[Agent<br/>Cluster 1]
repo -->|pull| agent2[Agent<br/>Cluster 2]
repo -->|pull| agentN[Agent<br/>Cluster N]
agent1 -->|apply| c1[Cluster 1]
agent2 -->|apply| c2[Cluster 2]
agentN -->|apply| cN[Cluster N]
c1 -.->|"???"| dev
c2 -.->|"???"| dev
cN -.->|"???"| dev
- Git is write-only — no native status feedback channel
- "I merged, did it deploy?" → check N dashboards /
kubectlon N clusters - Errors buried in controller logs, not in your PR
- No aggregated status across the fleet — "deployed to 28/30 clusters" does not exist
GitOps solves single-cluster continuous delivery.
Multi-cluster needs something more:
| Need | GitOps | What's missing |
|---|---|---|
| Bootstrap | `flux bootstrap` (imperative) | Push-based day-0 |
| Blast radius | N folders / branches | Native progressive rollout |
| Feedback | Per-cluster logs | Aggregated fleet status |
| Cluster inventory | Manual cluster list | Dynamic discovery |
GitOps solves delivery. Multi-cluster needs orchestration.
| Concern | Tool |
|---|---|
| Cluster lifecycle | Cluster API, Terraform, cloud CLIs |
| Cluster inventory & registration | OCM, Rancher, Fleet |
| Addon deployment | Sveltos, Flux, ArgoCD |
| Workload scheduling | Karmada, KubeFleet, ArgoCD |
| Standard cluster API | SIG Multicluster ClusterProfile |
None of these tools does everything. Composability is the goal.
graph TB
subgraph Hub["Hub Cluster"]
MC[ManagedCluster API]
PL[Placement API]
MW[ManifestWork]
AD[Addon Framework]
end
subgraph SpokeA["Managed Cluster A"]
KA[klusterlet agent]
end
subgraph SpokeB["Managed Cluster B"]
KB[klusterlet agent]
end
KA -->|registers| MC
KB -->|registers| MC
PL -->|selects| MC
AD -->|deploys via| MW
MW -->|applied by| KA
MW -->|applied by| KB
The ManagedCluster resource on the hub:
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: cluster-eu-west-1
  labels:
    cloud: aws
    region: eu-west-1
    env: production
status:
  conditions:
    - type: ManagedClusterConditionAvailable
      status: "True"
  version:
    kubernetes: v1.30.2

- Auto-registered by the klusterlet agent
- Labels → queryable inventory
- Conditions → aggregated health
"Deploy this to all production clusters in EU"
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: eu-prod
spec:
  predicates:
    - requiredClusterSelector:
        labelSelector:
          matchLabels:
            env: production
            region: eu-west-1

→ Produces a PlacementDecision with the list of matching clusters.
Used by addons, policies, and workload controllers.
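For illustration, this is roughly what the generated PlacementDecision looks like on the hub; the object name and cluster names below are illustrative, the controller writes and updates it for you.

```yaml
# Created and kept up to date by the Placement controller, not written by hand.
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: PlacementDecision
metadata:
  name: eu-prod-decision-1                              # generated name (illustrative)
  labels:
    cluster.open-cluster-management.io/placement: eu-prod
status:
  decisions:
    - clusterName: cluster-eu-west-1
      reason: ""
    - clusterName: cluster-eu-west-2
      reason: ""
```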
An OCM addon is not a Helm chart. It's a framework:
- `ClusterManagementAddOn` — defines the addon globally
- `ManagedClusterAddOn` — enables it per cluster
- Addon controller runs on the hub, deploys agents to spokes via `ManifestWork`
Examples of OCM addons:
- `cluster-proxy` — reverse tunnel for hub → spoke connectivity
- `managed-serviceaccount` — automated token lifecycle
- `sveltos-ocm-addon` — bridges OCM and Sveltos
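A minimal sketch of how an addon is enabled for one cluster, using cluster-proxy as the example; the install namespace and strategy shown here are illustrative defaults:

```yaml
# Hub-wide definition of the addon (usually shipped by the addon's own chart).
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ClusterManagementAddOn
metadata:
  name: cluster-proxy
spec:
  addOnMeta:
    displayName: cluster-proxy
  installStrategy:
    type: Manual
---
# Enables the addon for one cluster: the namespace is the ManagedCluster's name.
apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ManagedClusterAddOn
metadata:
  name: cluster-proxy
  namespace: cluster-eu-west-1
spec:
  installNamespace: open-cluster-management-agent-addon
```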
ClusterProfile: declare what to deploy and where
apiVersion: config.projectsveltos.io/v1beta1
kind: ClusterProfile
metadata:
  name: monitoring-stack
spec:
  clusterSelector:
    matchLabels:
      env: production
  helmCharts:
    - repositoryURL: https://prometheus-community.github.io/helm-charts
      chartName: kube-prometheus-stack
      chartVersion: "65.1.0"
      releaseName: monitoring
      releaseNamespace: monitoring
      values: |
        grafana:
          enabled: true
  syncMode: ContinuousWithDriftDetection

| Feature | What it does |
|---|---|
| Drift detection | Detects & auto-corrects config drift on managed clusters |
| Templating | Values from management or managed cluster (Go templates) |
| Deployment order | Sequential within a profile, dependencies between profiles |
| Progressive rollout | Phased rollout across cluster groups |
| Multi-tenancy | Profile (namespaced) vs ClusterProfile (cluster-wide) |
| Tier / conflict resolution | When two profiles target the same resource, tier wins |
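To make the multi-tenancy row concrete: a namespaced `Profile` carries the same spec as a `ClusterProfile`, but only matches clusters registered in its own namespace, so a platform team can hand a tenant a namespace without exposing the whole fleet. A minimal sketch, with illustrative names and chart version:

```yaml
# Namespaced variant: only matches clusters registered in `team-payments`,
# so this tenant cannot target clusters registered elsewhere on the hub.
apiVersion: config.projectsveltos.io/v1beta1
kind: Profile
metadata:
  name: payments-addons
  namespace: team-payments
spec:
  clusterSelector:
    matchLabels:
      env: staging
  helmCharts:
    - repositoryURL: https://charts.bitnami.com/bitnami
      chartName: redis
      chartVersion: "20.0.0"        # illustrative version
      releaseName: cache
      releaseNamespace: cache
```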
Sveltos watches SveltosCluster resources:
apiVersion: lib.projectsveltos.io/v1beta1
kind: SveltosCluster
metadata:
  name: cluster-eu-west-1
  namespace: default
  labels:
    cloud: aws
    region: eu-west-1
spec:
  kubeconfigKeyName: kubeconfig

Problem: who creates these SveltosCluster objects?
- Manually? Doesn't scale.
- From Cluster API? Works, but only if you use CAPI.
- From OCM? That's where sveltos-ocm-addon comes in.
Automatically registers OCM managed clusters as Sveltos clusters
flowchart LR
MC[OCM ManagedCluster] --> addon[sveltos-ocm-addon]
CP[cluster-proxy<br/>kubeconfig via tunnel] --> addon
addon --> SC[SveltosCluster]
- `ManagedClusterAddOn` deployed to selected clusters (via Placement)
- Controller creates a `ManagedServiceAccount` → gets a token
- Builds a kubeconfig routed through cluster-proxy
- Creates the `SveltosCluster` on the hub with synced labels
→ Sveltos immediately picks up the new cluster and deploys matching profiles.
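For the token step, a rough sketch of the per-cluster object the controller creates; the field names are an assumption based on the managed-serviceaccount addon and should be checked against its CRD:

```yaml
# Assumption: approximate shape of the ManagedServiceAccount created per cluster.
# The resulting token lands in a Secret in the cluster's namespace on the hub.
apiVersion: authentication.open-cluster-management.io/v1beta1
kind: ManagedServiceAccount
metadata:
  name: sveltos-ocm-addon          # illustrative name
  namespace: cluster-eu-west-1     # one namespace per managed cluster on the hub
spec:
  rotation:
    validity: 720h                 # short-lived token, rotated automatically
```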
flowchart LR
subgraph Hub["Hub Cluster"]
MC[ManagedCluster] --> MCA[ManagedClusterAddOn]
MCA --> MSA[ManagedServiceAccount]
MSA --> ctrl[sveltos-ocm-addon<br/>controller]
MC --> ctrl
ctrl --> SC[SveltosCluster]
SC --> sveltos[Sveltos<br/>addon-controller]
ctrl <--> proxy[cluster-proxy<br/>tunnel]
end
subgraph A["Managed Cluster A"]
KA[klusterlet]
end
subgraph B["Managed Cluster B"]
KB[klusterlet]
end
proxy <-.-> KA
proxy <-.-> KB
sveltos -.->|deploy addons| A
sveltos -.->|deploy addons| B
flowchart LR
A["1. Register<br/>cluster with OCM"] --> B["2. Placement<br/>selects clusters"]
B --> C["3. sveltos-ocm-addon<br/>creates SveltosCluster"]
C --> D["4. ClusterProfile<br/>matches & deploys"]
D --> E["5. Drift detection<br/>keeps state"]
style A fill:#4a9eff,color:#fff
style B fill:#4a9eff,color:#fff
style C fill:#f5a623,color:#fff
style D fill:#7ed321,color:#fff
style E fill:#7ed321,color:#fff
Each layer does one thing well. Composability > monolith.
Today every project reinvents the cluster object:
| Project | Cluster resource |
|---|---|
| OCM | ManagedCluster |
| Cluster API | Cluster |
| Sveltos | SveltosCluster |
| Karmada | Cluster |
| Rancher | clusters.management.cattle.io |
→ No interoperability. Bridges everywhere. Sound familiar?
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ClusterProfile
metadata:
  name: cluster-eu-west-1
  namespace: fleet-inventory
spec:
  displayName: "EU West Production"
  clusterManager:
    name: ocm
status:
  version:
    kubernetes: v1.30.2
  properties:
    - name: region
      value: eu-west-1
  conditions:
    - type: ControlPlaneHealthy
      status: "True"

- Namespace-scoped (multiple inventories on one hub)
- Cluster Manager creates and updates status
- Consumers read for scheduling / placement decisions
flowchart LR
CAPI[Cluster API] --> CP["ClusterProfile<br/>(standard API)"]
OCM[OCM] --> CP
Karmada --> CP
CP --> Sveltos
CP --> ArgoCD
CP --> Flux
- Cluster managers (OCM, CAPI, Karmada) populate `ClusterProfile`
- Consumers (Sveltos, Argo, Flux) read `ClusterProfile` to discover targets
- No more per-project bridges
The bridge I wrote (sveltos-ocm-addon) should eventually become unnecessary.
We can now:
- Discover clusters (OCM)
- Deploy addons consistently (Sveltos)
- Communicate securely through tunnels (cluster-proxy)
But what about applications?
- Where should this workload run? The cluster with the most available capacity?
- How do I spread replicas across failure domains?
- How do I do a progressive rollout across the fleet?
- What if a cluster goes down mid-rollout?
Addons = same everywhere. Apps = smart placement.
flowchart TB
subgraph Hub["Karmada Control Plane"]
D[Deployment] --> PP[PropagationPolicy]
PP --> OP[OverridePolicy<br/>per-cluster values]
end
PP -->|"spread: 3 clusters<br/>weighted by capacity"| C1[Cluster EU<br/>4 replicas]
PP --> C2[Cluster US<br/>6 replicas]
PP --> C3[Cluster AP<br/>2 replicas]
- PropagationPolicy — where to schedule, how many replicas per cluster
- OverridePolicy — per-cluster customizations (image registry, resource limits)
- Replica scheduling — distribute by capacity (DynamicWeight), region, cost
- Failover — auto-migrate replicas when a cluster fails
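A minimal sketch of the two policies from the diagram, for a Deployment named `web`; cluster names, weights, and the registry are illustrative, and replicas are divided by static weights here, whereas the diagram's capacity-based split would use dynamic weighting:

```yaml
# Propagate the `web` Deployment to three clusters, dividing replicas by weight.
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: web-propagation
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: web
  placement:
    clusterAffinity:
      clusterNames: [cluster-eu, cluster-us, cluster-ap]
    replicaScheduling:
      replicaSchedulingType: Divided
      replicaDivisionPreference: Weighted
      weightPreference:
        staticWeightList:
          - targetCluster:
              clusterNames: [cluster-us]
            weight: 3
          - targetCluster:
              clusterNames: [cluster-eu]
            weight: 2
          - targetCluster:
              clusterNames: [cluster-ap]
            weight: 1
---
# Per-cluster tweak: pull images from a regional registry in the US cluster.
apiVersion: policy.karmada.io/v1alpha1
kind: OverridePolicy
metadata:
  name: web-us-registry
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: web
  overrideRules:
    - targetCluster:
        clusterNames: [cluster-us]
      overriders:
        imageOverrider:
          - component: Registry
            operator: replace
            value: registry.us.example.com
```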
flowchart LR
subgraph Hub["KubeFleet Hub"]
CRP[ClusterResourcePlacement]
SCH[Scheduler<br/>capacity + affinity + topology]
end
CRP --> SCH
SCH -->|"pick best N"| M1[Member Cluster 1]
SCH --> M2[Member Cluster 2]
SCH --> M3[Member Cluster 3]
M1 -.->|status| Hub
M2 -.->|status| Hub
M3 -.->|status| Hub
- Hub-spoke (agent-initiated, like OCM — works behind NAT)
- Scheduler plugins — capacity, affinity, topology spread, cost, GPU
- Progressive rollout — staged updates with health checks at each step
- Status aggregation — fleet-wide deployment status on the hub
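A hedged sketch of a `ClusterResourcePlacement` under the upstream KubeFleet API group; the namespace, cluster count, and rollout settings are illustrative:

```yaml
# Place everything in the `web` namespace on the 3 best member clusters,
# letting the scheduler pick them, and roll updates out gradually.
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
  name: web-placement
spec:
  resourceSelectors:
    - group: ""
      version: v1
      kind: Namespace
      name: web
  policy:
    placementType: PickN
    numberOfClusters: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
```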
flowchart TB
subgraph Lifecycle["Cluster Lifecycle"]
CAPI[Cluster API / Terraform]
end
subgraph Inventory["Cluster Inventory"]
OCM["OCM<br/>registration + discovery"]
end
subgraph Addons["Addon Management"]
SV["Sveltos<br/>CNI, monitoring, policies..."]
end
subgraph Apps["App Scheduling"]
KR["Karmada / KubeFleet<br/>intelligent placement"]
end
CAPI --> OCM
OCM --> SV
OCM --> KR
style Lifecycle fill:#8e8e8e,color:#fff
style Inventory fill:#4a9eff,color:#fff
style Addons fill:#7ed321,color:#fff
style Apps fill:#bd10e0,color:#fff
Each layer solves one concern. Pick the tools that fit your needs.
- Multi-cluster is inevitable — plan for it early
- GitOps alone is not enough — bootstrap, blast radius, feedback are unsolved
- Separate concerns: inventory (OCM) vs addons (Sveltos) vs app scheduling (Karmada/KubeFleet)
- Composability wins — no single tool does everything well
- Bridges are necessary today — sveltos-ocm-addon connects OCM ↔ Sveltos
- SIG Multicluster ClusterProfile is the upstream path to eliminate bridges
- sveltos-ocm-addon: github.com/guilhem/sveltos-ocm-addon
- OCM: open-cluster-management.io
- Sveltos: projectsveltos.github.io/sveltos
- Karmada: karmada.io
- KubeFleet: github.com/kubefleet-dev/kubefleet
- SIG Multicluster: multicluster.sigs.k8s.io
- KEP-4322 (Cluster Inventory): github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/4322-cluster-inventory
@guilhem — github.com/guilhem