@prashantv
Created March 23, 2026 20:44

ipam.mode: azure assumes a single subscription and resource group

Problem

We run EKS Hybrid clusters with Azure hybrid nodes spanning multiple Azure regions. Each region has multiple subscriptions:

  • A sharedinfra subscription hosting a shared VNet (using Azure's cross-subscription subnet sharing / EnableSharedVNet feature)
  • One or more compute subscriptions where VMs and NICs are created, referencing the shared VNet's subnet cross-subscription

A single Kubernetes cluster has nodes across 3 regions (centralus, eastus, southeastus3), each with its own shared VNet and 2-4 compute subscriptions. All told, a cluster may have nodes across 10+ subscriptions and separate resource groups.

The Cilium operator's Azure IPAM implementation creates a single Azure API client at startup (pkg/azure/api/api.go:NewClient), scoped to one subscriptionID and one resourceGroup. All NIC discovery (GetInstances) and all NIC updates (adding or removing secondary IPs) go through this single client, so the operator cannot discover or manage NICs in any other subscription or resource group.

This means ipam.mode: azure only works when every node's NIC is in the same subscription and resource group, which doesn't hold for multi-region or multi-subscription deployments.

What would need to change

The operator would need to resolve the subscription and resource group per node rather than using a global default. The information is already available: each node's providerID contains the full ARM resource ID (e.g. /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<name>), and the NIC's ARM ID similarly encodes its subscription and resource group. The operator would need to:

  1. Parse the subscription and resource group from each node's provider ID or NIC reference
  2. Maintain Azure API clients per subscription (or use a single credential with cross-subscription access)
  3. Scope NIC list/update operations to the correct subscription and resource group for each node
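
The three steps above could be sketched roughly as follows. This is a minimal illustration, not Cilium code: armScope, parseARMScope, groupByScope, nicClient and clientCache are all hypothetical names, and nicClient stands in for whatever subscription-scoped Azure client the operator would actually construct. The sketch assumes providerIDs carry an azure:// prefix in front of the ARM ID, and lower-cases the parsed names on the assumption that ARM treats them case-insensitively.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// armScope is a hypothetical key identifying where an Azure resource lives.
type armScope struct {
	SubscriptionID string
	ResourceGroup  string
}

// parseARMScope extracts the subscription and resource group from an ARM
// resource ID of the form /subscriptions/<sub>/resourceGroups/<rg>/...
// An optional "azure://" prefix (as assumed for node providerIDs) is stripped.
func parseARMScope(id string) (armScope, error) {
	trimmed := strings.Trim(strings.TrimPrefix(id, "azure://"), "/")
	parts := strings.Split(trimmed, "/")
	if len(parts) < 4 ||
		!strings.EqualFold(parts[0], "subscriptions") ||
		!strings.EqualFold(parts[2], "resourceGroups") ||
		parts[1] == "" || parts[3] == "" {
		return armScope{}, fmt.Errorf("not a resource-group-scoped ARM ID: %q", id)
	}
	// Lower-case both values so they form a stable map key (assumption:
	// these names are compared case-insensitively by ARM).
	return armScope{
		SubscriptionID: strings.ToLower(parts[1]),
		ResourceGroup:  strings.ToLower(parts[3]),
	}, nil
}

// groupByScope buckets node names by the scope parsed from their providerID,
// so list/update calls can be issued once per subscription and resource group.
func groupByScope(providerIDs map[string]string) (map[armScope][]string, error) {
	groups := make(map[armScope][]string)
	for node, id := range providerIDs {
		scope, err := parseARMScope(id)
		if err != nil {
			return nil, fmt.Errorf("node %s: %w", node, err)
		}
		groups[scope] = append(groups[scope], node)
	}
	return groups, nil
}

// nicClient is a placeholder for a real subscription-scoped Azure client.
type nicClient struct{ subscriptionID string }

// clientCache lazily creates and reuses one client per subscription.
type clientCache struct {
	mu        sync.Mutex
	clients   map[string]*nicClient
	newClient func(subscriptionID string) (*nicClient, error)
}

func (c *clientCache) get(subscriptionID string) (*nicClient, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.clients[subscriptionID]; ok {
		return cl, nil
	}
	cl, err := c.newClient(subscriptionID)
	if err != nil {
		return nil, err
	}
	if c.clients == nil {
		c.clients = make(map[string]*nicClient)
	}
	c.clients[subscriptionID] = cl
	return cl, nil
}

func main() {
	// Placeholder providerIDs; the subscription and resource group names are made up.
	nodes := map[string]string{
		"node-a": "azure:///subscriptions/sub-1/resourceGroups/rg-central/providers/Microsoft.Compute/virtualMachines/node-a",
		"node-b": "azure:///subscriptions/sub-2/resourceGroups/rg-east/providers/Microsoft.Compute/virtualMachines/node-b",
	}
	groups, err := groupByScope(nodes)
	if err != nil {
		panic(err)
	}
	cache := &clientCache{newClient: func(sub string) (*nicClient, error) {
		return &nicClient{subscriptionID: sub}, nil
	}}
	for scope, names := range groups {
		cl, err := cache.get(scope.SubscriptionID)
		if err != nil {
			panic(err)
		}
		fmt.Printf("client for %s would list NICs in %s for nodes %v\n",
			cl.subscriptionID, scope.ResourceGroup, names)
	}
}
```

Keying the cache by subscription rather than by (subscription, resource group) reflects the assumption that a client is bound to a subscription and takes the resource group as a per-call argument; if the real client is bound to both, the cache key would be the full armScope instead.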

Our workaround

Since we can't use ipam.mode: azure, we're evaluating ipam: multi-pool with per-region CiliumPodIPPool resources to get topology-aware pod CIDR assignment, combined with Azure UDRs (managed by the cloud-controller-manager route controller) to make the overlay pod IPs routable. This works, but it loses the main benefit of Azure IPAM: pod IPs as native VNet addresses with zero routing infrastructure.
