Notes for importing managed clusters into Cluster API

WARNING: This hasn't been tested extensively outside of my environment. Your mileage may vary.

Assumptions:

  • Any security group creation or modification that CAPA does that's not specifically flagged below is acceptable, including a brief disruption when a group is modified
  • This is valid as of CAPA 2.0.2 and may not work with newer versions (for reference, the steps were different pre-2.x, when it was easier to import even the VPC itself)

Importing CAPA Cluster (using BYO VPC):

  • Make sure AWSManagedControlPlane.spec.eksClusterName matches the EKS cluster name
  • Optionally set AWSManagedControlPlane.spec.network.securityGroupOverrides.controlplane to match the security group on the EKS control plane. If you have extra security groups, I haven't been able to figure out how to import those into CAPA, but they stay attached to the EKS cluster and are simply ignored by CAPA
  • Set the VPC information according to the BYO VPC specs (https://cluster-api-aws.sigs.k8s.io/topics/bring-your-own-aws-infrastructure.html#configuring-the-awscluster-specification); see the example manifest after this list
  • Determine whether you need to set AWSManagedControlPlane.spec.vpcCni.disabled based on which CNI you have installed on your cluster
  • Make sure the AWS resources have the required tags according to https://cluster-api-aws.sigs.k8s.io/topics/bring-your-own-aws-infrastructure.html#tagging-aws-resources
    • Set the tag kubernetes.io/cluster/<clusterName> = owned or shared (as appropriate) on the VPC, Subnet, & Route table resources
    • Set the tag kubernetes.io/cluster/<clusterName> = owned on the EKS cluster
    • Set the tags kubernetes.io/role/internal-elb & kubernetes.io/role/elb on the appropriate Subnets
    • Set tag sigs.k8s.io/cluster-api-provider-aws/cluster/<clusterName> = owned on the EKS cluster
    • Set tag sigs.k8s.io/cluster-api-provider-aws/role = common on the EKS cluster
  • Make sure that the credentials/IAM role that CAPA runs as has access to the EKS cluster so it can manage things like the CNI and/or iamAuthenticatorConfig (via the aws-auth ConfigMap)
  • If you have an OIDC provider attached, you'll need to detach it before applying the YAML manifest, or set AWSManagedControlPlane.spec.associateOIDCProvider: false (I haven't been able to figure out why CAPA doesn't detect that one is already attached)

Caution: If you are running kube-proxy via your legacy code/install and set AWSManagedControlPlane.spec.kubeProxy.disabled to true, CAPA will uninstall the kube-proxy DaemonSet.
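
To tie the fields above together, here's a rough sketch of what the AWSManagedControlPlane portion of the manifest might look like for a BYO VPC import. All names, IDs, and CIDRs are placeholders, and the exact field names/layout should be confirmed against the CAPA CRDs for your version; this is an illustration of the notes above, not a tested manifest.

```yaml
# Sketch only: placeholder names/IDs; confirm fields against your CAPA version's CRDs.
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: AWSManagedControlPlane
metadata:
  name: my-cluster-control-plane
  namespace: my-namespace
spec:
  eksClusterName: my-existing-eks-cluster    # must match the EKS cluster name
  region: us-east-1
  version: v1.24
  associateOIDCProvider: false               # or detach the existing OIDC provider before applying
  network:
    vpc:
      id: vpc-0123456789abcdef0              # existing (BYO) VPC
    subnets:
      - id: subnet-0123456789abcdef0
      - id: subnet-0fedcba9876543210
    securityGroupOverrides:
      controlplane: sg-0123456789abcdef0     # existing EKS control plane security group (optional)
  vpcCni:
    disabled: true                           # only if you manage the CNI yourself; field name per these notes
  kubeProxy:
    disabled: false                          # setting this to true uninstalls the kube-proxy DaemonSet
```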

At this point you are running/managing the EKS cluster via CAPA, but the compute nodes are still running/connected via the non-CAPA system.

Migrating the workloads to CAPA managed compute tiers:

  • Create new compute tiers using MachineDeployment or AWSManagedMachinePool and size them appropriately (see the sketch after this list)
  • Cordon the old compute tiers
  • If using AutoScalingGroups, add the tag k8s.io/cluster-autoscaler/node-template/taint/managed-by = legacy:NoSchedule to the ASGs (or whatever taint you want to use to tell the cluster-autoscaler that the old nodes will have a taint)
  • Taint the old compute tier nodes with that same taint (this ensures the cluster-autoscaler knows that any nodes from these ASGs will start with the taint, so it won't try to scale them up)
  • Drain the old compute tier nodes
  • You may be able to rely on the cluster-autoscaler to automatically delete/remove the old nodes, but if not, remove them and terminate the instances manually
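
As a starting point for the new compute tier, here's a rough sketch of a MachinePool backed by an AWSManagedMachinePool. The names, sizes, and instance type are placeholders, and the shape should be checked against the CAPA EKS machine pool docs for your version; this is illustrative, not a tested manifest.

```yaml
# Sketch only: placeholder names/sizes; confirm fields against your CAPA version's CRDs.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: my-cluster-pool-0
spec:
  clusterName: my-cluster
  replicas: 3
  template:
    spec:
      clusterName: my-cluster
      bootstrap:
        dataSecretName: ""            # EKS managed node groups handle their own bootstrap
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSManagedMachinePool
        name: my-cluster-pool-0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedMachinePool
metadata:
  name: my-cluster-pool-0
spec:
  instanceType: m5.large
  scaling:
    minSize: 1
    maxSize: 10
```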

Now all compute nodes are managed via CAPA.

Importing CAPZ Cluster (using BYO VNET):

  • Make sure AzureManagedControlPlane.metadata.name matches the AKS cluster name
  • Set the AzureManagedControlPlane.spec.virtualNetwork fields to match your existing VNET (see the example manifest after this list)
  • Make sure the AzureManagedControlPlane.spec.sshPublicKey matches what was set on the AKS cluster (including any newlines included in the base64 encoding; this was a big gotcha for me)
    • NOTE: This is a required field in CAPZ. If you don't know what public key was used, however, you can change or set it via the Azure CLI before attempting to import the cluster.
  • Make sure the Cluster.spec.clusterNetwork settings match what you are using in AKS
  • Make sure the AzureManagedControlPlane.spec.dnsServiceIP matches what is set in AKS
  • Set the tag sigs.k8s.io_cluster-api-provider-azure_cluster_<clusterName> = owned on the AKS cluster
  • Set the tag sigs.k8s.io_cluster-api-provider-azure_role = common on the AKS cluster
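
To tie the items above together, here's a rough sketch of the Cluster and AzureManagedControlPlane pieces for a BYO VNET import. All names, CIDRs, and IPs are placeholders and must be replaced with the values that match your existing AKS cluster; confirm the field names against the CAPZ CRDs for your version.

```yaml
# Sketch only: placeholder names/CIDRs/IPs; everything must match the existing AKS cluster.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-aks-cluster
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["10.0.0.0/16"]      # must match what AKS is actually using
  controlPlaneRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureManagedControlPlane
    name: my-aks-cluster
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AzureManagedCluster
    name: my-aks-cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedControlPlane
metadata:
  name: my-aks-cluster                 # must match the AKS cluster name
spec:
  version: v1.24.9
  location: eastus
  resourceGroupName: my-resource-group
  subscriptionID: 00000000-0000-0000-0000-000000000000
  sshPublicKey: <base64 public key exactly as set on the AKS cluster; watch for newlines>
  dnsServiceIP: 10.0.0.10              # must match the AKS setting
  virtualNetwork:
    name: my-existing-vnet
    cidrBlock: 10.1.0.0/16
    subnet:
      name: my-existing-subnet
      cidrBlock: 10.1.0.0/24
```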

NOTE: For several fields, like networkPlugin, if they were not set on the AKS cluster at creation time, CAPZ will not be able to set them during its reconcile loop, because AKS doesn't allow them to be changed after creation. If a field was set at creation time, CAPZ will be able to successfully change/manage it.

At this point you can apply your YAML manifest and the AKS cluster will be imported as an AzureManagedControlPlane. The managed machine pools are still partially managed by your old system and partially managed by the global AKS settings that are now handled by AKS/CAPZ. I highly recommend setting up new AzureManagedMachinePool(s) as soon as possible (a sketch follows), then tainting & draining the old compute pools and removing them.
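
Here's a similarly hedged sketch of a MachinePool backed by an AzureManagedMachinePool for the new compute pools; names, SKU, and sizes are placeholders, and the fields should be confirmed against the CAPZ docs for your version.

```yaml
# Sketch only: placeholder names/SKUs; confirm fields against your CAPZ version's CRDs.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  name: my-aks-cluster-pool1
spec:
  clusterName: my-aks-cluster
  replicas: 3
  template:
    spec:
      clusterName: my-aks-cluster
      bootstrap:
        dataSecretName: ""            # AKS node pools handle their own bootstrap
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureManagedMachinePool
        name: my-aks-cluster-pool1
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureManagedMachinePool
metadata:
  name: my-aks-cluster-pool1
spec:
  mode: User                          # use System for the system node pool
  sku: Standard_D4s_v3
  osDiskSizeGB: 100
```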