This is an unsupported scenario, see rancher/rancher#14731 when there is an official solution.
When cattle-cluster-agent and/or cattle-node-agent are accidentally deleted, or when server-url
/cacerts
are changed.
- Generate API token in the UI (user -> API & Keys) and save the Bearer token
- Find the clusterid in the Rancher UI (format is
c-xxxxx
), its in the address bar when the cluster is selected - Generate agent definitions (needs
curl
,jq
)
# Rancher URL
RANCHERURL="https://rancher.mydomain.com"
# Cluster ID
CLUSTERID="c-xxxxx"
# Token
TOKEN="token-xxxxx:xxxxx"
# Valid certificates
curl -s -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .command'
# Self signed certificates
curl -s -k -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .insecureCommand'
The generated command needs to be executed using kubectl configured with a kubeconfig to talk to the cluster. See the gists below to retrieve the kubeconfig:
- Get kubeconfig for custom cluster in Rancher 2.x: https://gist.github.com/superseb/f6cd637a7ad556124132ca39961789a4
- Retrieve kubeconfig from RKE v0.1.x or Rancher v2.0.x/v2.1.x custom cluster controlplane node: https://gist.github.com/superseb/3d8de6092ebc4b1581185197583f472a
- Retrieve kubeconfig from RKE v0.2.x or Rancher v2.2.x custom cluster controlplane node: https://gist.github.com/superseb/b14ed3b5535f621ad3d2aa6a4cd6443b
- Generate kubeconfig on node with controlplane role
docker run --rm --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml get configmap -n kube-system full-cluster-state -o json | jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_"' > kubeconfig_admin.yaml
- Apply definitions (replace with the command returned from generating the definitions)
docker run --rm --net=host -v $PWD/kubeconfig_admin.yaml:/root/.kube/config --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'curl --insecure -sfL https://xxx/v3/import/dl75kfmmbp9vj876cfsrlvsb9x9grqhqjd44zvnfd9qbh6r7ks97sr.yaml | kubectl apply -f -'
This is a different issue, I assume the
cattle-system
namespace is in Terminating as the error suggests but why? Did you try removing it manually? You probably need to edit/remove the finalizers for it to complete so this action can be run successfully.