
Restore Rancher 2 cluster/node agents on clusters

This is an unsupported scenario; see rancher/rancher#14731 for progress on an official solution.

Use this when cattle-cluster-agent and/or cattle-node-agent have been accidentally deleted, or when the server-url/cacerts have been changed.

Generate definitions

  • Generate an API token in the UI (user -> API & Keys) and save the Bearer token
  • Find the cluster ID in the Rancher UI (format is c-xxxxx); it's in the address bar when the cluster is selected, or it can be listed via the API as sketched below
  • Generate the agent definitions (needs curl and jq)
# Rancher URL
RANCHERURL="https://rancher.mydomain.com"
# Cluster ID
CLUSTERID="c-xxxxx"
# Token
TOKEN="token-xxxxx:xxxxx"
# Valid certificates
curl -s -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .command'
# Self signed certificates
curl -s -k -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .insecureCommand'
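
If the cluster ID is not handy from the UI, it can also be looked up via the same API. A minimal sketch, reusing the variables above (assumes the token has access to the clusters):

# List cluster IDs and names (sketch; uses RANCHERURL and TOKEN from above)
curl -s -k -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusters" | jq -r '.data[] | .id + " " + .name'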

Apply definitions

The generated command needs to be executed using kubectl configured with a kubeconfig to talk to the cluster. Use the steps below to retrieve the kubeconfig and apply the definitions:

  1. Generate a kubeconfig on a node with the controlplane role
docker run --rm --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml get configmap -n kube-system full-cluster-state -o json | jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_"' > kubeconfig_admin.yaml
  2. Apply the definitions (replace the example URL with the command returned when generating the definitions above)
docker run --rm --net=host -v $PWD/kubeconfig_admin.yaml:/root/.kube/config --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'curl --insecure -sfL https://xxx/v3/import/dl75kfmmbp9vj876cfsrlvsb9x9grqhqjd44zvnfd9qbh6r7ks97sr.yaml | kubectl apply -f -'
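
To verify that the agents come back after applying, a quick check with the generated kubeconfig (a sketch; assumes kubectl is available on the node, otherwise wrap it in the same docker run as above):

# Confirm the kubeconfig works and the agent workloads are being (re)deployed
kubectl --kubeconfig $PWD/kubeconfig_admin.yaml get nodes
kubectl --kubeconfig $PWD/kubeconfig_admin.yaml -n cattle-system get deploy,daemonset,pods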

superseb commented Jul 5, 2021

This is a different issue. I assume the cattle-system namespace is stuck in Terminating as the error suggests, but why? Did you try removing it manually? You probably need to edit/remove the finalizers for it to complete so this action can be run successfully.
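
A sketch of one common way to do that, clearing the namespace finalizers so the delete can complete (use with care; it skips normal cleanup, and leftover resources inside the namespace may carry finalizers of their own):

# Force the namespace finalize call with an empty finalizer list (use with care)
kubectl get namespace cattle-system -o json \
  | jq '.spec.finalizers = []' \
  | kubectl replace --raw "/api/v1/namespaces/cattle-system/finalize" -f -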


teopost commented Dec 17, 2021

I accidentally deleted the pod cattle-cluster-agent

kubectl delete pod cattle-cluster-agent-56f9c876b9-8xmt2  -n cattle-system

I tried to recreate them with the commands above, which I reproduce below

# Rancher URL
export RANCHERURL="https://rancher.sacchi.lan"

# Set cluster ID
export CLUSTERID="c-rf7lf"

# Set token
export TOKEN="token-6sxh5:l87p5chllwjngczgsvbprstlc79zg8bg8z7p7hj48hjyv5xbddh75pl"

# Self signed certificates
curl -s -k -H "Authorization: Bearer ${TOKEN}" "${RANCHERURL}/v3/clusterregistrationtokens?clusterId=${CLUSTERID}" | jq -r '.data[] | select(.name != "system") | .insecureCommand'
curl --insecure -sfL https://rancher.sacchi.lan/v3/import/s48t7jzdmftdscc5vlw2mscvggtd7r7rb6m6p2qr98srdqtq98njsj.yaml | kubectl apply -f -

# on the controlplane
docker run --rm --net=host -v $(docker inspect kubelet --format '{{ range .Mounts }}{{ if eq .Destination "/etc/kubernetes" }}{{ .Source }}{{ end }}{{ end }}')/ssl:/etc/kubernetes/ssl:ro --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'kubectl --kubeconfig /etc/kubernetes/ssl/kubecfg-kube-node.yaml get configmap -n kube-system full-cluster-state -o json | jq -r .data.\"full-cluster-state\" | jq -r .currentState.certificatesBundle.\"kube-admin\".config | sed -e "/^[[:space:]]*server:/ s_:.*_: \"https://127.0.0.1:6443\"_"' > kubeconfig_admin.yaml

docker run --rm --net=host -v $PWD/kubeconfig_admin.yaml:/root/.kube/config --entrypoint bash $(docker inspect $(docker images -q --filter=label=io.cattle.agent=true) --format='{{index .RepoTags 0}}' | tail -1) -c 'curl --insecure -sfL https://rancher.sacchi.lan/v3/import/s48t7jzdmftdscc5vlw2mscvggtd7r7rb6m6p2qr98srdqtq98njsj.yaml | kubectl apply -f -'

clusterrole.rbac.authorization.k8s.io/proxy-clusterrole-kubeapiserver unchanged
clusterrolebinding.rbac.authorization.k8s.io/proxy-role-binding-kubernetes-master unchanged
namespace/cattle-system unchanged
serviceaccount/cattle unchanged
clusterrolebinding.rbac.authorization.k8s.io/cattle-admin-binding unchanged
secret/cattle-credentials-72ce139 unchanged
clusterrole.rbac.authorization.k8s.io/cattle-admin unchanged
deployment.extensions/cattle-cluster-agent unchanged
daemonset.extensions/cattle-node-agent unchanged

But neither cattle-cluster-agent nor cattle-node-agent gets created.

[root@webmasksvi] # kubectl --kubeconfig=$PWD/kubeconfig_admin.yaml get pods -n cattle-system
No resources found in cattle-system namespace.
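
A sketch of checks that might narrow this down (same kubeconfig as above): whether the Deployment/DaemonSet objects exist at all, and what the recent events say.

# Do the agent workload objects exist, and if so, why are no pods created?
kubectl --kubeconfig=$PWD/kubeconfig_admin.yaml -n cattle-system get deploy,daemonset
kubectl --kubeconfig=$PWD/kubeconfig_admin.yaml -n cattle-system describe deploy cattle-cluster-agent
kubectl --kubeconfig=$PWD/kubeconfig_admin.yaml -n cattle-system get events --sort-by=.lastTimestamp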

Could you help me?
Thanks
