Last active
August 31, 2022 09:53
Problem description
The latest systemd update for Ubuntu 18.04, the underlying OS for AKS nodes, introduced the following bug: https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119
The affected version is 237-3ubuntu10.54.
Below are some ALTERNATIVE suggestions for fixing it, in addition to the official response from Microsoft: https://status.azure.com/en-us/status
1. Fix it with az vmss run-command:
```
AKS_MANAGED_GROUP="MC_rg-monitor001_aks001_eastus2"
VMSS_NAME="aks-sysnp001-42513286-vmss"
VMSS_INSTANCE_ID="0"
# Append the fallback DNS entry on one instance and restart the resolver.
az vmss run-command invoke \
  -g "$AKS_MANAGED_GROUP" \
  -n "$VMSS_NAME" \
  --command-id RunShellScript \
  --instance-id "$VMSS_INSTANCE_ID" \
  --scripts "echo 'FallbackDNS=168.63.129.16' >> /etc/systemd/resolved.conf && systemctl restart systemd-resolved"
```
Then cycle through the instance IDs in your VMSS. Once all are done, move on to the next VMSS, which corresponds to the next node pool.
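The loop over instance IDs could be scripted roughly as follows. This is only a sketch: the function name `fix_vmss` and the variable `FIX_SCRIPT` are made up for illustration, and it assumes the Azure CLI is already logged in with access to the managed resource group. Unlike the single command above, the remote script checks for the line first, so running it twice should not append a duplicate entry.

```shell
#!/usr/bin/env bash
# Sketch: apply the FallbackDNS fix to every instance of one VMSS.
# FIX_SCRIPT and fix_vmss are illustrative names, not part of the Azure CLI.
FIX_SCRIPT="grep -qxF 'FallbackDNS=168.63.129.16' /etc/systemd/resolved.conf || echo 'FallbackDNS=168.63.129.16' >> /etc/systemd/resolved.conf; systemctl restart systemd-resolved"

fix_vmss() {
  group="$1"
  vmss="$2"
  # List the instance IDs of the scale set, one per line.
  for id in $(az vmss list-instances -g "$group" -n "$vmss" --query "[].instanceId" -o tsv); do
    echo "Patching $vmss instance $id"
    az vmss run-command invoke \
      -g "$group" \
      -n "$vmss" \
      --command-id RunShellScript \
      --instance-id "$id" \
      --scripts "$FIX_SCRIPT"
  done
}

# Usage (placeholder names taken from the example above):
# fix_vmss "MC_rg-monitor001_aks001_eastus2" "aks-sysnp001-42513286-vmss"
```

Call `fix_vmss` once per VMSS, that is, once per node pool.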
2. If you have SSH access enabled on your AKS cluster and a large number of nodes, applying the fix with Ansible could be a valid alternative too:
# fix.yml
---
- name: Add FallbackDNS to systemd-resolved on affected AKS nodes
  hosts: AKS
  # Connect as azureuser and escalate to root; editing /etc requires root.
  remote_user: azureuser
  become: true
  become_method: sudo
  tasks:
    - name: Register systemd package version
      command: dpkg-query --showformat='${Version}' --show systemd
      register: systemd_package_version
      changed_when: no

    - name: Check whether /etc/systemd/resolved.conf contains "FallbackDNS=168.63.129.16"
      command: grep -Fxq "FallbackDNS=168.63.129.16" /etc/systemd/resolved.conf
      register: checkmyconf
      check_mode: no
      ignore_errors: yes
      changed_when: no

    - name: Add fix to /etc/systemd/resolved.conf
      lineinfile:
        dest: /etc/systemd/resolved.conf
        line: "FallbackDNS=168.63.129.16"
      when: (checkmyconf.rc == 1) and (systemd_package_version.stdout == "237-3ubuntu10.54")
      notify: Restart systemd-resolved

  # Restart the resolver only when the config file was actually changed.
  handlers:
    - name: Restart systemd-resolved
      service:
        name: systemd-resolved
        state: restarted
# ansible_hosts
[AKS]
10.y.y.y
10.x.x.x

ansible-playbook fix.yml -i ansible_hosts
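With many nodes, typing the inventory by hand gets tedious. One possible way to generate it is to ask the cluster for each node's internal IP. This is a sketch under assumptions: `write_inventory` is a made-up helper name, and it assumes `kubectl` points at the affected cluster and that the nodes are reachable over SSH on their internal IPs.

```shell
#!/usr/bin/env bash
# Sketch: write an Ansible inventory containing the internal IP of every node.
# write_inventory is an illustrative helper, not an existing tool.
write_inventory() {
  out_file="${1:-ansible_hosts}"
  {
    echo "[AKS]"
    # Print one InternalIP address per node, one per line.
    kubectl get nodes \
      -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'
  } > "$out_file"
}

# Usage:
# write_inventory ansible_hosts
# ansible-playbook fix.yml -i ansible_hosts
```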
3. If you have just a few nodes in your cluster, you can run a privileged container on each node and apply the fix manually:
```
kubectl debug node/aks-NODENAME-HERE -it --image=mcr.microsoft.com/dotnet/runtime-deps:6.0
```
Once logged in, switch to the host OS filesystem using chroot:
```
chroot /host
```
You can confirm that you are affected by the bug if the command `dpkg-query --showformat='${Version}' --show systemd` returns `237-3ubuntu10.54`.
Then append the setting with `echo 'FallbackDNS=168.63.129.16' >> /etc/systemd/resolved.conf` and restart the service with `systemctl restart systemd-resolved`.
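The manual steps can be wrapped in two small helpers so that repeating them on a node does not append the line a second time. This is a sketch only: `is_affected` and `add_fallback_dns` are hypothetical names introduced here, and the default path assumes you are already inside the chroot.

```shell
#!/usr/bin/env bash
# Sketch: check the systemd version and add the fallback DNS line only once.
# is_affected and add_fallback_dns are illustrative helper names.
AFFECTED_VERSION="237-3ubuntu10.54"
FALLBACK_LINE="FallbackDNS=168.63.129.16"

# Succeeds if the given version string is the affected one.
is_affected() {
  [ "$1" = "$AFFECTED_VERSION" ]
}

# Appends the line to the given file unless an identical line already exists.
add_fallback_dns() {
  conf="${1:-/etc/systemd/resolved.conf}"
  grep -qxF "$FALLBACK_LINE" "$conf" || echo "$FALLBACK_LINE" >> "$conf"
}

# Usage inside the chroot:
# if is_affected "$(dpkg-query --showformat='${Version}' --show systemd)"; then
#   add_fallback_dns && systemctl restart systemd-resolved
# fi
```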
4. Create a DaemonSet that adds the fallback DNS entry to the config file and restarts the service on every node:
# systemd-fix-daemonset.yml
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: systemd-fix-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      job: systemd-fix-daemonset
  template:
    metadata:
      labels:
        job: systemd-fix-daemonset
    spec:
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
          effect: NoSchedule
        - key: WORKLOAD
          operator: Exists
          effect: NoSchedule
      volumes:
        - name: hostfs
          hostPath:
            path: /
      hostPID: true
      restartPolicy: Always
      nodeSelector:
        "kubernetes.io/os": linux
      initContainers:
        - name: init
          image: alpine
          command:
            - /bin/sh
            - -xc
            - |
              # Append the fallback DNS entry only if it is not already present.
              grep -qxF 'FallbackDNS=168.63.129.16' /host/etc/systemd/resolved.conf \
                || echo 'FallbackDNS=168.63.129.16' >> /host/etc/systemd/resolved.conf
              # Restart the resolver on the host so the change takes effect.
              chroot /host /bin/systemctl restart systemd-resolved
          volumeMounts:
            - name: hostfs
              mountPath: /host
      containers:
        # Keeps the pod running after the init container has applied the fix.
        - name: sleep
          image: kubernetes/pause
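Rolling the DaemonSet out and cleaning it up afterwards might look like the sketch below. The helper name `run_systemd_fix` and the 300 second timeout are assumptions made for illustration; the DaemonSet name and namespace are the ones from the manifest above.

```shell
#!/usr/bin/env bash
# Sketch: deploy the DaemonSet, wait until it is ready on all nodes, then remove it.
# run_systemd_fix and the 300s timeout are illustrative choices.
run_systemd_fix() {
  manifest="${1:-systemd-fix-daemonset.yml}"
  kubectl apply -f "$manifest" || return 1
  # Pods become ready only after the init container has finished on each node.
  kubectl -n kube-system rollout status daemonset/systemd-fix-daemonset --timeout=300s || return 1
  kubectl -n kube-system delete daemonset systemd-fix-daemonset
}

# Usage:
# run_systemd_fix systemd-fix-daemonset.yml
```

If new nodes may join the cluster while the affected package is still in use, it may be preferable to skip the delete step so that those nodes are patched as well.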