This gets the UEK4 kernel series.
Smaller VMs won't have enough RAM for crashkernel=auto
to allocate anything for the crash kernel.
Update all packages; this brings the VM to OEL 7.8.
# yum update -y
... lots of updates ...
(138/201): kernel-uek-4.1.12-124.38.1.el7uek.x86_64.rpm | 45 MB 00:03
...
Make the /etc/default/grub file look like:
GRUB_TIMEOUT=30
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_CMDLINE_LINUX_DEFAULT="crashkernel=auto console=tty0 console=ttyS0,115200n8"
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
GRUB_DISABLE_RECOVERY=true
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 net.ifnames=0"
GRUB_DISABLE_RECOVERY="true"
# grub2-mkconfig -o /boot/grub2/grub.cfg
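Before rebooting, it may be worth confirming that the regenerated grub.cfg actually carries the crashkernel= parameter on its kernel lines. The following is a minimal sketch, not part of the original procedure: check_grub_cfg is a hypothetical helper name, and it assumes kernel lines start with "linux", "linux16" or "linuxefi" and separate their arguments with spaces.

```shell
#!/bin/sh
# check_grub_cfg (hypothetical helper): print each kernel line's image path
# from a grub.cfg-style file, marked OK or MISSING depending on whether the
# line carries a crashkernel= parameter.
check_grub_cfg() {
    # Assumption: kernel lines begin with linux, linux16 or linuxefi.
    grep -E '^[[:space:]]*linux(16|efi)?[[:space:]]' "$1" |
    while read -r _cmd image rest; do
        case " $rest " in
            *" crashkernel="*) echo "OK      $image" ;;
            *)                 echo "MISSING $image" ;;
        esac
    done
}

# Path taken from the grub2-mkconfig command above; skipped if not readable.
if [ -r /boot/grub2/grub.cfg ]; then
    check_grub_cfg /boot/grub2/grub.cfg
fi
```

Any entry reported as MISSING would boot without a crash kernel reservation.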
# yum install -y kexec-tools
# systemctl enable kdump
Restart the VM using the "Restart" option on the VM overview page in the Azure Portal.
# systemctl status -l kdump
● kdump.service - Crash recovery kernel arming
Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
Active: active (exited) since Fri 2020-05-01 23:28:49 UTC; 1min 11s ago
Process: 735 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
Main PID: 735 (code=exited, status=0/SUCCESS)
May 01 23:28:45 oel-74-uek4-kdump-test dracut[1047]: drwxr-xr-x 1 root root 0 May 1 23:28 usr/share/zoneinfo
May 01 23:28:45 oel-74-uek4-kdump-test dracut[1047]: -rw-r--r-- 1 root root 118 Apr 30 14:55 usr/share/zoneinfo/UTC
May 01 23:28:45 oel-74-uek4-kdump-test dracut[1047]: drwxr-xr-x 1 root root 0 May 1 23:28 var
May 01 23:28:45 oel-74-uek4-kdump-test dracut[1047]: lrwxrwxrwx 1 root root 11 May 1 23:28 var/lock -> ../run/lock
May 01 23:28:45 oel-74-uek4-kdump-test dracut[1047]: lrwxrwxrwx 1 root root 6 May 1 23:28 var/run -> ../run
May 01 23:28:45 oel-74-uek4-kdump-test dracut[1047]: ========================================================================
May 01 23:28:45 oel-74-uek4-kdump-test dracut[1047]: *** Creating initramfs image file '/boot/initramfs-4.1.12-124.38.1.el7uek.x86_64kdump.img' done ***
May 01 23:28:49 oel-74-uek4-kdump-test kdumpctl[735]: kexec: loaded kdump kernel
May 01 23:28:49 oel-74-uek4-kdump-test kdumpctl[735]: Starting kdump: [OK]
May 01 23:28:49 oel-74-uek4-kdump-test systemd[1]: Started Crash recovery kernel arming.
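Given the earlier note that smaller VMs may not get any memory from crashkernel=auto, it can be useful to check how much the running kernel actually reserved. The sketch below reads /sys/kernel/kexec_crash_size, which reports the reservation in bytes; the function names are hypothetical and the script only reports, it changes nothing.

```shell
#!/bin/sh
# Convert a byte count to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# report_crash_reservation (hypothetical helper): describe a reservation
# size given in bytes, as read from /sys/kernel/kexec_crash_size.
report_crash_reservation() {
    size_bytes=$1
    if [ "$size_bytes" -eq 0 ]; then
        echo "no memory reserved for the crash kernel"
    else
        echo "crash kernel reservation: $(bytes_to_mib "$size_bytes") MiB"
    fi
}

# Only query the live system if the sysfs file is present.
if [ -r /sys/kernel/kexec_crash_size ]; then
    report_crash_reservation "$(cat /sys/kernel/kexec_crash_size)"
fi
```

A result of zero would mean kdump has nowhere to load the crash kernel, regardless of what the service status says.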
It also helps to be watching the console at the time.
# uname -r > /dev/console
# systemctl status -l kdump > /dev/console
[ 381.383344] Uhhuh. NMI received for unknown reason 21 on CPU 0.
[ 381.383344] Do you have a strange power saving mode enabled?
[ 381.383344] Dazed and confused, but trying to continue
Makes sense: I never configured Linux to panic on an unknown NMI.
Change /etc/sysctl.conf:
kernel.unknown_nmi_panic=1
kernel.panic_on_unrecovered_nmi=1
kernel.sysrq=1
# sysctl -p
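To double-check that both NMI panic settings made it into the file, something like the following could be run. It is a rough sketch under the assumption that the file uses plain key=value lines; conf_value and check_nmi_panic_conf are hypothetical helper names.

```shell
#!/bin/sh
# conf_value (hypothetical helper): print the value assigned to a key in a
# sysctl.conf-style file. The last assignment wins; whitespace around the
# key and value is ignored.
conf_value() {
    awk -F= -v key="$2" '
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key { v = $2; gsub(/[ \t]/, "", v); val = v }
        END { if (val != "") print val }
    ' "$1"
}

# check_nmi_panic_conf (hypothetical helper): succeed only if both NMI
# panic keys are set to 1 in the given file.
check_nmi_panic_conf() {
    status=0
    for key in kernel.unknown_nmi_panic kernel.panic_on_unrecovered_nmi; do
        if [ "$(conf_value "$1" "$key")" = "1" ]; then
            echo "$key = 1"
        else
            echo "$key is not set to 1"
            status=1
        fi
    done
    return $status
}

# Check the file edited above, if it is readable; report only.
if [ -r /etc/sysctl.conf ]; then
    check_nmi_panic_conf /etc/sysctl.conf || true
fi
```

The live values can also be read back with `sysctl kernel.unknown_nmi_panic kernel.panic_on_unrecovered_nmi`.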
Then reboot from the Azure portal again.
It also helps to be watching the console at the time.
# uname -r > /dev/console
# systemctl status -l kdump > /dev/console
[ 189.646369] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 189.646369] IP: [< (null)>] (null)
[ 189.646369] PGD 0
[ 189.646369] Oops: 0010 [#1] SMP
[ 189.646369] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4 xt_owner xt_conntrack nf_conntrack iptable_security ext4 jbd2 mbcache2 xfs crct10dif_pclmul crc32_pclmul libcrc32c ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd sg hv_balloon pcspkr i2c_piix4 acpi_cpufreq i2c_core ip_tables btrfs xor raid6_pq sd_mod hv_netvsc ata_generic pata_acpi hyperv_keyboard hv_utils hv_storvsc hid_hyperv hyperv_fb crc32c_intel ata_piix serio_raw libata hv_vmbus floppy
[ 189.646369] CPU: 0 PID: 7020 Comm: sshd Tainted: G W 4.1.12-124.38.1.el7uek.x86_64 #2
[ 189.646369] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
[ 189.646369] task: ffff880131542a00 ti: ffff88002140c000 task.ti: ffff88002140c000
[ 189.646369] RIP: 0010:[<0000000000000000>] [< (null)>] (null)
[ 189.646369] RSP: 0018:ffff88013b603d70 EFLAGS: 00010046
[ 189.646369] RAX: 0000000000000000 RBX: ffff88013b618140 RCX: 0000002c27cdf1be
[ 189.646369] RDX: 0000000000000005 RSI: ffffffff81b4f480 RDI: ffff88013b618140
[ 189.646369] RBP: ffff88013b603d98 R08: 0000000000000000 R09: 0000000000000101
[ 189.646369] R10: 00000000006f8fe9 R11: 0000000000001f20 R12: ffffffff81b4f480
[ 189.646369] R13: 0000000000000005 R14: 0000000000000046 R15: 0000000000000000
[ 189.646369] FS: 00007f552c1e98c0(0000) GS:ffff88013b600000(0000) knlGS:0000000000000000
[ 189.646369] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 189.646369] CR2: 0000000000000000 CR3: 0000000021402000 CR4: 0000000000360670
[ 189.646369] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 189.646369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 189.646369] Stack:
[ 189.646369] ffffffff810b4c64 ffff88013b603d98 ffffffff81b4f480 ffff88013b618140
[ 189.646369] ffffffff81b4ff34 ffff88013b603da8 ffffffff810ba0b3 ffff88013b603dc8
[ 189.646369] ffffffff810ba3f3 ffffffff81b4f480 ffff88013b618140 ffff88013b603e28
[ 189.646369] Call Trace:
[ 189.646369] <IRQ>
[ 189.646369] [<ffffffff810b4c64>] ? enqueue_task+0x54/0x90
[ 189.646369] [<ffffffff810ba0b3>] activate_task+0x23/0x30
[ 189.646369] [<ffffffff810ba3f3>] ttwu_do_activate.constprop.90+0x33/0x70
[ 189.646369] [<ffffffff810bd227>] try_to_wake_up+0x1c7/0x390
[ 189.646369] [<ffffffff810bd472>] default_wake_function+0x12/0x20
[ 189.646369] [<ffffffff810d3deb>] __wake_up_common+0x5b/0x90
[ 189.646369] [<ffffffff810d3e33>] __wake_up_locked+0x13/0x20
[ 189.646369] [<ffffffff810d46db>] complete+0x3b/0x60
[ 189.646369] [<ffffffffc0021225>] vmbus_unload_response+0x15/0x20 [hv_vmbus]
[ 189.646369] [<ffffffffc001e07f>] vmbus_on_msg_dpc+0x17f/0x210 [hv_vmbus]
[ 189.646369] [<ffffffff81091020>] tasklet_action+0x130/0x140
[ 189.646369] [<ffffffff81091320>] __do_softirq+0x100/0x320
[ 189.646369] [<ffffffff8175f3bc>] do_softirq_own_stack+0x1c/0x30
[ 189.646369] <EOI>
[ 189.646369] [<ffffffff810915e5>] do_softirq+0x55/0x60
[ 189.646369] [<ffffffff8109167b>] __local_bh_enable_ip+0x8b/0xa0
[ 189.646369] [<ffffffff816153d7>] lock_sock_nested+0x47/0x60
[ 189.646369] [<ffffffff81683455>] tcp_sendmsg+0x35/0xb70
[ 189.646369] [<ffffffff812d3320>] ? sock_has_perm+0x70/0x90
[ 189.646369] [<ffffffff816b01aa>] inet_sendmsg+0x6a/0xb0
[ 189.646369] [<ffffffff812d3453>] ? selinux_socket_sendmsg+0x23/0x30
[ 189.646369] [<ffffffff81612323>] sock_sendmsg+0x43/0x50
[ 189.646369] [<ffffffff816123b5>] sock_write_iter+0x85/0xf0
[ 189.646369] [<ffffffff8121fcac>] __vfs_write+0xdc/0x130
[ 189.646369] [<ffffffff81220379>] vfs_write+0xa9/0x1b0
[ 189.646369] [<ffffffff81102c82>] ? ktime_get_with_offset+0x52/0xb0
[ 189.646369] [<ffffffff81221265>] SyS_write+0x55/0xd0
[ 189.646369] [<ffffffff810ff801>] ? SyS_clock_gettime+0x91/0xd0
[ 189.646369] [<ffffffff8175a7b6>] system_call_fastpath+0x18/0xee
[ 189.646369] Code: Bad RIP value.
[ 189.646369] RIP [< (null)>] (null)
[ 189.646369] RSP <ffff88013b603d70>
[ 189.646369] CR2: 0000000000000000
[ 189.646369] ---[ end trace 5aa01d0606c737ee ]---
[ 189.646369] Kernel panic - not syncing: Fatal exception in interrupt
[ 189.646369] Kernel Offset: disabled
[ 189.646369] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
... crash kernel never starts ...
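After the VM comes back up, one way to confirm that no dump was written is to look for vmcore files. This sketch assumes the stock kdump target directory /var/crash (the "path" setting in /etc/kdump.conf may point elsewhere); list_vmcores is a hypothetical helper name.

```shell
#!/bin/sh
# list_vmcores (hypothetical helper): print any vmcore files found under
# the given directory (default: /var/crash). Prints nothing if the
# directory is missing or holds no dumps.
list_vmcores() {
    dir=${1:-/var/crash}
    [ -d "$dir" ] || return 0
    find "$dir" -type f -name 'vmcore*'
}

found=$(list_vmcores /var/crash)
if [ -n "$found" ]; then
    echo "$found"
else
    echo "no vmcore files found"
fi
```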
# sosreport
...