At some point in your life as an OpenStack administrator, your (or your customer's) virtual machine (VM) will fail to boot properly. How do you bring it back into the game? This note is how I do it.
Situation:
- The VM console is stuck at "Probing EDD (edd=off to disable)"
- The last line of the VM boot log is "Failed to load SELinux Policy. Freezing."
- OS: CentOS 8
- This note is based on my own skill and knowledge, so some of the steps or directions may not be good or optimal. Feel free to comment and share your opinions to make me and this note better.
Looking at the VM console, "Probing EDD (edd=off to disable)" was slapping my face, and after several restarts (4fun: SRE meaning) it was still there. I didn't know what it was, so I googled it (here) and learned that I had to rescue the VM.
As you may know, OpenStack has an option called "Rescue Instance". I happily clicked the button, and an error appeared: "...not support volume backed server...". Stuck as usual. Honestly, I'm not good at Nova; I didn't know enough to understand what this was, what to do, or how to fix it. Since I couldn't estimate how long it would take to bring this function back, I quickly chose another way, rescuing the VM the hard way: attach a bootable volume to it, and edit the root volume through that.
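For reference, the same rescue attempt can be made from the CLI. The server name below is a placeholder of mine; on a volume-backed instance the call is rejected just like the button click:

```shell
# "my-broken-vm" is a hypothetical server name.
# On a volume-backed instance this fails with an error like the one above.
openstack server rescue my-broken-vm
```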
TLDR: In this move, I created a volume from an Ubuntu image, set it as bootable, attached it to the VM, and rebooted -> it didn't work. The problem is in the virsh file in OpenStack: the Ubuntu volume is never the first boot choice.
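The attach-a-volume move above can be sketched with the OpenStack CLI. The image name, size, and server name are assumptions of mine; adjust them to your cloud:

```shell
# Create a bootable rescue volume from an Ubuntu image and attach it.
# "ubuntu-20.04", the 10 GB size, and "my-broken-vm" are made-up examples.
openstack volume create --image ubuntu-20.04 --size 10 --bootable rescue-vol
openstack server add volume my-broken-vm rescue-vol
openstack server reboot --hard my-broken-vm
```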
The boot section of the virsh file looks like this:
<os>
<type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
<boot dev='hd'/>
<smbios mode='sysinfo'/>
</os>
So, as always, virsh boots the device of type "hd". But none of the disks attached by OpenStack carry anything to indicate which one is which, so there is no chance to choose which disk to boot from:
<!-- <disk1> -->
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
<!-- <disk2> -->
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
...
<target dev='vdb' bus='virtio'/>
...
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
OpenStack didn't help, so let's go deeper and try editing the virsh file directly.
TLDR: changed the boot sequence following a guide on Stack Exchange; didn't work.
From Stack Exchange:
*If libvirt doesn't reload VM settings on start/stop, the virsh edit command may help. Please also post the entire XML file and the libvirt version.*
*Hmm... everything seems OK. Try adding*
```xml
<boot dev='hd'/>
<boot dev='cdrom'/>
<bootmenu enable='yes'/>
```
to the <os> section and see if cdrom appears in the boot menu. Also, try removing all <boot> records from <os> and adding
```xml
<boot order='1'/>
```
to <disk> section
The first option from the answer above didn't work, because all disks are of the same type (hd). So I tried updating with virsh edit as below:
<!-- <disk1> -->
<disk type='network' device='disk'>
<boot order='2'/>
<driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
...
<target dev='vda' bus='virtio'/>
...
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
<!-- <disk2> -->
<disk type='network' device='disk'>
<boot order='1'/>
<driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
...
<target dev='vdb' bus='virtio'/>
...
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
The result was still the same: it didn't work, and the VM was still stuck at "Probing ..."
NOTE:
- When you restart a VM through OpenStack, a new version of the virsh file is pushed to the compute node, so any manual change to it is not persistent.
- If your VM has not booted yet, virsh shutdown <vm> does not work, because it sends a shutdown signal to the guest OS. You should use virsh destroy --graceful <vm> instead, then start the VM as usual with virsh start <vm>.
- The practice I use when editing the virsh file is: start the VM with OpenStack > virsh edit > virsh destroy --graceful > virsh start > repeat.
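The loop above can be sketched as follows. The domain name instance-0000abcd is a made-up example; find yours with virsh list --all:

```shell
# Repeat this cycle for every XML experiment.
virsh edit instance-0000abcd                 # tweak <os>/<disk> boot entries
virsh destroy --graceful instance-0000abcd   # stop a guest that never finished booting
virsh start instance-0000abcd
```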
TLDR: push an ISO to the compute node; copy it into the nova_libvirt container; virsh edit to boot from this cdrom; change the boot options; voila.
Copy file.iso to nova_libvirt
Somehow you have your favorite distro ISO on the compute node. My OpenStack installation uses kolla-ansible, so I have to place this ISO somewhere inside the nova_libvirt container.
docker inspect nova_libvirt | grep merge
# "MergedDir": "/var/lib/docker/overlay2/4738dcd492fc9990a383b3218f2027c9b833e585793cbdcd8853faefd31cf79a/merged",
cd the_folder_above
cp path/to/file.iso .
virsh edit
<os>
<type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
<!-- add this -->
<boot dev='cdrom'/>
<boot dev='hd'/>
<smbios mode='sysinfo'/>
</os>
...
<!-- and this cdrom -->
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<!-- remember to put cdrom file in right path -->
<source file='/file.iso'/>
<target dev='sda' bus='sata'/>
<readonly/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
With the above config, after destroying and starting, the VM boots from your ISO file. Now mount your disk and change the boot options.
Change boot options
It may seem easy to boot from the ISO and mount the desired disk as rootfs, but when you change the boot options and run grub2-mkconfig, an error pops up. To fix that, you must do the following:
- first, mount your desired disk to your/mount-point
- second, mount -o bind /dev your/mount-point/dev
- third, mount -o bind /proc your/mount-point/proc
- fourth, mount -o bind /sys your/mount-point/sys
Now, edit /etc/default/grub, then run grub2-mkconfig -o /boot/grub2/grub.cfg. In this case, I added edd=off and selinux=0 to the boot options.
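Put together, the procedure might look like the sketch below. /dev/vda1 as the root partition and /mnt as the mount point are assumptions of mine, and I'm assuming you chroot into the mounted rootfs before running grub2-mkconfig (that is what the bind mounts are for):

```shell
# Assumed layout: root filesystem on /dev/vda1, mounted at /mnt.
mount /dev/vda1 /mnt
mount -o bind /dev  /mnt/dev
mount -o bind /proc /mnt/proc
mount -o bind /sys  /mnt/sys
chroot /mnt /bin/bash
# Inside the chroot: append the options to the kernel command line...
sed -i 's/^GRUB_CMDLINE_LINUX="/&edd=off selinux=0 /' /etc/default/grub
# ...and regenerate the GRUB config on the rescued disk.
grub2-mkconfig -o /boot/grub2/grub.cfg
exit
umount /mnt/{dev,proc,sys} && umount /mnt
```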
After all that, revert the virsh config to the original and restart the VM. Now your VM can breathe again.
This is how I fix the boot of my VM. Thanks for reading! Happy rebooting!