Load the network block device (nbd) module:
# modprobe nbd max_part=8
Power off the machine (virsh destroy forces an immediate power-off):
# virsh destroy virtual-machine
Connect disk image:
# qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/virtual-machine.qcow2
Check disk:
# fsck /dev/nbd0p1
fsck from util-linux 2.25.2
e2fsck 1.42.12 (29-Aug-2014)
/dev/nbd0p1: recovering journal
/dev/nbd0p1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix<y>? yes
Inode 274 was part of the orphaned inode list. FIXED.
Inode 132276 was part of the orphaned inode list. FIXED.
Deleted inode 142248 has zero dtime. Fix<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -603674 -623174 +(689342--689343)
Fix<y>? yes
Free blocks count wrong for group #18 (15076, counted=15077).
Fix<y>? yes
Free blocks count wrong for group #19 (11674, counted=11675).
Fix<y>? yes
Free blocks count wrong (632938, counted=670871).
Fix<y>? yes
Inode bitmap differences: -274 -132276 -142248
Fix<y>? yes
Free inodes count wrong for group #0 (52, counted=53).
Fix<y>? yes
Free inodes count wrong for group #16 (99, counted=100).
Fix<y>? yes
Free inodes count wrong for group #17 (519, counted=520).
Fix<y>? yes
Free inodes count wrong (204392, counted=204599).
Fix<y>? yes
/dev/nbd0p1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/nbd0p1: 101833/306432 files (0.2% non-contiguous), 553321/1224192 blocks
Disconnect device:
# qemu-nbd --disconnect /dev/nbd0
/dev/nbd0 disconnected
Start machine:
# virsh start virtual-machine
If the above steps still don't work and you need to save your files from the VM, you might be able to do so by mounting the guest partitions on the host. Below is an example where partition 5 is mounted at /mnt. It is assumed that you have already loaded the network block device (nbd) module into the kernel:
$ sudo modprobe nbd max_part=8
Back up files from the VM guest to the host
Make sure the VM is shut off:
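For example (the domain name bad-virtual-machine is an assumption taken from the image file name used below; list your actual domains with virsh list --all):

```shell
# Request a clean guest shutdown, then confirm its state is "shut off"
$ virsh shutdown bad-virtual-machine
$ virsh list --all
```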
Connect the disk image (as above):
$ sudo qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/bad-virtual-machine.qcow2
Check that the /mnt directory is free to use as a mount point:
$ mount | grep /mnt
No output should be seen from the above command. If there is output, either unmount /mnt or use another empty directory. Once the partition is mounted, you can access all files on this VM partition and copy them to a safe place.
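The mount and copy steps might look like this (partition number 5 comes from the example above; the directory being rescued and the destination ~/vm-rescue are assumptions, so adjust them to your layout):

```shell
# Mount guest partition 5 read-only on the host and copy files out
$ sudo mount -o ro /dev/nbd0p5 /mnt
$ ls /mnt
$ mkdir -p ~/vm-rescue
$ sudo cp -a /mnt/home ~/vm-rescue/   # /mnt/home is a hypothetical example path
```

Mounting read-only (-o ro) is a reasonable precaution while rescuing files from a possibly damaged file system.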
Make space
Take the opportunity to check free disk space on all devices, using the df command:
$ df /mnt
If the Use% column shows 90% or more, remove unnecessary files and empty any trash directories. When done, don't forget to unmount the partition as follows:
$ sudo umount /dev/nbd0p5
And disconnect:
$ sudo qemu-nbd --disconnect /dev/nbd0
Repair broken VM
It might be possible to repair your broken system, but this comes with a warning:
If you're not careful, and mix up the devices on the host with the devices on the guest, you may destroy the host!
First a quick review of what we're up to:
We assume that the boot device is corrupt, so we want to replace it with a fresh one from another virtual machine.
With the nbd device disconnected, run lsblk on the host (the loop devices can be ignored). Write down the names of the devices it lists, to make sure you never involve them in any further operations; they belong to your host, which you don't want to alter!
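On a typical host the output might look something like this (the device names and sizes here are only an illustration; yours will differ):

```shell
$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    0 465.8G  0 disk
├─sda1   8:1    0   512M  0 part /boot/efi
├─sda2   8:2    0     1G  0 part /boot
└─sda3   8:3    0 464.3G  0 part /
```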
Connect the guest device again:
$ sudo qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/bad-virtual-machine.qcow2
And run lsblk again. You should now see some new nbd devices. We can already assume that nbd0p1 is the troublemaker, since it has a size of 512M, typical for a boot partition. You can also run sudo parted -l and check the section under Disk /dev/nbd0 to confirm the partition layout. I suggest you first make a binary backup of nbd0p1. In this example I will copy it to my ~/tmp directory, under a new directory I call boots.
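The backup step might look like this (the file name matches the path used later in this answer; the dd options are one reasonable choice):

```shell
# Create the target directory and take a raw copy of the bad boot partition
$ mkdir -p ~/tmp/boots
$ sudo dd if=/dev/nbd0p1 of=~/tmp/boots/bad_guest_nbd0p1 bs=1M status=progress
```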
Disconnect the faulty VM:
$ sudo qemu-nbd --disconnect /dev/nbd0
Connect a fresh working VM of the same type and version as the damaged one. If you don't have one, create it, test it, and shut it down. In this example I assume you have one at path /var/lib/libvirt/images/ubuntu20.04.qcow2.
Connect as follows:
$ sudo qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/ubuntu20.04.qcow2
Make a binary copy of its boot device:
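This mirrors the earlier backup of the bad partition (the file name is taken from the paths used below):

```shell
# Raw copy of the fresh VM's boot partition
$ sudo dd if=/dev/nbd0p1 of=~/tmp/boots/ok_guest_nbd0p1 bs=1M status=progress
```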
Disconnect the fresh VM:
$ sudo qemu-nbd --disconnect /dev/nbd0
At this point, we have a backup of the bad boot device in ~/tmp/boots/bad_guest_nbd0p1, and a working one in ~/tmp/boots/ok_guest_nbd0p1.
Connect the broken VM:
$ sudo qemu-nbd --connect=/dev/nbd0 /var/lib/libvirt/images/bad-virtual-machine.qcow2
But don't mount it!
Overwrite the bad partition with the backup from the fresh one. But first make sure you're not involving any partitions of your host machine!
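The overwrite might look like this; double-check with lsblk that /dev/nbd0p1 really is the guest's partition and not one of your host devices before running it:

```shell
# Write the good boot partition image over the bad one (destructive!)
$ sudo dd if=~/tmp/boots/ok_guest_nbd0p1 of=/dev/nbd0p1 bs=1M status=progress
```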
And disconnect:
$ sudo qemu-nbd --disconnect /dev/nbd0
Try to boot the faulty VM. It may seem to hang, but wait at least 90 seconds. If it still hangs, try to send the F1 key or force reset.
Once you have successfully booted it, open a terminal (in the guest) and run sudo mount -a. If you see an error like this:
mount: /boot/efi: can't find UUID=AF74-0D3D
then first run:
$ sudo blkid | grep vfat
/dev/vda1: UUID="7289-BC8F" TYPE="vfat" PARTUUID="b443b22b-01"
and then:
$ grep efi /etc/fstab
UUID=AF74-0D3D /boot/efi vfat umask=0077 0 1
If the UUID values differ, edit /etc/fstab and change the UUID of /boot/efi to the value shown by blkid. It should be working OK now. Reboot your VM.