@Drallas
Last active October 25, 2024 05:52

Mount Volumes into Proxmox VMs with Virtio-fs

Part of collection: Hyper-converged Homelab with Proxmox

Virtio-fs is a shared file system that lets virtual machines access a directory tree on the host. Unlike existing approaches, it is designed to offer local file system semantics and performance. The Rust-based virtiofsd daemon (virtiofsd-rs) that Proxmox 8 uses is the one receiving the most attention for new feature development.

Performance is very good: in my testing it was almost the same as on the Proxmox host.

VM Migration is not possible yet, but it's being worked on!

Architecture

[Screenshot: architecture diagram]

Why?

Since I have a Proxmox Highly Available cluster with Ceph, I like to mount POSIX-compliant CephFS directories into my VMs. I have been playing around with LXC containers and bind mounts, and even successfully set up Docker Swarm in LXC containers. Unfortunately, this is not a recommended configuration and comes with some trade-offs and cumbersome configuration settings.

This write-up explains how to create erasure-coded CephFS pools to store volumes that can then be mounted into a VM via virtiofs.


Install virtiofsd

| This procedure has been tested with Ubuntu Server 22.04 and Debian 12!

Proxmox 8 nodes don’t have virtiofsd installed by default, so the first step is to install it.

apt install virtiofsd -y

# Check the version
/usr/lib/kvm/virtiofsd --version
virtiofsd backend 1.7.0

virtiofsd 1.7.0 has many issues (hangs after rebooting the VM, superblock errors, etc.). Versions 1.7.2 and 1.8.0 seem to work much better; they can be found on the virtio-fs releases page. But be careful: this package is not considered stable and is not even in unstable on the Debian Package Tracker.


Add the hookscript to the vm

Still on the Proxmox host!

Get the hookscript files and copy them to /var/lib/vz/snippets, and make virtiofs_hook.pl executable.

Or use the get_hook_script.sh script to download the script files automatically to /var/lib/vz/snippets.

cd ~/
# Save under an explicit name so the following commands refer to the same file.
# If this URL no longer resolves, see the Links section for the script's current location.
sudo wget -O get_hook_script.sh https://raw.githubusercontent.com/Drallas/Virtio-fs-Hookscript/main/get_hook_script.sh
sudo chmod +x get_hook_script.sh
./get_hook_script.sh

Modify the conf file

To set the VMID and the folders that a VM needs to mount, open the virtiofs_hook.conf file.

sudo nano /var/lib/vz/snippets/virtiofs_hook.conf
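
The guide does not show the file's contents, but a config posted in the comments uses one line per VM in the form `<vmid>: /path/one, /path/two`. Assuming that format, the sketch below lists the folders configured for a VM and reports any that do not exist on the host. The function names `list_virtiofs_dirs` and `check_virtiofs_dirs` are made up for this example and are not part of the hookscript.

```shell
# List the shared directories configured for one VM ID, one per line.
# Assumed line format (taken from a config posted in the comments):
#   <vmid>: /path/one, /path/two
list_virtiofs_dirs() {
    vmid=$1
    conf=${2:-/var/lib/vz/snippets/virtiofs_hook.conf}
    # Keep only this VM's line, drop the "<vmid>:" prefix,
    # split on commas and trim surrounding whitespace.
    sed -n "s/^${vmid}:[[:space:]]*//p" "$conf" \
        | tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

# Return non-zero if any configured directory is missing on the host.
check_virtiofs_dirs() {
    missing=0
    while IFS= read -r dir; do
        [ -n "$dir" ] || continue
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            missing=1
        fi
    done <<EOF
$(list_virtiofs_dirs "$@")
EOF
    return "$missing"
}
```

For example, `check_virtiofs_dirs 100` checks the entries for VM 100 in the default config file before the VM is started.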

Set the Hookscript to an applicable VM

Attach the hookscript to the VM:

qm set <vmid> --hookscript local:snippets/virtiofs_hook.pl

That's it. Once it's added to the VM, the script does its magic on VM boot:

  • Adding the required virtiofsd arguments to the VM's args section: args: -object memory-backend-memfd,id=mem,size=4096M,share=on -numa node........
  • Creating the sockets that are needed for the folders.
  • Cleanup on VM Shutdown
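
To make the first step concrete, here is a rough sketch of how such an args string can be assembled. It is not the hookscript's own code. It only reproduces the pattern visible in the hook output posted in the comments, where each shared folder gets a tag of the form `<vmid>-<folder name>` and a socket under /run/virtiofsd/. The function name `build_virtiofs_args` is hypothetical, and the memory size is passed in by hand here.

```shell
# Build a QEMU args string for a VM and its shared folders.
# Usage: build_virtiofs_args <vmid> <memory in MB> <dir> [<dir> ...]
# Assumption: tag = "<vmid>-<last path component>", as seen in posted hook logs.
build_virtiofs_args() {
    vmid=$1
    mem_mb=$2
    shift 2
    args="-object memory-backend-memfd,id=mem,size=${mem_mb}M,share=on -numa node,memdev=mem"
    i=0
    for dir in "$@"; do
        tag="${vmid}-$(basename "$dir")"
        # One chardev socket plus one vhost-user-fs device per shared folder.
        args="$args -chardev socket,id=char${i},path=/run/virtiofsd/${tag}.sock"
        args="$args -device vhost-user-fs-pci,chardev=char${i},tag=${tag}"
        i=$((i + 1))
    done
    printf '%s\n' "$args"
}
```

Calling `build_virtiofs_args 103 4096 /mnt/sdb2/common /mnt/sdb2/103` yields the same string as the pre-start line in the VM 103 log further down this page.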

Start / Stop VM

The VM can now be started and the hookscript takes care of the virtiofsd part.

qm start <vmid>

Check

Check for the virtiofsd processes with ps aux | grep virtiofsd, or for the systemd services with systemctl | grep virtiofsd.

If all is good, it looks like this: [Screenshot: virtiofsd processes and services running]
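
As a complement to the process check, the sockets themselves can be verified. This is a small sketch under the assumption that each shared folder has a socket named /run/virtiofsd/<tag>.sock, as in the args line shown in this guide. The function name `check_virtiofs_sockets` is invented for the example.

```shell
# Report whether the socket for each tag exists in the given directory.
# Usage: check_virtiofs_sockets <socket dir> <tag> [<tag> ...]
check_virtiofs_sockets() {
    sockdir=$1
    shift
    missing=0
    for tag in "$@"; do
        # -S is true only if the path exists and is a socket.
        if [ -S "$sockdir/$tag.sock" ]; then
            echo "ok: $sockdir/$tag.sock"
        else
            echo "missing: $sockdir/$tag.sock"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `check_virtiofs_sockets /run/virtiofsd 100-common 100-100`.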


Mount inside VM

Linux kernels 5.4 and later support Virtio-fs natively inside the VM.
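
A quick way to check the kernel requirement from inside the guest is to compare the running kernel release against 5.4. The helper below is only an illustration (the name `kernel_supports_virtiofs` is made up), and it looks at the version number alone, so it cannot tell whether a distribution kernel was built without virtiofs.

```shell
# Succeed if the given kernel release (default: the running one) is 5.4 or newer.
kernel_supports_virtiofs() {
    release=${1:-$(uname -r)}
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 4 ]; }
}

if kernel_supports_virtiofs; then
    echo "kernel $(uname -r): new enough for virtiofs"
else
    echo "kernel $(uname -r): older than 5.4"
fi
```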

Mounting is in the format: mount -t virtiofs <tag> <local-mount-point>

To find the tag

On the Proxmox host, execute qm config <vmid> --current and look for the tag=<vmid>-<appname> entry (for example tag=xxx-docker) inside the args section:
args: -object memory-backend-memfd,id=mem,size=4096M,share=on -numa node,memdev=mem -chardev socket,id=char1,path=/run/virtiofsd/<vmid>-<appname>.sock -device vhost-user-fs-pci,chardev=char1,tag=<vmid>-<appname>
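
Picking the tags out of a long args line by eye is error-prone, so a small filter can help. This sketch assumes the `tag=<value>` form shown above; the function name `extract_virtiofs_tags` is hypothetical. It prints nothing if the config carries no tag entries.

```shell
# Read VM config text on stdin and print each virtiofs tag on its own line.
extract_virtiofs_tags() {
    # Match "tag=" followed by everything up to the next space or comma,
    # then keep only the part after the "=".
    grep -o 'tag=[^ ,]*' | cut -d= -f2
}
```

On the Proxmox host it would be used as `qm config <vmid> --current | extract_virtiofs_tags`.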

# Create a directory
sudo mkdir -p /srv/cephfs-mounts/<foldername>

# Mount the folder (replace <vmid>-<appname> with the tag found above)
sudo mount -t virtiofs <vmid>-<appname> /srv/cephfs-mounts/<foldername>

# Add them to /etc/fstab
sudo nano /etc/fstab

# Mounts for virtiofs
# The nofail option prevents the system from hanging if the mount fails!
<vmid>-<appname>  /srv/cephfs-mounts/<foldername>  virtiofs  defaults,nofail  0  0

# Mount everything from fstab
sudo systemctl daemon-reload && sudo mount -a

# Verify
ls -lah /srv/cephfs-mounts/<foldername>
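
When several folders are mounted, editing /etc/fstab by hand each time invites duplicate lines. The following is a minimal sketch of an idempotent alternative that appends the entry shown above only if the tag is not already listed. The function name `add_virtiofs_fstab_entry` is an invention for this example, and writing to the real /etc/fstab requires root.

```shell
# Append a virtiofs entry to fstab unless the tag is already present.
# Usage: add_virtiofs_fstab_entry <tag> <mount point> [<fstab file>]
add_virtiofs_fstab_entry() {
    tag=$1
    mountpoint=$2
    fstab=${3:-/etc/fstab}
    # Skip if a line whose first field equals the tag already exists.
    if awk -v t="$tag" '$1 == t { found = 1 } END { exit !found }' "$fstab"; then
        return 0
    fi
    printf '%s  %s  virtiofs  defaults,nofail  0  0\n' "$tag" "$mountpoint" >> "$fstab"
}
```

Running it twice with the same tag leaves a single line in the file, so it is safe to call from a provisioning script.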

Issues

  1. New VMs tend to throw a 'superblock' error on first boot:
mount: /srv/cephfs-mounts/download: wrong fs type, bad option, bad superblock on mnt_pve_cephfs_multimedia, missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.

To solve this, I power off the VM with sudo /sbin/shutdown -HP now and then start it again from the host with qm start <vmid>; everything should mount fine now.

  2. Adding an extra volume also throws a 'superblock' error.
qm stop <vmid>
sudo nano /etc/pve/qemu-server/<vmid>.conf
# Remove the args entry
args: -object memory-backend-memfd,id=mem,size=4096M,share=on..
qm start <vmid>

Now all the volumes have a superblock error; I power off the VM with sudo /sbin/shutdown -HP now and then start it again from the host with qm start <vmid>, and everything should mount fine again.


Cleanup

To remove Virtio-fs from a VM and from the host:

nano /etc/pve/qemu-server/xxx.conf

# Remove the following lines
hookscript: local:snippets/virtiofs_hook.pl
args: -object memory-backend-memfd,id=mem,size=4096M,share=on..

Disable each virtiofsd-xxx service; replace xxx with the correct values, or use a * wildcard to disable them all at once.

systemctl disable virtiofsd-xxx
sudo systemctl reset-failed virtiofsd-xxx

This should be enough, but if the references persist:

# Remove leftover sockets and services.
rm -rf /etc/systemd/system/virtiofsd-xxx
rm -rf /etc/systemd/system/xxx.scope.requires/
rmdir /sys/fs/cgroup/system.slice/'system-virtiofsd\xxx' 

If needed, reboot the host to make sure all references are purged from the system state.


Links


Moved to: https://github.com/Drallas/Virtio-fs-Hookscript/blob/main/Script/get-hook%20script.sh
# get-hook script.sh retained to keep it's commit history!
# Moved to: https://github.com/Drallas/Virtio-fs-Hookscript/blob/main/Script/virtiofs-hook.pl
@scottmeup

Is it possible to run the hookscript on a guest that is setup with cloud-init?

I tried the method described above but with no luck: the guest does not recognize the tags.

It looks like the guest doesn't receive the correct args values.

proxmox$ sudo qm config 100 --current

...
args: -fw_cfg name=opt/com.coreos/config,file=/etc/pve/geco-pve/coreos/100.ign
...
hookscript: local:snippets/virtiofs_hook.pl
...

@Drallas
Author

Drallas commented Mar 15, 2024

Is it possible to run the hookscript on a guest that is setup with cloud-init?

I tried the method described above but with no luck: the guest does not recognize the tags.

It looks like the guest doesn't receive the correct args values.

proxmox$ sudo qm config 100 --current

...
args: -fw_cfg name=opt/com.coreos/config,file=/etc/pve/geco-pve/coreos/100.ign
...
hookscript: local:snippets/virtiofs_hook.pl
...

No idea didn’t try this. Did you install virtiofs and is it working properly?

@scottmeup

Yes, I followed your guide - very helpful btw - and have it running and tested working on another guest.

I'm still trying to track down what's happening exactly: I can see an instance of virtiofsd that matches the guest cloud-init machine but the tags don't come up in the output of qm config 100 --current, and when I try to mount the file system it gives:

$ sudo mount -t virtiofs 100-100 /mnt/virtio_independent
mount: /var/mnt/virtio_independent: wrong fs type, bad option, bad superblock on 100-100, missing codepage or helper program, or other error.

 

From the output it looks like something might be happening during pre-start?

In the end the args become set as in the last line of post-start.

 

pre-start

100 is starting, doing preparations.
Creating directory: /run/virtiofsd/
attempting to install unit virtiofsd-100-common...
DIRECTORY DOES EXIST!
attempting to install unit virtiofsd-100-100...
ERROR: /run/virtiofsd/ does not exist!
-object memory-backend-memfd,id=mem,size=2048M,share=on -numa node,memdev=mem -chardev socket,id=char0,path=/run/virtiofsd/100-common.sock -device vhost-user-fs-pci,chardev=char0,tag=100-common -chardev socket,id=char1,path=/run/virtiofsd/100-100.sock -device vhost-user-fs-pci,chardev=char1,tag=100-100
Appending virtiofs arguments to VM args.

 

post-start

100 started successfully.
Removing virtiofs arguments from VM args.
conf->args = -fw_cfg name=opt/com.coreos/config,file=/etc/pve/geco-pve/coreos/100.ign -object memory-backend-memfd,id=mem,size=2048M,share=on -numa node,memdev=mem -chardev socket,id=char0,path=/run/virtiofsd/100-common.sock -device vhost-user-fs-pci,chardev=char0,tag=100-common -chardev socket,id=char1,path=/run/virtiofsd/100-100.sock -device vhost-user-fs-pci,chardev=char1,tag=100-100
vfs_args = -object memory-backend-memfd,id=mem,size=2048M,share=on -numa node,memdev=mem -chardev socket,id=char0,path=/run/virtiofsd/100-common.sock -device vhost-user-fs-pci,chardev=char0,tag=100-common -chardev socket,id=char1,path=/run/virtiofsd/100-100.sock -device vhost-user-fs-pci,chardev=char1,tag=100-100
-fw_cfg name=opt/com.coreos/config,file=/etc/pve/geco-pve/coreos/100.ignconf->args = -fw_cfg name=opt/com.coreos/config,file=/etc/pve/geco-pve/coreos/100.ign

 

virtiofs_hook.conf

100: /mnt/sdb2/common, /mnt/sdb2/100
101: /mnt/sdb2/common, /mnt/sdb2/101
1000: /mnt/sdb2/common, /mnt/sdb2/1000

@Drallas
Author

Drallas commented Mar 15, 2024

I always saw ‘superblock’ errors on the first boot; see the Issues section for details. Perhaps it helps!?

@cprhh

cprhh commented Mar 31, 2024

Hi @Drallas
Thank you for the work on this guide. I tried to follow it but got to a point where I do not understand what kind of problem I have.
Maybe you have an idea for me?

root@pve:/var/lib/vz/snippets# ls -la
total 20
drwxr-xr-x 2 root root 4096 Apr  1 00:01 .
drwxr-xr-x 6 root root 4096 Mar 31 17:04 ..
-rw-r--r-- 1 root root   23 Mar 31 23:34 virtiofs_hook.conf
-rwxr-xr-x 1 root root 4165 Apr  1 00:01 virtiofs_hook.pl

If I run qm like in your example, I get "hookscript: script 'local:snippets/virtiofs-hook.pl' does not exist"


root@pve:/var/lib/vz/snippets# qm set 101 --hookscript local:snippets/virtiofs-hook.pl
400 Parameter verification failed.
hookscript: script 'local:snippets/virtiofs-hook.pl' does not exist

qm set <vmid> [OPTIONS]

Found the Problem now. You have a typo in your script. It should be:

qm set <vmid> --hookscript local:snippets/virtiofs_hook.pl
and not
qm set <vmid> --hookscript local:snippets/virtiofs-hook.pl

@Drallas
Copy link
Author

Drallas commented Apr 1, 2024

@cprhh Glad you found the issue yourself, thanks for pointing out the typo in the guide (sorry for that)..

@scyto

scyto commented Apr 8, 2024

@Drallas i am little confused how your VMs have all these args? none of mine have any form -f -object args.... what am i missing?

On the Proxmox host; Execute qm config <vmid> --current and look for the tag=xxx-docker inside the args section args: -object memory-backend-memfd,id=mem,size=4096M,share=on -numa node,memdev=mem -chardev socket,id=char1,path=/run/virtiofsd/xxx-docker.sock -device vhost-user-fs-pci,chardev=char1,tag=<vmid>-<appname>

@scyto

scyto commented Apr 8, 2024

@Drallas on live migration, i note that all the changes documented here are now merged in all upstream repos.... i wonder how long before it makes its way down to proxmox.... https://gitlab.com/virtio-fs/virtiofsd/-/issues/136

@Drallas
Author

Drallas commented Apr 9, 2024

@scyto Those values are set when the hookscript is added to the VM; no idea why it's different on your system. But you should have the tag=<vmid>-<appname> section?

I have not touched my cluster for a while, it's running smooth and I just use the services on top of it. It needs some maintenance soon, then i will also update them and check the live migration.

@scyto

scyto commented Apr 10, 2024

oooh, thanks, i haven't implemented the scripts yet so no wonder i am confused / stupid

@00Asgaroth00

00Asgaroth00 commented May 24, 2024

I've now managed to get swarm running in VMs with virtiofsd mounts in each VM; however, I'm still seeing corruption when I drain a swarm node and "fail over" a service to another node. I've tested with Portainer and AdGuard Home and both give data corruption issues when performing the failover. For reference, I was experiencing the same thing using LXC containers, as mentioned in these two comments:

Portainer Error Log
AdGuard Home Error Log

Are you seeing these types of errors using vm's and virtiofs mounts?

Edit: attaching my Python hookscript for others if they want to adapt it; this will generate a "template" config file that you can then edit to tweak your settings. I've used the configobj Python module for ini-style config files, so you will need to install that: "apt install python3-configobj". I'll be tweaking this script as I go along. I hope to create a git repo at some point, adding in all my automation for my homelab. Anyhoo, find attached.
Edit 2: wont let me attach the hookscript, only supports images

@Drallas
Author

Drallas commented May 24, 2024

I don’t see any of this, my nodes get often rebooted at random, and everything keeps on running without issues. Only had a corrupted FreshRSS db once..

@00Asgaroth00

00Asgaroth00 commented May 24, 2024

This is odd then, the one difference that may be an issue for me is that i'm on rocky 9 kernel version 5.14.0-427 and there may be something up with kernel virtiofs on that release, what kernel version does debian 12 run with?

I've now tested with virtiofsd 1.7.2, 1.8.0 and currently testing 1.10.1

do you run either portainer or adguard?

what parameters are you currently passing to virtiofsd? I'm currently passing:

/usr/libexec/virtiofsd-1.10.1 --syslog --announce-submounts --posix-acl --log-level=debug --cache=always --allow-direct-io --inode-file-handles=mandatory --sandbox=chroot --socket-path=<snip> --shared-dir=<snip>

I was testing chroot sandbox instead of default namespace just in case it was something there causing issues with access on the other machines

Edit: A quick google mentioned debian 12 comes with kernel version 6.1, I just updated the rocky 9 vm's to kernel version 6.1.91 (6.1.91-1.el9.elrepo.x86_64), and I still have the same issue :(

@JSinghDev

I am trying to run samba share in a debian 12 guest on virtiofsd share. Samba is unable to read acls for share.
I modified the script to include --posix-acl parameter for virtiofsd service.

Do we need to enable posix acls explicitly in the guest as well?

@xhorange

I am trying to run samba share in a debian 12 guest on virtiofsd share. Samba is unable to read acls for share. I modified the script to include --posix-acl parameter for virtiofsd service.

Do we need to enable posix acls explicitly in the guest as well?

I also encountered the same problem. Where should this parameter be added to the script? Is it the args parameter? This has troubled me for a long time.

@Drallas
Author

Drallas commented Jul 19, 2024

I am trying to run samba share in a debian 12 guest on virtiofsd share. Samba is unable to read acls for share. I modified the script to include --posix-acl parameter for virtiofsd service.
Do we need to enable posix acls explicitly in the guest as well?

I also encountered the same problem. Where should this parameter be added to the script? Is it the args parameter? This has troubled me for a long time.

Not sure, currently occupied with some other things!

But feel free to post it, if you find a working solution, or to do a PR on the https://github.com/Drallas/Virtio-fs-Hookscript/tree/main/Script

@JSinghDev

JSinghDev commented Jul 20, 2024

I am trying to run samba share in a debian 12 guest on virtiofsd share. Samba is unable to read acls for share. I modified the script to include --posix-acl parameter for virtiofsd service.
Do we need to enable posix acls explicitly in the guest as well?

I also encountered the same problem. Where should this parameter be added to the script? Is it the args parameter? This has troubled me for a long time.

So this is an excellent post I dug up which lays out how to do enable acls for samba shares:
https://www.reddit.com/r/Proxmox/comments/121r7f0/proxmox_zfs_samba_vm_shadow_copy_clamav_xattr/
Gives the perfect solution:
The idea is to enable xattr=sa, acltype=posixacl, aclinherit=passthrough on the zfs dataset that is to be transferred. This can be done using the zfs set command. I also installed acl on the guest.
Then edit the execstart line for the virtiofs service or hookscript to include: -o xattr -o posix_acl -o modcaps=+sys_admin

So the hookscript line where the service is defined should look like:

ExecStart=/usr/libexec/virtiofsd --log-level debug --socket-path /run/virtiofsd/%i-[% share_id %].sock --shared-dir [% share %] -o xattr -o posix_acl -o modcaps=+sys_admin --cache=always --announce-submounts --inode-file-handles=mandatory

@scottmeup

Any idea why the tags wouldn't show up when running qm config <vmid> --current?

This is the qm config --current output

args: -object memory-backend-memfd,id=mem,size=4096M,share=on -numa node,memdev=mem
balloon: 2048
boot: order=scsi0;ide2;net0
cores: 1
cpu: x86-64-v2-AES
hookscript: local:snippets/virtiofs_hook.pl
ide2: local:iso/debian-12.5.0-amd64-netinst.iso,media=cdrom,size=629M
memory: 4096
meta: creation-qemu=8.1.5,ctime=1709037614
name: Debian-Public-Lan
net0: virtio=BC:24:11:9E:CB:50,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-103-disk-0,iothread=1,size=50G
scsihw: virtio-scsi-single
smbios1: uuid=9f2e888c-a50c-4631-b3b5-07fcfbd146a5
sockets: 1
startup: order=3
vmgenid: 3f09825f-f97a-43e1-83ad-ddc226bde4b6

This is the proxmox startup output

GUEST HOOK: 103 pre-start
103 is starting, doing preparations.
Creating directory: /run/virtiofsd/
attempting to install unit virtiofsd-103-common...
DIRECTORY DOES EXIST!
attempting to install unit virtiofsd-103-103...
DIRECTORY DOES EXIST!
-object memory-backend-memfd,id=mem,size=4096M,share=on -numa node,memdev=mem -chardev socket,id=char0,path=/run/virtiofsd/103-common.sock -device vhost-user-fs-pci,chardev=char0,tag=103-common -chardev socket,id=char1,path=/run/virtiofsd/103-103.sock -device vhost-user-fs-pci,chardev=char1,tag=103-103
Appending virtiofs arguments to VM args.
GUEST HOOK: 103 post-start
103 started successfully.
Removing virtiofs arguments from VM args.
conf->args = -object memory-backend-memfd,id=mem,size=4096M,share=on -numa node,memdev=mem -object memory-backend-memfd,id=mem,size=4096M,share=on -numa node,memdev=mem -chardev socket,id=char0,path=/run/virtiofsd/103-common.sock -device vhost-user-fs-pci,chardev=char0,tag=103-common -chardev socket,id=char1,path=/run/virtiofsd/103-103.sock -device vhost-user-fs-pci,chardev=char1,tag=103-103
vfs_args = -object memory-backend-memfd,id=mem,size=4096M,share=on -numa node,memdev=mem -chardev socket,id=char0,path=/run/virtiofsd/103-common.sock -device vhost-user-fs-pci,chardev=char0,tag=103-common -chardev socket,id=char1,path=/run/virtiofsd/103-103.sock -device vhost-user-fs-pci,chardev=char1,tag=103-103
-object memory-backend-memfd,id=mem,size=4096M,share=on -numa node,memdev=memconf->args = -object memory-backend-memfd,id=mem,size=4096M,share=on -numa node,memdev=mem
TASK OK

@Drallas
Author

Drallas commented Aug 14, 2024

@scottmeup See issues, that's all I know so far.

@GamerBene19

GamerBene19 commented Sep 21, 2024

Any idea why the tags wouldn't show up when running qm config <vmid> --current?

Tags can't be longer than 36 chars (36 bytes). Perhaps your paths are longer than that? In this case qemu simply discards them iirc.

Edit: Just saw your tag in the output above, so the length is not the issue. Leaving this up as it might be helpful for others

@RyanChewKengYang

Hi Drallas, thanks for the script and the very well written writeup! I am using it right now in production but we ran into an issue where virtual machines would get stuck trying to read from the mounted directory. Upon investigating, I was not able to even run ls in the mounted directory either. Only a hard reboot fixed the issue. Do you have any idea what's going on?

@GamerBene19

Hi Drallas, thanks for the script and the very well written writeup! I am using it right now in production but we ran into an issue where virtual machines would get stuck trying to read from the mounted directory. Upon investigating, I was not able to even run ls in the mounted directory either. Only a hard reboot fixed the issue. Do you have any idea what's going on?

Did the virtiofs daemon on the host crash? Iirc the vm just hangs if it exits while the VM is still running.

@RyanChewKengYang

Did the virtiofs daemon on the host crash? Iirc the vm just hangs if it exits while the VM is still running.

No the daemon did not. We have 4 VMs on the server and all of them have virtiofs shares mapped. Only one VM hangs while the rest were still able to access files on the virtiofs share.

@Drallas
Author

Drallas commented Oct 7, 2024

Hi Drallas, thanks for the script and the very well written writeup! I am using it right now in production but we ran into an issue where virtual machines would get stuck trying to read from the mounted directory. Upon investigating, I was not able to even run ls in the mounted directory either. Only a hard reboot fixed the issue. Do you have any idea what's going on?

I have never experienced this behaviour in my Homelab, running an active Docker Swarm (with multiple Docker Volumes) on top of this! I assume all your packages are up to date!?

@RyanChewKengYang

I have never experienced this behaviour in my Homelab, running an active Docker Swarm (with multiple Docker Volumes) on top of this! I assume all your packages are up to date!?

Yes all packages up to date with the latest proxmox kernel. We run kubernetes on each VM and each pod maps to the virtiofs mount on the VM and the workloads that we run are very iops intensive. Initially we got good results with virtiofs with 60GB/s reads and 12GB/s writes on dRAID-1 but the hanging seems to have gotten worse. I managed to attempt a bandaid fix by increasing the queue size to the maximum of 1024 which seemed to help for a while but the hanging came back after a while.
