I wanted to play Age of Empires II with my sister, and decided to use QEMU to get this working. A VM wouldn't clobber my sister's laptop, and perhaps unsurprisingly, this excellent old game runs very smoothly in a guest.
I wanted my setup to look like this:
                 ,--- internet ---
            wifi |
           /     `--- LAN
          /           |--- desktop 1 -- QEMU VM2
 laptop -- QEMU VM1   `--- desktop 2 -- QEMU VM3
I wanted all VMs to be connected to the same LAN, but not the internet. This took me a while to get working, so I thought I'd do a quick writeup here. This is basically a rewrite of mcastelino's gist, along with my own thoughts.
QEMU has many networking options. TAP devices with a Linux bridge is probably the most common setup, but it requires a lot of iptables, sudo-ing and networking-foo to get working. It would also require messing with the laptop's host network config, which I didn't want to do. So here's a way to do this using only user-space tools and UDP.
- Relatively unintrusive and simple to set up
- No superuser needed (unless bridging to a TAP device)
- A bit slow (see performance below)
- Requires multiple running socat processes
- Traffic unencrypted
QEMU's socket and dgram options simply wrap the Ethernet frames in UDP packets. I don't think the documentation mentions that. Perhaps it's obvious to everyone but me. This allows for some nice tricks, like bridging a TAP interface on the host with a multicast group as mcastelino describes in his gist, and bridging UDP unicast connections from the outside world.
QEMU's socket,mcast=<multicast-address>:<port> option is a simple way of making a VLAN for your guests that works across hosts (provided they are multicast-reachable, usually on the same LAN). It's probably not an accident that UDP multicast was chosen to transport Ethernet frames: like Ethernet, it is stateless and connectionless. It also requires no extra infrastructure besides a working multicast setup, it's host OS-independent, and it requires fewer privileges and less configuration on the host.
As far as I can tell, the mcast option is the only one that will let you connect multiple (more than 2) VMs into the same VLAN without using a host bridge or a similar mechanism. At least, the documentation explicitly mentions that this option has that ability.
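So to wire up more guests, you can (as far as I understand it) just launch each one with the same mcast address and a unique MAC, on any multicast-reachable host. A sketch, where linux2.img stands in for the second guest's image:

qemu-system-x86_64 linux2.img \
    -device e1000,netdev=n1,mac=52:54:00:12:34:57 \
    -netdev socket,id=n1,mcast=230.0.0.1:33000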
Since my networking skills are limited, I didn't really know how to debug when I encountered problems, so I'll try to describe what I did to get this working.
First, launch a VM with the mcast setup. QEMU's manpage suggests this:
qemu-system-x86_64 linux.img \
-device e1000,netdev=n1,mac=52:54:00:12:34:56 \
-netdev socket,id=n1,mcast=230.0.0.1:33000
Here's what that means:
Bind a UDP socket to listen for multicast traffic on 230.0.0.1 on the specified port. All Ethernet frames coming from the guest will be wrapped in a UDP packet and sent off to this multicast address. All UDP packets coming in on this socket will be sent directly to the guest VM's NIC. The host will be subscribed to this multicast group.
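If you want to see the wrapping from the outside, tcpdump on the host should show a UDP packet to the multicast group every time the guest's NIC sends a frame:

user@host $ sudo tcpdump -ni any udp port 33000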
What's nice (and unique?) about multicast addresses is that multiple processes can bind to the same address and port - and all bound processes receive a copy of all packets. In my experience, this wasn't always reliable with non-multicast addresses when you had multiple processes listening on the same port (reuseaddr). That part still confuses me. Maybe it's intended for load-balancing?
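You can play with this behaviour using socat alone, no QEMU involved. A sketch, reusing the group and port from above:

# run this receiver in two separate terminals:
user@host $ socat -u UDP4-RECV:33000,reuseaddr,ip-add-membership=230.0.0.1:127.0.0.1 -
# then send one datagram; both receivers should print it:
user@host $ echo hello | socat - UDP4-DATAGRAM:230.0.0.1:33000,reuseaddr,ip-add-membership=230.0.0.1:127.0.0.1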
But multicast is a better option anyhow, because it allows VMs to join from other hosts and VLAN traffic will only reach hosts which subscribe to this multicast address. So if only one host is doing this, the traffic will stop at my first switch. At least that's how I think multicast works.
Now, with your first guest up and running, we can make a silly test. Even with nothing properly configured on the guest, we can see how the underlying mechanism works. Start tcpdump on your guest, making sure it listens on the correct NIC. In my case, that's ens3:
root@guest # tcpdump
listening on ens3, link-type EN10MB (Ethernet), snapshot length 262144 bytes
Now, from your host, run this silly command:
user@host $ echo 1111111111111111111111111111111 | socat -x - \
UDP4-DATAGRAM:230.0.0.1:33000,reuseaddr,ip-add-membership=230.0.0.1:127.0.0.1
tcpdump should output something like this:
13:21:56.020303 31:31:31:31:31:31 (oui Unknown) > 31:31:31:31:31:31 (oui Unknown), ethertype Unknown (0x3131), length 32:
0x0000: 3131 3131 3131 3131 3131 3131 3131 3131 1111111111111111
0x0010: 310a 1.
See? Isn't that fun? Here we're seeing what I was hoping to demonstrate: the raw Ethernet frames inside UDP packets. We just made a bogus Ethernet frame and sent it to this multicast group. All QEMU guests on this mcast group will see this Ethernet frame. Once I was able to test like this, I found it easier to debug.
Since TAP devices also expect Ethernet frames¹, it is straightforward to make a TAP device which bridges QEMU multicast VLANs. That's why mcastelino's socat command works.
Note that if you want to go "bidirectional" with socat, its socket must be bound to the right port. Your socat socket should have the same properties as QEMU's socket. You can check using ss -lunp, for example. I eventually got that working with the right bind option. I don't know why it worked without that for mcastelino. sourceport has no effect for me.
user@host $ socat -x - UDP4-DATAGRAM:230.0.0.1:33000,reuseaddr,ip-add-membership=230.0.0.1:127.0.0.1,bind=230.0.0.1:33000 &
user@host $ sudo ss -lunp | grep 33000
UNCONN 0 0 230.0.0.1:33000 0.0.0.0:* users:(("socat",pid=9746,fd=5))
UNCONN 0 0 230.0.0.1:33000 0.0.0.0:* users:(("qemu-system-x86",pid=6652,fd=11))
Now, with these working socat options, I was able to bridge a TAP device with my multicast VLAN. Here's a considerably more useful example than hand-written Ethernet frames using echo:
user@host $ sudo socat \
TUN:10.0.3.1/24,tun-type=tap,iff-no-pi,iff-up,iff-debug,tun-name=vmvlan0 \
UDP4-DATAGRAM:230.0.0.1:33000,reuseaddr,ip-add-membership=230.0.0.1:10.0.3.1,bind=230.0.0.1:33000
This is basically mcastelino's socat command, but with the bind option. A vmvlan0 interface pops up on the host and can be used to run dnsmasq or tcpdump directly on the host.
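For example, you can now watch all VLAN traffic from the comfort of the host:

user@host $ sudo tcpdump -ni vmvlan0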
You can run a DHCP server on one of the guests, or you can run one from the host using the socat bridge above. The dnsmasq command from mcastelino's gist worked out of the box for me. Here it is, rewritten with CLI arguments instead of a config file:
sudo -E dnsmasq -C /dev/null -d --bind-dynamic --interface=vmvlan0 --dhcp-range=10.0.3.100,10.0.3.200 --leasefile-ro
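On the guest side, you then request a lease as usual. Assuming the guest has dhclient (your distro may use systemd-networkd or another client instead):

root@guest # dhclient -v ens3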
See this SO thread. systemd defaults to using /etc/machine-id, and not the MAC address, as the DHCP client identifier, so guests cloned from the same image will be handed the same lease. You'll need to consider that before launching multiple guests based off the same image.
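One way around it (my own workaround, not something from the gist) is to regenerate the machine-id on each cloned guest:

root@guest # rm /etc/machine-id
root@guest # systemd-machine-id-setup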
If you want your VLAN to reach beyond your multicast range, there are many options. A proper VPN solution is usually the best choice for robust permanent setups.
However, QEMU can wrap Ethernet frames in both multicast and unicast UDP packets. The latter can be used for VLANs across a WAN, for example, where guest 1 could open firewalls and listen, and guest 2 could connect. This would require no TAP devices or external tools. It would limit the VLAN to 2 guests, however.
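A sketch of what that could look like, using the dgram syntax from the performance test further down (the public IPs are placeholders):

# on host A (public IP 203.0.113.1), with UDP port 33000 opened in the firewall:
qemu-system-x86_64 ... \
    -netdev dgram,id=n1,local.type=inet,local.host=0.0.0.0,local.port=33000,remote.type=inet,remote.host=198.51.100.2,remote.port=33000 \
    -device virtio-net-pci,netdev=n1,mac=52:54:00:aa:bb:01
# on host B (public IP 198.51.100.2): the same command with remote.host=203.0.113.1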
With our existing multicast setup though, it's fairly straightforward to bridge a UDP unicast connection to our multicast group. That way, 1 WAN guest can join the network of multiple LAN guests.
user@host $ socat -x UDP4-LISTEN:33001 UDP4-DATAGRAM:230.0.0.1:33000,reuseaddr,ip-add-membership=230.0.0.1:127.0.0.1,bind=230.0.0.1:33000
socat's UDP4-LISTEN uses the source address of the first packet it receives as its peer. This means we can only have 1 peer per invocation, which may be a good thing considering there is no security involved. For the same reason, socat must be restarted if the remote changes.
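The remote guest (VM1 on my laptop in the diagram above) then points a dgram netdev at the bridge, something like this (placeholder IP again):

qemu-system-x86_64 ... \
    -netdev dgram,id=n1,remote.type=inet,remote.host=203.0.113.1,remote.port=33001,local.type=inet,local.host=0.0.0.0,local.port=33001 \
    -device virtio-net-pci,netdev=n1,mac=52:54:00:aa:bb:02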
I often find myself suddenly no longer receiving the multicast packets I expect.
The interface (or listening address) in ip-add-membership must be correct, otherwise socat won't be able to receive any packets - including ones coming from localhost. This membership is what informs the kernel and the LAN that we want to receive packets for this multicast group. However, if another process adds a membership, it applies to the host as a whole, so things will appear to work. You can use ip maddr to see current multicast subscriptions.
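So when in doubt, check whether the group is actually joined on the interface you expect. Something like this should print a line for it:

user@host $ ip maddr show | grep 230.0.0.1
        inet  230.0.0.1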
I ran a guest with four NICs to test their speeds:
qemu-system-x86_64 -accel kvm -m 1G -drive if=virtio,file=arch.qcow2 -nographic \
    -nic user,model=virtio-net-pci,id=n2 \
    -netdev tap,id=n3,ifname=tap0,script=no \
    -device virtio-net-pci,netdev=n3,mac=52:54:00:11:22:03 \
    -netdev socket,id=n4,mcast=230.0.0.1:12345 \
    -device virtio-net-pci,netdev=n4,mac=52:54:00:11:22:04 \
    -netdev dgram,id=n5,remote.type=inet,remote.port=12345,remote.host=127.0.0.1,local.type=inet,local.host=127.0.0.1,local.port=12346 \
    -device virtio-net-pci,netdev=n5,mac=52:54:00:11:22:05
I'm running iperf3 -c 10.0.x.2 on the guest, where x is the appropriate subnet. I get:
| netdev | type  | speed      | helpers |
|--------|-------|------------|---------|
| n2     | user  | 1 Gbit/s   |         |
| n3     | tap   | 30 Gbit/s  | ip a add 10.0.3.2 dev tap0 |
| n4     | mcast | 500 Mbit/s | socat UDP4-DATAGRAM:230.0.0.1:12345,bind=230.0.0.1:12345,… TUN:10.0.4.2/24,tun-type=tap,… |
| n5     | dgram | 600 Mbit/s | socat UDP4-DATAGRAM:127.0.0.1:12346,bind=127.0.0.1:12345,… TUN:10.0.5.2/24,tun-type=tap,… |
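The other end of each test is just an iperf3 server on the host, listening on the addresses above:

user@host $ iperf3 -s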
With this all combined, I have the setup I wanted. I don't know how performant my VLAN is with all these socat bridges, but it's more than enough for a bit of Age of Empires multiplayer.
What I've come to realize is that since multicast packets are sent and received by every member, it becomes an "Ethernet hub" onto which everything else is connected:
           ,- laptop QEMU VM1
          /
         /   (Internet)
        /
       ,- socat mcast <=> UDP-LISTEN:33001
      /
======================================= multicast 230.0.0.1:33000
      \     \     \
       \     \     `- desktop2 QEMU VM3
        \     `- desktop1 QEMU VM2
         `- socat mcast <=> TAP
                             `- desktop1 dnsmasq
I didn't know multicast could be used like that. It's nice and flexible, and relatively efficient in that only multicast group members will be exposed to all the traffic.