This describes Mike Gerdts' suggestion for how to complete the work on OS-6632. It is an expansion of this comment.
The OS-6632 branch has a prototype fix that demonstrates that it is possible
for the guest to recognize disk size changes when a zvol changes size. To
detect the size change, the bhyve process has an mevent that calls fstat() on
each virtio-blk device every 5 seconds. For an event that is unlikely to ever
happen in the life of a particular VM, this is rather extreme.
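For illustration, the periodic check amounts to comparing the size reported by fstat() with the size seen on the previous pass. The following is a minimal sketch of that idea, not the prototype's code; `blk_size_state_t`, `blk_size_update()`, and `blk_size_check()` are hypothetical names, and it assumes that `st_size` reflects the size of the backing store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical per-device record; not taken from the bhyve source. */
typedef struct blk_size_state {
	int	bss_fd;		/* open descriptor for the backing store */
	off_t	bss_size;	/* size seen by the previous check */
} blk_size_state_t;

/* Record new_size; return true only if it differs from the cached size. */
static bool
blk_size_update(blk_size_state_t *bss, off_t new_size)
{
	assert(bss != NULL);
	if (new_size == bss->bss_size)
		return (false);
	bss->bss_size = new_size;
	return (true);
}

/*
 * One polling pass: fstat() the backing store and compare sizes.
 * Returns false if the size is unchanged or fstat() fails.
 */
static bool
blk_size_check(blk_size_state_t *bss)
{
	struct stat st;

	assert(bss != NULL);
	if (fstat(bss->bss_fd, &st) != 0)
		return (false);
	return (blk_size_update(bss, st.st_size));
}
```

Running a check like this on a timer for every device is what makes the prototype expensive relative to how rarely a resize occurs. The same check becomes cheap if it runs only when something signals that a resize may have happened.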
A better approach for alerting the bhyve process of device size changes is needed. The first priority should be development of interfaces that are portable between SmartOS and FreeBSD. Secondarily, the experience on SmartOS may be optimized.
Currently there is no way for a process in a zone to be automatically made aware of a device size change. How a device size change happens will be dependent on at least the following:
- Operating System
- Backing store type (disk, zvol, file, lofi, qcow, etc.)
- Privileges/capabilities of the bhyve process (e.g. bhyve can't listen for sysevents in a zone)
A generic mechanism is needed so that arbitrary user-space utilities may alert
the bhyve process that it needs to perform a size check. The most obvious place
for this to happen is as an extension to bhyvectl and the related ioctl
interface.
Existing vmm ioctl calls either operate on state that exists within the vmm
module or are associated with vcpu state stored in the bhyve process. When vmm
needs vcpu state from the bhyve process, it injects a vmexit into the
appropriate vcpu. What is needed here is different: there is no need to
interrupt a vcpu thread to notify the bhyve process of a disk change.
I propose the introduction of an event delivery mechanism that serves the needs of disk resizes and can be readily extended to other needs.
The solution includes:
- A generic event delivery mechanism that allows vmm to communicate events to bhyve.
- A disk resize event type.
- An enhancement to bhyvectl to say "check the size of device X".
- A SmartOS-specific enhancement that allows automatic size change detection for some backing stores.
A new ioctl, VM_GET_EVENT, will be added. It is only valid on minors
associated with a particular VM, not on VMM_CTL_MINOR.
    typedef enum vm_event_type_t {
        VM_EVENT_FOO,
        /* Insert others here */
        VM_EVENT_LAST
    } vm_event_type_t;

    typedef struct vm_event {
        size_t vme_size;
        vm_event_type_t vme_type;
        // XXX Maybe add a timestamp for debugging
    } vm_event_t;

    void vmm_event_add(void *event);
The vmm module will maintain a ring buffer containing event pointers. An
event is added to the ring buffer with:
    vm_event_foo_t *ev;

    ev = kmem_zalloc(sizeof (*ev), KM_SLEEP);
    ev->vmef_event.vme_size = sizeof (*ev);
    ev->vmef_event.vme_type = VM_EVENT_FOO;
    ev->vmef_val = 42;
    vmm_event_add(ev);
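The ring buffer of event pointers could look something like the user-space sketch below. It is only an illustration of the data structure: `vmm_event_ring_t`, `vmm_event_ring_put()`, `vmm_event_ring_get()`, and the capacity of 8 are hypothetical, locking is omitted, and rejecting new events when the ring is full is an assumption rather than a decision made in this proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	VMM_EVENT_RING_SIZE	8	/* hypothetical capacity */

typedef enum vm_event_type { VM_EVENT_FOO, VM_EVENT_LAST } vm_event_type_t;

typedef struct vm_event {
	size_t		vme_size;
	vm_event_type_t	vme_type;
} vm_event_t;

typedef struct vmm_event_ring {
	vm_event_t	*ver_slots[VMM_EVENT_RING_SIZE];
	size_t		ver_head;	/* index of the oldest event */
	size_t		ver_count;	/* number of queued events */
} vmm_event_ring_t;

/* Queue an event pointer; returns false if the ring is full. */
static bool
vmm_event_ring_put(vmm_event_ring_t *ring, vm_event_t *ev)
{
	size_t tail;

	assert(ev != NULL && ev->vme_type < VM_EVENT_LAST);
	if (ring->ver_count == VMM_EVENT_RING_SIZE)
		return (false);
	tail = (ring->ver_head + ring->ver_count) % VMM_EVENT_RING_SIZE;
	ring->ver_slots[tail] = ev;
	ring->ver_count++;
	return (true);
}

/* Remove and return the oldest event, or NULL if the ring is empty. */
static vm_event_t *
vmm_event_ring_get(vmm_event_ring_t *ring)
{
	vm_event_t *ev;

	if (ring->ver_count == 0)
		return (NULL);
	ev = ring->ver_slots[ring->ver_head];
	ring->ver_head = (ring->ver_head + 1) % VMM_EVENT_RING_SIZE;
	ring->ver_count--;
	return (ev);
}
```

Because every event starts with a `vm_event_t` header, the ring can hold pointers to events of any type, and the consumer can use `vme_size` to know how many bytes to copy out.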
The bhyve process will have a thread that does something like the following:
    void *
    vmm_event_thread(void *fdp)
    {
        int fd = *(int *)fdp;
        uchar_t data[XXX_LARGEST_EVENT_POSSIBLE];
        vm_event_t *event = (vm_event_t *)data;
        int err;

        while (!exiting) {
            /* Blocks until an event is available. */
            err = ioctl(fd, VM_GET_EVENT, data, sizeof (data));
            if (err != 0) {
                // Handle error
                continue;
            }
            /* Every event begins with a vm_event_t header. */
            switch (event->vme_type) {
            case VM_EVENT_FOO:
                handle_event_foo(event);
                break;
            default:
                // Handle error
                break;
            }
        }
        return (NULL);
    }
The ioctl will block until an event is available, the bhyve process is
exiting, or the vmm instance is being torn down.
Within the kernel, vmm_handle_get_event() (a new function) will watch the
event ring buffer for new entries. At most one event will be returned with each
ioctl call.
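The blocking behavior can be modeled in user space with a mutex and a condition variable. The sketch below is an analogy only, not kernel code: `evq_t` and its functions are hypothetical, integers stand in for event pointers, returning `ENXIO` after teardown is an assumption, and the case of the bhyve process exiting (which would presumably surface as an interrupted wait) is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define	EVQ_DEPTH	4	/* hypothetical queue depth */

typedef struct evq {
	pthread_mutex_t	evq_lock;
	pthread_cond_t	evq_cv;
	int		evq_events[EVQ_DEPTH];	/* stand-in for event pointers */
	size_t		evq_head;
	size_t		evq_count;
	bool		evq_closing;	/* set when the instance is torn down */
} evq_t;

static void
evq_init(evq_t *q)
{
	(void) pthread_mutex_init(&q->evq_lock, NULL);
	(void) pthread_cond_init(&q->evq_cv, NULL);
	q->evq_head = q->evq_count = 0;
	q->evq_closing = false;
}

/* Producer: queue one event and wake a waiter; ENOSPC if the queue is full. */
static int
evq_post(evq_t *q, int ev)
{
	int err = 0;

	(void) pthread_mutex_lock(&q->evq_lock);
	if (q->evq_count == EVQ_DEPTH) {
		err = ENOSPC;
	} else {
		q->evq_events[(q->evq_head + q->evq_count) % EVQ_DEPTH] = ev;
		q->evq_count++;
		(void) pthread_cond_signal(&q->evq_cv);
	}
	(void) pthread_mutex_unlock(&q->evq_lock);
	return (err);
}

/* Teardown: wake every waiter so that it can return. */
static void
evq_teardown(evq_t *q)
{
	(void) pthread_mutex_lock(&q->evq_lock);
	q->evq_closing = true;
	(void) pthread_cond_broadcast(&q->evq_cv);
	(void) pthread_mutex_unlock(&q->evq_lock);
}

/*
 * Consumer: block until an event is queued or teardown begins. Returns 0
 * and stores exactly one event in *evp, or ENXIO once the queue is empty
 * and teardown has started.
 */
static int
evq_get(evq_t *q, int *evp)
{
	int err = 0;

	assert(evp != NULL);
	(void) pthread_mutex_lock(&q->evq_lock);
	while (q->evq_count == 0 && !q->evq_closing)
		(void) pthread_cond_wait(&q->evq_cv, &q->evq_lock);
	if (q->evq_count > 0) {
		*evp = q->evq_events[q->evq_head];
		q->evq_head = (q->evq_head + 1) % EVQ_DEPTH;
		q->evq_count--;
	} else {
		err = ENXIO;
	}
	(void) pthread_mutex_unlock(&q->evq_lock);
	return (err);
}
```

In this model, events that were queued before teardown are still delivered, one per call, before the error is returned. Whether the real ioctl should drain pending events or fail immediately during teardown is an open design choice.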
A disk resize event adds VM_EVENT_DISKRESIZE to vm_event_type_t.
    typedef enum vm_event_type_t {
        VM_EVENT_DISKRESIZE,
        /* Insert others here */
        VM_EVENT_LAST
    } vm_event_type_t;

    typedef struct vm_event_disksize {
        vm_event_t vmed_event;
        /*
         * XXX TBD - something to uniquely identify the backing store. The FD?
         */
    } vm_event_disksize_t;
Now, when vmm becomes aware of a disk resize, it will queue an event of type
vm_event_disksize_t.
bhyvectl will be enhanced to support:
    bhyvectl --vm=<vm> --notify-disk-resize=<path-to-backing-store>
With that command, bhyvectl will open the appropriate minor device and call:
    // XXX not sure if passing the path is the best approach here.
    ioctl(fd, VM_NOTIFY_DISKRESIZE, path, strlen(path));
As described in the previous section, vmm_handle_notify_diskresize() will then
queue the appropriate event.
When the backing store is a device, it may be feasible to add automatic size change detection. This would involve the following changes:
- spec_size_invalidate() would issue a sysevent saying that the device size has been invalidated.
- vmm would have an in-kernel sysevent listener that would listen for the events emitted by spec_size_invalidate(). When a relevant sysevent is received, a vmm event is generated.
If the in-kernel sysevent listener is not feasible, it would be quite
straightforward to enhance vminfod to listen for the sysevent and invoke
bhyvectl when relevant sysevents are received.