@prati0100
Last active October 1, 2025 07:32
ALPSS 2025 presentation: "Kexec Handover and Live Update"

Kexec Handover and Live Update

1 Agenda

2 What is Live Update?

2.1 Agenda

2.2 What is Live Update?

  • It’s NOT: live patching, live migration.
  • Updates the kernel or hypervisor with minimal disruption for underlying workloads.
  • Most commonly used for hypervisors.
  • Can also be used by other workloads to reduce kernel patching downtime.
  • Multiple cloud providers working together to upstream it.

2.3 Notes

3 High level overview

3.1 Agenda

3.2 Live update flow

  • The system is in normal state.
  • The system software starts the live update process.
  • Serializes state while keeping VMs active, but with limited capabilities.
  • Pauses VMs and does the final serialization.
  • Loads the next kernel and hands over the serialized data.
  • The next kernel deserializes the data.
  • Resumes VMs, returning to normal operation.

3.2.1 Notes

  • In the “serialization” part, mention the role of system software and kernel.
  • Note the similarities to live migration.

3.3 What gets preserved?

  • VM metadata
  • VM memory
  • Passthrough devices
  • IOMMU mappings

3.3.1 Notes

4 Building blocks

4.1 Agenda

4.1.1 Notes

  • Complex feature, not easy to do it in one go.
  • Upstreaming as a set of building blocks instead.

4.2 Kexec Handover (KHO)

  • Creates a mechanism for kernel-to-kernel communication.
  • Provides a mechanism to mark memory as preserved.
  • Makes sure preserved memory does not get used by the next kernel.
  • Passes this information over kexec.

4.2.1 Notes

  • Explain that it is not possible to preserve user memory using this.
  • But it can be used for non-liveupdate cases as well, like reserve_mem for example.

4.3 KHO: Memory preservation

int kho_preserve_folio(struct folio *folio);
int kho_unpreserve_folio(struct folio *folio);
struct folio *kho_restore_folio(phys_addr_t phys);
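
To make the contract of these three calls concrete, here is a toy userspace model. It is not kernel code: every `toy_*` name is hypothetical, and the static table merely stands in for whatever metadata KHO actually hands to the next kernel. It only illustrates that preservation is recorded per folio, can be undone before the handover, and is looked up by physical address afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy userspace model, not kernel code: all toy_* names are hypothetical. */
#define TOY_MAX_PRESERVED 8

struct toy_folio {
	uint64_t phys;      /* physical address of the folio */
	unsigned int order; /* folio spans 2^order pages */
};

/* Stands in for the metadata handed over to the next kernel. */
static struct toy_folio toy_preserved[TOY_MAX_PRESERVED];
static size_t toy_nr_preserved;

/* Record a folio as preserved; 0 on success, negative errno on failure. */
int toy_preserve_folio(const struct toy_folio *folio)
{
	assert(folio != NULL);
	if (toy_nr_preserved == TOY_MAX_PRESERVED)
		return -ENOMEM;
	toy_preserved[toy_nr_preserved++] = *folio;
	return 0;
}

/* Drop a previously preserved folio; -ENOENT if it was never preserved. */
int toy_unpreserve_folio(const struct toy_folio *folio)
{
	assert(folio != NULL);
	for (size_t i = 0; i < toy_nr_preserved; i++) {
		if (toy_preserved[i].phys == folio->phys) {
			/* Fill the hole with the last entry. */
			toy_preserved[i] = toy_preserved[--toy_nr_preserved];
			return 0;
		}
	}
	return -ENOENT;
}

/* "Next kernel" side: look a folio up by physical address, NULL if absent. */
const struct toy_folio *toy_restore_folio(uint64_t phys)
{
	for (size_t i = 0; i < toy_nr_preserved; i++) {
		if (toy_preserved[i].phys == phys)
			return &toy_preserved[i];
	}
	return NULL;
}
```

In this model a caller would preserve a folio, remember its physical address in its own serialized state, and pass that address to the restore call after the reboot.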

4.4 KHO: Memory preservation

4.5 KHO: Preparing

  • Before the system is ready for kexec, KHO must be notified so it can prepare.
  • On this notification, KHO serializes the preserved memory to bitmaps.
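
The idea of serializing preserved memory to bitmaps can be sketched with a single flat bitmap indexed by page frame number. This is a deliberately simplified userspace illustration: the `toy_*` names, the 256-PFN address space, and the flat layout are all assumptions, and the in-kernel data structures are more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model: a flat bitmap over a tiny, hypothetical PFN space. */
#define TOY_NR_PFNS   256u
#define TOY_WORD_BITS 64u

struct toy_bitmap {
	uint64_t words[TOY_NR_PFNS / TOY_WORD_BITS];
};

static void toy_bitmap_set(struct toy_bitmap *bm, unsigned int pfn)
{
	assert(pfn < TOY_NR_PFNS);
	bm->words[pfn / TOY_WORD_BITS] |= UINT64_C(1) << (pfn % TOY_WORD_BITS);
}

static int toy_bitmap_test(const struct toy_bitmap *bm, unsigned int pfn)
{
	return (bm->words[pfn / TOY_WORD_BITS] >> (pfn % TOY_WORD_BITS)) & 1;
}

/* "Prepare" step: turn the list of preserved PFNs into a bitmap. */
void toy_serialize(struct toy_bitmap *bm, const unsigned int *pfns, size_t n)
{
	memset(bm, 0, sizeof(*bm));
	for (size_t i = 0; i < n; i++)
		toy_bitmap_set(bm, pfns[i]);
}

/* "Next kernel" step: walk the bitmap and collect preserved PFNs in order. */
size_t toy_collect(const struct toy_bitmap *bm, unsigned int *out, size_t cap)
{
	size_t n = 0;

	for (unsigned int pfn = 0; pfn < TOY_NR_PFNS && n < cap; pfn++) {
		if (toy_bitmap_test(bm, pfn))
			out[n++] = pfn;
	}
	return n;
}
```

A bitmap keeps the handed-over description compact and independent of the order in which pages were preserved, which is why the collected PFNs come back sorted.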

4.5.1 Notes

  • Mention that the finalization hook is going away.

4.6 KHO: Booting up

  • Pre-reserved scratch area for early boot.
  • Passing KHO metadata: setup data on x86, chosen node in FDT on arm64.
struct kho_data {
	__u64 fdt_addr;
	__u64 fdt_size;
	__u64 scratch_addr;
	__u64 scratch_size;
} __attribute__((packed));
chosen {
	linux,kho-fdt = <...>;
	linux,kho-scratch = <...>;
};
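
As a small illustration of the x86 structure above, the following userspace sketch mirrors its layout with fixed-width types and checks the size and field offsets at compile time. The `_sketch` names and the presence check are hypothetical; the slide only gives the field list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the struct on the slide; uint64_t replaces __u64. */
struct kho_data_sketch {
	uint64_t fdt_addr;
	uint64_t fdt_size;
	uint64_t scratch_addr;
	uint64_t scratch_size;
} __attribute__((packed));

/* Four 64-bit fields, no padding: 32 bytes in total. */
static_assert(sizeof(struct kho_data_sketch) == 32, "unexpected size");
static_assert(offsetof(struct kho_data_sketch, fdt_size) == 8, "unexpected offset");
static_assert(offsetof(struct kho_data_sketch, scratch_addr) == 16, "unexpected offset");

/*
 * Hypothetical sanity check: treat the handover as present only if both
 * the FDT and the scratch fields are populated.
 */
int kho_data_sketch_present(const struct kho_data_sketch *d)
{
	return d->fdt_addr != 0 && d->fdt_size != 0 &&
	       d->scratch_addr != 0 && d->scratch_size != 0;
}
```

Because the structure crosses the boundary between two different kernels, a fixed layout with explicit widths matters more than it would for an internal structure.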

4.6.1 Notes

  • Mention that kexec image and all early boot allocations go in scratch.
  • Mention that chosen node gets set at kexec load time.

4.7 KHO: Booting up

  • On early boot, only allocate from scratch.
enum memblock_flags choose_memblock_flags(void)
{
	if (kho_scratch_only)
		return MEMBLOCK_KHO_SCRATCH;
	[...]
}
  • After early boot, mark preserved pages as reserved and turn off scratch-only mode.
  • Reserved pages don’t get released to the buddy allocator.
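
The scratch-only rule can be modelled in userspace with a tiny bump allocator over flagged regions. This is a sketch under stated assumptions, not memblock itself: the `toy_*` names, the region layout, and the bump allocation are all invented for illustration. The point it shows is that while scratch-only mode is on, non-scratch regions are skipped, so early allocations cannot land on preserved memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of flag-filtered early allocation; all names are hypothetical. */
enum toy_flags { TOY_NONE = 0, TOY_KHO_SCRATCH = 1 };

struct toy_region {
	uint64_t base;
	uint64_t size;
	uint64_t used;
	enum toy_flags flags;
};

static bool toy_scratch_only = true;

static enum toy_flags toy_choose_flags(void)
{
	return toy_scratch_only ? TOY_KHO_SCRATCH : TOY_NONE;
}

/* Bump-allocate from the first eligible region; returns 0 on failure. */
uint64_t toy_alloc(struct toy_region *regions, size_t n, uint64_t size)
{
	enum toy_flags want = toy_choose_flags();

	assert(regions != NULL);
	for (size_t i = 0; i < n; i++) {
		struct toy_region *r = &regions[i];
		uint64_t addr;

		/* In scratch-only mode, skip everything that is not scratch. */
		if (want == TOY_KHO_SCRATCH && r->flags != TOY_KHO_SCRATCH)
			continue;
		if (r->size - r->used < size)
			continue;
		addr = r->base + r->used;
		r->used += size;
		return addr;
	}
	return 0;
}

/* Called once preserved pages have been marked reserved. */
void toy_end_early_boot(void)
{
	toy_scratch_only = false;
}
```

In this model, the same allocation request is served from the scratch region during early boot and from ordinary memory once scratch-only mode is turned off.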

4.8 Live Update Orchestrator (LUO)

  • LUO provides a way for userspace to control the live update process.
  • Allows marking which resources to preserve.
  • Provides a state machine to co-ordinate all the components.
  • API is exposed through a set of IOCTLs.

4.8.1 Notes

  • Can’t preserve everything since there is too much state.
  • Mention that this is the next layer since it lets userspace actually do stuff.
  • Maybe mention that /dev/liveupdate can only be opened once and that luod must control it?

4.9 LUO: States

  • Normal: No live update in progress.
  • Prepared: Kernel is prepared to do a live update. Devices and resources operate in limited capacity.
  • Frozen: The final reboot event has been sent. Last chance for the kernel to serialize.
  • Updated: System has rebooted into next kernel and can start deserializing devices and resources.
  • Normal: The system is back to normal functionality.

4.10 LUO: States

struct liveupdate_ioctl_set_event {
	__u32	size;
	__u32	event;
};
  • LIVEUPDATE_PREPARE: Normal -> Prepared
  • LIVEUPDATE_FREEZE: Prepared -> Frozen
  • LIVEUPDATE_FINISH: Updated -> Normal
  • LIVEUPDATE_CANCEL: Prepared -> Normal
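
The four transitions listed above can be written down as a small state machine. The sketch below is a userspace illustration only: the `LUO_SKETCH_*` names and the `-1` error convention are hypothetical, and the Frozen -> Updated step is modelled as a separate function because it comes from the reboot into the next kernel rather than from an event.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only the transitions themselves come from the slide. */
enum luo_sketch_state {
	LUO_SKETCH_NORMAL,
	LUO_SKETCH_PREPARED,
	LUO_SKETCH_FROZEN,
	LUO_SKETCH_UPDATED,
};

enum luo_sketch_event {
	LUO_SKETCH_EV_PREPARE,
	LUO_SKETCH_EV_FREEZE,
	LUO_SKETCH_EV_FINISH,
	LUO_SKETCH_EV_CANCEL,
};

/* Apply an event; returns 0 and updates *state, or -1 if not allowed. */
int luo_sketch_set_event(enum luo_sketch_state *state, enum luo_sketch_event ev)
{
	assert(state != NULL);
	switch (ev) {
	case LUO_SKETCH_EV_PREPARE:
		if (*state != LUO_SKETCH_NORMAL)
			return -1;
		*state = LUO_SKETCH_PREPARED;
		return 0;
	case LUO_SKETCH_EV_FREEZE:
		if (*state != LUO_SKETCH_PREPARED)
			return -1;
		*state = LUO_SKETCH_FROZEN;
		return 0;
	case LUO_SKETCH_EV_FINISH:
		if (*state != LUO_SKETCH_UPDATED)
			return -1;
		*state = LUO_SKETCH_NORMAL;
		return 0;
	case LUO_SKETCH_EV_CANCEL:
		if (*state != LUO_SKETCH_PREPARED)
			return -1;
		*state = LUO_SKETCH_NORMAL;
		return 0;
	}
	return -1;
}

/* Models the reboot into the next kernel: Frozen -> Updated. */
int luo_sketch_kexec(enum luo_sketch_state *state)
{
	assert(state != NULL);
	if (*state != LUO_SKETCH_FROZEN)
		return -1;
	*state = LUO_SKETCH_UPDATED;
	return 0;
}
```

Per the list above, CANCEL is only valid from Prepared, so in this model a frozen system can only move forward through the reboot.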

4.10.1 Notes

  • Explain all the states.
  • FREEZE: Sent from reboot(2).

4.11 LUO: File Descriptors

  • Userspace can pass in supported file descriptors to LUO to mark them for preservation.
  • Not any arbitrary FD, only FDs for supported file types.
struct liveupdate_ioctl_fd_preserve {
	__u32		size;
	__s32		fd;
	__aligned_u64	token;
};
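
A sketch of how userspace might fill this argument before handing it to the preserve ioctl follows. The struct is mirrored with fixed-width types so the example builds outside the kernel tree, and setting `size` to `sizeof` the struct is an assumption based on the usual convention for extensible ioctl arguments; the slide does not spell it out. The ioctl request number is not given in the slides, so the call itself is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the struct on the slide (hypothetical name). */
struct fd_preserve_sketch {
	uint32_t size;
	int32_t  fd;
	uint64_t token __attribute__((aligned(8)));
};

/* 4 + 4 + 8 bytes with no padding. */
static_assert(sizeof(struct fd_preserve_sketch) == 16, "unexpected size");

/* Build the argument for one FD; the token is chosen by the caller. */
struct fd_preserve_sketch make_fd_preserve(int fd, uint64_t token)
{
	struct fd_preserve_sketch arg;

	memset(&arg, 0, sizeof(arg));
	arg.size = sizeof(arg); /* assumed convention, see lead-in */
	arg.fd = fd;
	arg.token = token;
	return arg;
}
```

The caller would need to record the token somewhere that survives the reboot, since it is what identifies the FD afterwards.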

4.11.1 Notes

  • Give some examples of FDs in Linux: memfd, sockets, VFIO, IOMMUFD, KVM, etc.
  • Mention some properties that can change with restored FDs, taking memfd as example.
  • Mention that the token can be used to identify the FD after reboot.

4.12 LUO: Subsystems

  • For things that can’t be described by an FD.
  • Examples: PCI, NVMe, ftrace, etc.

4.12.1 Notes

  • Mention that not much work has been done on this, so use cases and the usage model are still unclear.

4.13 Memory File Descriptor (memfd)

  • memfd attaches a file descriptor to anonymous memory.
  • State preserved: memory contents, size and position.
  • After preservation, pages cannot be added to or removed from the memfd.
  • Limitations: no sparseness, no swap.

4.13.1 Notes

  • Mention that memfd is the first user of LUO.
  • Mention that pages are pinned and holes are filled.

4.14 memfd: preservation format

/ {
	pos = <0x...>;
	size = <0x...>;
	folios = [array of memfd_luo_preserved_folio]
};
struct memfd_luo_preserved_folio {
	u64 foliodesc;
	u64 index;
};
  • Foliodesc: bottom 12 bits for flags, rest for PFN.
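
The packing described for `foliodesc` can be illustrated with a few helpers. This sketch assumes the PFN occupies the upper 52 bits (that is, it is shifted left by 12) and the flags the lower 12; the helper names are hypothetical and no particular flag values are implied.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for a 64-bit descriptor: [63:12] PFN, [11:0] flags. */
#define SKETCH_FLAG_BITS 12
#define SKETCH_FLAG_MASK ((UINT64_C(1) << SKETCH_FLAG_BITS) - 1)

/* Pack a PFN and flags into one descriptor. */
uint64_t sketch_mkdesc(uint64_t pfn, uint64_t flags)
{
	assert((flags & ~SKETCH_FLAG_MASK) == 0);      /* flags must fit in 12 bits */
	assert((pfn >> (64 - SKETCH_FLAG_BITS)) == 0); /* PFN must fit in 52 bits */
	return (pfn << SKETCH_FLAG_BITS) | flags;
}

/* Extract the PFN from a descriptor. */
uint64_t sketch_desc_pfn(uint64_t desc)
{
	return desc >> SKETCH_FLAG_BITS;
}

/* Extract the flags from a descriptor. */
uint64_t sketch_desc_flags(uint64_t desc)
{
	return desc & SKETCH_FLAG_MASK;
}
```

With 4 KiB pages this layout means the descriptor with the flags masked off is simply the folio's physical address.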

4.14.1 Notes

  • Explain why we use FDT.

4.15 VFIO, PCI, IOMMU, etc…

5 Upstream status

5.1 Agenda

5.2 Upstream status

  • KHO is in mainline. See kernel/kexec_handover.c and include/linux/kexec_handover.h.
  • LUO v4 was sent out a few days ago (patch posting). It is starting to stabilize and is on the path to upstream soon.
  • memfd support will get merged with the LUO patches.
  • RFCs for PCI, VFIO, and IOMMU are out.

6 Future work

6.1 Agenda

6.2 Future work

  • Supporting more subsystems: huge pages, VFIO, IOMMU, PCI, etc.
  • Implementing luod.
  • Improving performance for reboots.
  • Defining a mechanism for kernels to negotiate versions to enable rollback and roll forward to a wider set of kernels.
  • Testing and validation.

