For domain teardown, things work roughly like this:
- Domain stops execution for whatever reason (shutdown, crash, etc.)
- Xen raises VIRQ_DOM_EXC which is a notification to xenstored.
- xenstored refreshes its global idea of which domains are alive, sees that dom$X has transitioned to the Shutdown state, and fires the @releaseDomain watch
- Anyone who cares (blk/net-back, userspace daemons inc toolstack) starts cleaning up. Most importantly, unmapping various mappings.
- Toolstack evaluates the on_$FOO actions, and by default will clean up the domain
- Toolstack issues domain_kill(), a long-running hypercall (potentially minutes) which causes various cleanup actions in Xen.
- domain_kill() triggers VIRQ_DOM_EXC a second time, which fires @releaseDomain a second time.
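
To make the double firing of @releaseDomain concrete, here is a toy model of the xenstored side. It is a sketch only and not xenstored's actual code: every name (toy_state, toy_tracked, toy_on_virq_dom_exc) is made up for illustration, and the three-state view of a domain is a simplifying assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a domain as seen by the store daemon. */
enum toy_state { TOY_RUNNING, TOY_SHUTDOWN, TOY_GONE };

struct toy_tracked {
    unsigned int domid;
    enum toy_state seen;    /* state recorded at the previous rescan */
};

/*
 * Called once per VIRQ_DOM_EXC.  The VIRQ carries no payload, so the only
 * option is to rescan every tracked domain and compare against what was
 * recorded last time.  Returns true if @releaseDomain should be fired.
 */
bool toy_on_virq_dom_exc(struct toy_tracked *t, size_t n,
                         const enum toy_state *now)
{
    bool fire = false;

    for (size_t i = 0; i < n; i++) {
        if (t[i].seen != now[i]) {
            t[i].seen = now[i];   /* remember the new state */
            fire = true;          /* at least one domain changed */
        }
    }
    return fire;
}
```

In this model the first VIRQ sees RUNNING -> SHUTDOWN and fires the watch, and the VIRQ raised from domain_kill() sees SHUTDOWN -> GONE and fires it again. A VIRQ with no state change fires nothing.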
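Obviously, steps 4 through 6 (counting the bullets above: cleanup by interested parties, the on_$FOO evaluation, and domain_kill()) happen in parallel, and it can happen that step 6 "completes" before step 4 does. This is why things are reference counted.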
When the domain refcount actually drops to 0 (this might be in domain_kill(), or might be when a late step 4 finally drops its mapping), we call domain_destroy(). This arranges for the domain/domid to be removed from the hashtable, after which rcu_lock_domain_by_id()/etc can no longer find it. Once we're certain that all parallel hypercalls have completed, we free up the final data structures.
domain_kill() will complete without calling domain_destroy() if other entities in the system are still holding mappings. In that case, the various unmapping hypercalls will each drop a ref on the page being unmapped, which may free it (if the page's refcount drops to 0), and may in turn drop the domain's number of pages to 0.
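
The refcounting can be sketched with a toy model. This is not Xen's code: the toy_* names are hypothetical, and the rule that the domain holds one reference on itself for as long as it owns at least one page is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; NOT Xen's struct domain / struct page_info. */
struct toy_domain {
    unsigned int refcnt;     /* references on the domain itself */
    unsigned int tot_pages;  /* pages the domain still owns */
    bool dying;              /* kill has started */
    bool destroyed;          /* last reference has gone away */
};

struct toy_page {
    struct toy_domain *owner;
    unsigned int refcnt;     /* 1 for the allocation + 1 per foreign mapping */
};

void toy_put_domain(struct toy_domain *d)
{
    assert(d->refcnt > 0);
    if (--d->refcnt == 0)
        d->destroyed = true;          /* stands in for domain_destroy() */
}

void toy_put_page(struct toy_page *pg)
{
    assert(pg->refcnt > 0);
    if (--pg->refcnt != 0)
        return;                       /* someone still maps the page */

    /* Page is freed: the owner loses it, and maybe its last page. */
    struct toy_domain *d = pg->owner;
    pg->owner = NULL;
    if (--d->tot_pages == 0)
        toy_put_domain(d);            /* drop the ref held for the pages */
}

void toy_domain_init(struct toy_domain *d, struct toy_page *pages, size_t n)
{
    d->refcnt = 1 + (n > 0);          /* creation ref (+ ref for pages) */
    d->tot_pages = (unsigned int)n;
    d->dying = d->destroyed = false;
    for (size_t i = 0; i < n; i++) {
        pages[i].owner = d;
        pages[i].refcnt = 1;          /* allocation reference */
    }
}

/* A backend mapping / unmapping one of the guest's pages. */
void toy_map_page(struct toy_page *pg)   { pg->refcnt++; }
void toy_unmap_page(struct toy_page *pg) { toy_put_page(pg); }

/* Stands in for domain_kill(): relinquish pages, then the creation ref. */
void toy_domain_kill(struct toy_domain *d, struct toy_page *pages, size_t n)
{
    d->dying = true;
    for (size_t i = 0; i < n; i++)
        toy_put_page(&pages[i]);      /* drop each allocation reference */
    toy_put_domain(d);                /* drop the creation reference */
}
```

If a backend still maps one page when toy_domain_kill() runs, the kill returns with the domain dying but not destroyed. The later toy_unmap_page() frees the page, takes tot_pages to 0, and drops the final domain reference.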