XOmB Activations: as they stand
(gist wolfwood/5447920, last active November 15, 2022 01:01)
============= Interrupt Primer =====================================
Computer: a deterministic monolith crunching away until infinity, one
instruction after the other, as preordained by The Programmer.
If this is not your experience, it is because of interrupts (NB: it is
NOT because we don't write infinite loops; they are hidden down at the
bottom of most OSes and of GUI and console applications). The idea
behind interrupts is that we can pause what the CPU is doing, handle
some new information (like a key press or the printer finishing a
document), and then return to the original task. Interrupts are the
Great Nondeterminism of the computing world (what they trigger is still
predetermined, but when they trigger is dynamic).
What causes interrupts: I/O and timers, i.e. changes in outside state.
Alternative: polling :(
Current hardware and OSes have taken the idea of interrupts to an
extreme: because they can restore the state of the CPU prior to the
interrupt without the interrupted process noticing, they do so with
extreme prejudice.
Virtual CPU: a process can never tell when it isn't running.
Alternative++: activations
============= How XOmB implements Activations ======================
An activation is a piece of memory used to communicate between the
kernel and userspace. Currently every environment is created with a
2MB segment at address 1GB - 2MB, the only memory accessible in the
first 1GB. XOmB currently manages the allocation of this memory, but if
we move to userspace allocation this will become quite tricky, unless
we can guarantee that a correct program always has a free page
available for allocating activations (anything that faults in this
segment is then labeled incorrect and killed :).
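A quick check of the arithmetic: 1GB - 2MB is 0x40000000 - 0x200000 =
0x3FE00000, so the segment covers [0x3FE00000, 0x40000000). The sketch
below just encodes that range; the macro and function names are
hypothetical, not XOmB's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes from the text: a 2MB activation segment that ends at 1GB. */
#define KiB ((uint64_t)1024)
#define MiB (1024 * KiB)
#define GiB (1024 * MiB)

#define ACTIVATION_SEGMENT_SIZE (2 * MiB)
#define ACTIVATION_SEGMENT_BASE (1 * GiB - ACTIVATION_SEGMENT_SIZE) /* 0x3FE00000 */
#define ACTIVATION_SEGMENT_END  (1 * GiB)                           /* exclusive */

/* Hypothetical helper: true if addr lies inside the activation segment.
 * Under the policy above, a fault at such an address marks the program
 * as incorrect. */
static inline bool in_activation_segment(uint64_t addr) {
    return addr >= ACTIVATION_SEGMENT_BASE && addr < ACTIVATION_SEGMENT_END;
}

/* Hypothetical helper: how many activations of a given size fit. */
static inline uint64_t activation_slots(size_t activation_size) {
    return ACTIVATION_SEGMENT_SIZE / activation_size;
}
```

For example, if an activation took 128 bytes (an assumed size),
activation_slots(128) would give 16384 slots.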
The Activation, for simplicity, contains an InterruptStack struct (to
store the CPU state that is suspended on an interrupt), some
additional information for unwinding activations that occur while an
activation is being restored (chained activations are my primary worry
regarding correctness, races and allocation issues), and finally a bool
indicating whether the activation is valid, so that the kernel can
find a free activation when it needs one.
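A rough C picture of such an Activation, for readers who think in
structs. The register order is an assumption (a common x86-64 layout:
general-purpose registers pushed by the stub, then vector and error
code, then the five words the CPU pushes itself), and the unwinding
information is reduced to a single hypothetical pointer.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the saved state, lowest address first. */
typedef struct {
    uint64_t r15, r14, r13, r12, r11, r10, r9, r8; /* pushed by the stub */
    uint64_t rbp, rdi, rsi, rdx, rcx, rbx, rax;
    uint64_t vector, error_code;
    uint64_t rip, cs, rflags, rsp, ss;             /* pushed by the CPU */
} InterruptStack;

typedef struct Activation Activation;

struct Activation {
    InterruptStack state;     /* CPU state suspended by the interrupt */
    Activation *interrupted;  /* hypothetical: the activation whose restore
                                 was under way when this one was taken */
    atomic_bool valid;        /* true while in use; false means free */
};
```

With this register set the struct is a couple of hundred bytes, still a
small fraction of a 4KB page.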
The underlying XOmB interrupt mechanism is not changed by activations:
the same templated code pushes registers onto the interrupt stack, but
instead of calling an interrupt handler it calls the activation
dispatcher (an un-scheduler, if you will). The dispatcher first finds a
free activation (XXX: in a lock-free manner that marks it as no longer
free) in the environment's activation segment. Then the saved state of
the InterruptStack is copied to the activation. The interrupt is
acknowledged to the local APIC (to prevent denial of service) and
userspace is reentered using the same mechanism as initial entry and
the yield system call, with 2 parameters: an entry index of 4 and the
address of the activation used.
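The dispatcher steps above, sketched in C. The lock-free claim (the
XXX item) is done here with a compare-and-swap on the valid flag, which
is one way to get "find and mark as no longer free" in a single atomic
step; that choice, and every function name, is an assumption. The APIC
acknowledgement and userspace entry are stubs so the sketch runs
outside a kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the saved register state (22 quadwords assumed). */
typedef struct { uint64_t words[22]; } InterruptStack;

typedef struct {
    InterruptStack state;
    atomic_bool valid;                   /* false = slot is free */
} Activation;

#define ENTRY_ACTIVATION 4               /* entry index from the text */

/* Hypothetical stubs standing in for real kernel routines. */
static int apic_acks;                    /* real code: write EOI to the local APIC */
static uint64_t last_entry_index;        /* real code: load registers and drop to */
static Activation *last_entry_act;       /* userspace; it never returns here */
static void ack_local_apic(void) { apic_acks++; }
static void enter_userspace(uint64_t index, Activation *act) {
    last_entry_index = index;
    last_entry_act = act;
}

/* Claim a free slot with compare-and-swap, so two CPUs scanning the same
 * segment can never both take it. Returns NULL if every slot is in use. */
static Activation *claim_activation(Activation *slots, size_t n) {
    for (size_t i = 0; i < n; i++) {
        bool expected = false;
        if (atomic_compare_exchange_strong(&slots[i].valid, &expected, true))
            return &slots[i];
    }
    return NULL;
}

/* The "un-scheduler": called by the common interrupt stub in place of a
 * handler. Returns false if the activation segment is full. */
static bool dispatch_activation(const InterruptStack *saved,
                                Activation *slots, size_t n) {
    Activation *act = claim_activation(slots, n);
    if (act == NULL)
        return false;                        /* full-segment policy left open */
    memcpy(&act->state, saved, sizeof *saved);   /* copy the suspended state */
    ack_local_apic();                        /* acknowledge before leaving */
    enter_userspace(ENTRY_ACTIVATION, act);  /* same path as entry and yield */
    return true;
}
```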
It may seem possible to avoid this copy by using the interrupt stack
AS the activation. The downsides of this approach are:
a) the activation would need a whole page or more instead of ~100
bytes;
b) the activation must be read-only to userspace, so that code running
on an adjacent CPU cannot corrupt a kernel stack that is in use;
c) either the activation must remain kernel-allocated, or we must edit
the interrupt stack pointer (the IST or RSP0 entry) in the TSS on every
context switch and manage a race with any interrupts that occur after
we enter an environment but before we have located a free preallocated
activation page.
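For option c), it helps to see what "editing the TSS" means. In 64-bit
mode the TSS holds the stack pointers the CPU switches to on an
interrupt: RSP0 for a ring 3 to ring 0 transition, and seven IST
entries that an IDT gate can select explicitly. The struct below
follows the 64-bit TSS layout in the Intel SDM; retarget_ist is a
hypothetical helper showing the write that would have to be redone on
every context switch.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS (Intel SDM Vol. 3). Packed (a GCC/Clang extension) because
 * rsp[0] sits at byte offset 4. */
typedef struct __attribute__((packed)) {
    uint32_t reserved0;
    uint64_t rsp[3];       /* RSP0..RSP2: stacks for privilege levels 0-2 */
    uint64_t reserved1;
    uint64_t ist[7];       /* IST1..IST7: Interrupt Stack Table */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;
} Tss64;

/* Hypothetical: make interrupts whose IDT gate selects IST slot n
 * (1-based, as in the gate's IST field) push their frame into `page`.
 * Stacks grow down, so the entry points at the end of the page. */
static void retarget_ist(Tss64 *tss, unsigned n, uint64_t page,
                         uint64_t page_size) {
    if (n < 1 || n > 7)
        return;            /* IST index 0 means "no IST switch" */
    tss->ist[n - 1] = page + page_size;
}
```

For example, retarget_ist(&tss, 1, 0x3FE00000, 4096) stores 0x3FE01000
in IST1. Any interrupt that arrives after entering the environment but
before this write still lands on the old stack, which is the race
described in c).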
Because the only interrupt at the moment is the timer, userspace
currently uses the parameters passed from the kernel to call the
_entry function, which restores the registers saved by the common
interrupt handler and then uses iretq to restore the hardware-saved
registers, along with RSP and RIP, atomically. At this point it is too
late to mark the activation as free, so we currently leak activations.
While it may seem that the way to cure the leak is to not use iretq in
the first place, restoring RSP and RIP without iretq is nearly
impossible: all registers will be occupied with application data, the
application stack cannot be assumed to be free below the stack pointer
(the 128-byte red zone of the System V AMD64 ABI), and yet an indirect
jump through memory must be used to restore RIP (a preallocated address
would risk being overwritten by other CPUs also restoring activations).
It would be theoretically possible to work around this by storing the
activation address in the FSbase segment register and using
FS-relative addressing, but this adds FSbase to the state that must be
preserved for userspace and complicates chained activations.
So what about interrupts that we actually want to handle? For
throughput-oriented workloads it may be reasonable to simply note that
the interrupt occurred, either by setting a bit in a bitmap that is
checked periodically by a 'process interrupts' thread (what if we want
more than one CPU to be able to process interrupts?) or by enqueuing a
preallocated thread to run the handler for the interrupt at hand (what
if we get two interrupts before the thread is scheduled?).
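One way the bitmap option could answer the multi-CPU question: setters
use an atomic OR, and each draining thread takes a whole word with an
atomic exchange, so two CPUs never handle the same bit. A sketch under
those assumptions (256 vectors; names are hypothetical). Note that it
also answers the second question only partially: two interrupts on one
vector before a drain collapse into a single bit.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NUM_VECTORS 256
#define WORDS (NUM_VECTORS / 64)

/* One bit per interrupt vector; static storage starts all-zero. */
static _Atomic uint64_t pending[WORDS];

/* Interrupt path: record that `vector` fired. Setting an already-set
 * bit is a no-op, so repeated interrupts before a drain coalesce. */
static void note_interrupt(unsigned vector) {
    atomic_fetch_or(&pending[vector / 64], UINT64_C(1) << (vector % 64));
}

/* 'Process interrupts' thread, on any CPU: atomically take the pending
 * bits word by word. Writes up to `max` vectors into `out` and returns
 * the count; bits that do not fit are put back for the next drain. */
static unsigned drain_interrupts(unsigned *out, unsigned max) {
    unsigned count = 0;
    for (unsigned w = 0; w < WORDS && count < max; w++) {
        uint64_t bits = atomic_exchange(&pending[w], 0);
        while (bits != 0 && count < max) {
            /* __builtin_ctzll (GCC/Clang): index of the lowest set bit */
            out[count++] = w * 64 + (unsigned)__builtin_ctzll(bits);
            bits &= bits - 1;          /* clear the lowest set bit */
        }
        if (bits != 0)
            atomic_fetch_or(&pending[w], bits);
    }
    return count;
}
```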
However, at least some interrupts will take priority over the
currently running thread. In this case we may need to allocate a new
thread to handle the interrupt, and also to allocate a thread to
restore the activation: since the activation may have been taken in
the stackless thread scheduler code, or during the recovery of another
activation, we cannot assume there is an existing thread to add back
to the scheduler, and in any case an alternate 'enter from activation'
path would need to be communicated.
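A sketch of what communicating "enter from activation" to the scheduler
might look like: the run queue carries a tag saying how each item is
entered, so preempted code with no thread object of its own can still
be queued. The queue is single-CPU and unlocked, and all names are
hypothetical; it only illustrates the two items a high-priority
interrupt would produce.

```c
#include <stddef.h>

/* How the scheduler should enter a queued item. */
typedef enum { RUN_THREAD, RUN_FROM_ACTIVATION } RunKind;

typedef struct {
    RunKind kind;
    void (*entry)(void *arg);   /* RUN_THREAD: where the thread starts */
    void *arg;                  /* RUN_THREAD: argument for entry */
    void *activation;           /* RUN_FROM_ACTIVATION: state to restore */
} RunItem;

/* Fixed-capacity FIFO. Single CPU, no locking: illustration only. */
#define RUNQ_CAP 16
typedef struct {
    RunItem items[RUNQ_CAP];
    size_t head, tail;          /* tail - head = number of queued items */
} RunQueue;

static void runq_push(RunQueue *q, RunItem item) {
    q->items[q->tail++ % RUNQ_CAP] = item;
}

static int runq_pop(RunQueue *q, RunItem *out) {
    if (q->tail == q->head)
        return -1;              /* empty */
    *out = q->items[q->head++ % RUNQ_CAP];
    return 0;
}

/* High-priority interrupt: queue the handler thread, then an explicit
 * "enter from activation" item for the preempted code, so the handler
 * runs before the code it interrupted. Returns -1 if there is no room
 * for both, so the queue is never left half-updated. */
static int queue_priority_interrupt(RunQueue *q, void (*handler)(void *),
                                    void *handler_arg, void *activation) {
    if (RUNQ_CAP - (q->tail - q->head) < 2)
        return -1;
    runq_push(q, (RunItem){ .kind = RUN_THREAD, .entry = handler,
                            .arg = handler_arg });
    runq_push(q, (RunItem){ .kind = RUN_FROM_ACTIVATION,
                            .activation = activation });
    return 0;
}
```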
Either path is sticky, and is complicated further by the fact that we
would ultimately like to pass the interrupt first to the init process,
so that it may be routed to another environment entirely for quick
handling, without denial of service by the present environment. But
of course, an activation that is not immediately communicated to the
suspended environment is no better than a standard UNIX 'virtual CPU'
that can be revoked without warning.