At some point, a sandboxed process will try to run a syscall that interacts with IO, such as:
open(const char* path, int mode)
Within our sandbox, the following events happen:
1. The process calls libc's open() function.
2. Libc translates this to syscall(SYS_open, path, mode) using the constants in /usr/include/asm/unistd_64.h.
3. syscall() sets up some registers and causes a soft interrupt per the Linux syscall ABI.
4. The kernel's interrupt vector catches the interrupt and uses the ABI specification to read the register values and figure out which call is being made.
5. The call's data is pushed through the seccomp filter.
6. If the filter's end result is SCMP_ACT_TRACE, the process is stopped with a SIGTRAP, which notifies the parent process via SIGCHLD that this state transition has occurred.
7. The sandbox inspects the subordinate's registers to figure out which syscall it was making, in this case 2 (SYS_open).
8. The arguments are extracted from the child process and passed off to the VFS layer for emulation.
9. The VFS layer determines which virtual filesystem will handle the request.
10. The selected filesystem is requested to generate a backend-specific file descriptor.
11. The VFS layer maps this file descriptor to a virtual file descriptor suitable for use by the subordinate process.
12. The subordinate process' registers are updated, setting the rax register to the return value. In the case of open(), the return value is a non-negative integer for a file descriptor, or a negative integer for an error.
13. ptrace(PTRACE_CONT, pid, 0, 0) is called with the subordinate process' PID to transition the child to a running state, which uses the previously written register values as though the true Linux kernel had handled the call.
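The VFS portion of the sequence above (choosing a filesystem, asking it for a backend descriptor, mapping that to a virtual descriptor, and producing the value for rax) can be sketched roughly as follows. This is an illustration only, written in JavaScript to match the Node.js bindings; the names Vfs, mount, memoryBackend and the longest-prefix lookup are all assumptions made for the example, not the sandbox's actual implementation.

```javascript
// Hypothetical sketch: none of these names come from the real sandbox.
const ENOENT = 2; // Linux errno for "No such file or directory"

class Vfs {
  constructor() {
    this.mounts = [];     // [{ prefix, backend }]
    this.fds = new Map(); // virtual fd -> { backend, backendFd }
    this.nextFd = 3;      // 0, 1 and 2 are left for stdin/stdout/stderr
  }

  // Prefixes are assumed to end in "/" so "/data/" does not match "/database".
  mount(prefix, backend) {
    this.mounts.push({ prefix, backend });
    // Longest prefix first, so the most specific mount wins.
    this.mounts.sort((a, b) => b.prefix.length - a.prefix.length);
  }

  // Returns the value that would be written into the child's rax register:
  // a non-negative virtual fd on success, a negative errno on failure.
  open(path, flags) {
    // Pick the virtual filesystem that handles this path.
    const m = this.mounts.find(({ prefix }) => path.startsWith(prefix));
    if (!m) return -ENOENT;

    // Ask the selected filesystem for a backend-specific descriptor.
    const backendFd = m.backend.open(path.slice(m.prefix.length), flags);
    if (backendFd < 0) return backendFd;

    // Map the backend descriptor to a virtual one for the child.
    const virtualFd = this.nextFd++;
    this.fds.set(virtualFd, { backend: m.backend, backendFd });
    return virtualFd;
  }
}

// A toy backend that knows about exactly one file.
const memoryBackend = {
  files: ["hello.txt"],
  open(relPath, flags) {
    const i = this.files.indexOf(relPath);
    return i === -1 ? -ENOENT : 100 + i;
  },
};

const vfs = new Vfs();
vfs.mount("/data/", memoryBackend);
console.log(vfs.open("/data/hello.txt", 0)); // 3
console.log(vfs.open("/data/missing", 0));   // -2
```

Note that the child only ever sees the virtual descriptor; the backend's own numbering (100 here) stays private to the sandbox.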
In Node.js, a file is opened with fs.open(path, flags, [mode], callback). Nothing is returned from the function, as it implements the continuation pattern: the caller must supply a callback, which is executed at some point after the initial fs.open() call has returned.
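A small example makes the ordering visible. It uses only the standard fs.open() and fs.close() calls; the helper name openThenReport and the choice of process.execPath as a file that is known to exist are just conveniences for this sketch.

```javascript
const fs = require("fs");

// Opens a file, closes it again, and reports the descriptor through `done`.
// The value returned here is whatever fs.open() itself returned.
function openThenReport(path, done) {
  return fs.open(path, "r", (err, fd) => {
    if (err) return done(err);
    fs.close(fd, (closeErr) => done(closeErr, fd));
  });
}

console.log("before fs.open()");
const returned = openThenReport(process.execPath, (err, fd) => {
  if (err) return console.error("open failed:", err.code);
  console.log("callback ran, descriptor was", fd);
});
console.log("fs.open() returned", returned, "and the callback has not run yet");
```

The two synchronous log lines are printed before the callback's line, which is exactly the gap the sandbox has to bridge while the child process sits stopped.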
As a result of this, step 10 must be modified:
- The selected filesystem is requested to generate a backend-specific file descriptor
- The selected backend notes the request and begins processing it in the background
- The backend later notifies the VFS layer when it has constructed a file descriptor and the sandbox is able to continue to step 11.
Eventing provides the mechanism to split step 10 into these two sub-steps. The initiation and completion of "open a file" are distinct operations. In this eventing model, the sandbox would emit an "opening a file has been requested" event with some metadata to help the sandbox's user (in this case, the Node.js bindings) complete the operation asynchronously.
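One way this split could look from the bindings' side is sketched below, using Node's standard EventEmitter. Everything specific to the sandbox here is hypothetical: the Sandbox class, the "open" event name, the shape of the request object, and the resume callback (which stands in for writing rax and calling ptrace(PTRACE_CONT, ...)) are assumptions for illustration, not the project's real API.

```javascript
const { EventEmitter } = require("events");
const fs = require("fs");
const os = require("os");

// Hypothetical sketch of the eventing model; not the real bindings.
class Sandbox extends EventEmitter {
  constructor() {
    super();
    this.fds = new Map(); // virtual fd -> backend fd
    this.nextFd = 3;
  }

  // Called once the tracer has decoded an open() from the stopped child.
  // `resume(rax)` stands in for updating registers and continuing the child.
  handleOpen(path, flags, resume) {
    // First sub-step: announce the request and return immediately.
    const handled = this.emit("open", {
      path,
      flags,
      // Second sub-step: the backend calls this when it has a descriptor.
      complete: (err, backendFd) => {
        if (err) {
          const errno = os.constants.errno[err.code] ?? os.constants.errno.EIO;
          return resume(-errno);
        }
        const virtualFd = this.nextFd++;
        this.fds.set(virtualFd, backendFd); // continue with step 11
        resume(virtualFd);
      },
    });
    // Nobody is listening, so the call cannot be emulated.
    if (!handled) resume(-os.constants.errno.ENOSYS);
  }
}

// The sandbox's user completes the request with the real filesystem.
const sandbox = new Sandbox();
sandbox.on("open", (request) => {
  fs.open(request.path, request.flags, request.complete);
});

sandbox.handleOpen(process.execPath, fs.constants.O_RDONLY, (rax) => {
  console.log("child would resume with rax =", rax);
  if (rax >= 0) fs.closeSync(sandbox.fds.get(rax));
});
```

Passing a complete function inside the event's metadata keeps the child stopped for as long as the backend needs, while the Node.js event loop stays free to service other requests.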