@tdfischer
Created January 15, 2015 20:10

Codius Sandbox Eventing

At some point, a sandboxed process will try to run a syscall that interacts with IO, such as:

open(const char *path, int flags)

Within our sandbox, the following events happen:

  1. The process calls libc's open call
  2. Libc translates this to syscall(SYS_open, path, flags) using the constants in /usr/include/asm/unistd_64.h
  3. syscall() loads the call number and arguments into registers and traps into the kernel per the Linux syscall ABI (on x86-64, via the syscall instruction)
  4. The kernel's syscall entry point uses the ABI specification to read the register values and determine which call was requested
  5. The call's data is pushed through the seccomp filter
  6. If the filter's end result is SCMP_ACT_TRACE, the process is stopped with a SIGTRAP, and the parent process is notified via SIGCHLD that this state transition has occurred.
  7. The sandbox inspects the subordinate's registers to figure out what syscall it was using, in this case 2 (SYS_open on x86-64)
  8. The arguments are extracted from the child process and passed off to the VFS layer for emulation
  9. The VFS layer determines which virtual filesystem will handle the request
  10. The selected filesystem is requested to generate a backend-specific file descriptor
  11. The VFS layer maps this file descriptor to a virtual file descriptor suitable for use by the subordinate process
  12. The subordinate process's registers are updated, with rax set to the return value. For open(), that is a non-negative file descriptor on success or a negated errno value (for example -ENOENT) on failure.
  13. ptrace(PTRACE_CONT, pid, 0, 0) is called with the subordinate process' PID to transition the child to a running state that will use the previously written register values as though the true Linux kernel had handled the call.
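
Steps 9 through 12 can be modelled in a few lines. The following is only a sketch, not the Codius sandbox's code: the Vfs and MemoryFilesystem classes, the mount-prefix lookup, and the numbering of descriptors are all assumptions made for illustration.

```javascript
// Hypothetical model of steps 9-12; none of these names come from the sandbox itself.
const ENOENT = 2; // Linux errno value for "No such file or directory"

class MemoryFilesystem {
  constructor(files) {
    this.files = files;    // Map of path (relative to the mount point) -> contents
    this.nextHandle = 100; // backend-specific descriptors, arbitrary numbering
  }

  // Step 10: return a backend descriptor, or a negated errno on failure.
  open(relativePath) {
    if (!this.files.has(relativePath)) return -ENOENT;
    return this.nextHandle++;
  }
}

class Vfs {
  constructor() {
    this.mounts = [];            // { prefix, fs } entries
    this.virtualFds = new Map(); // virtual fd -> { fs, backendFd }
    this.nextVirtualFd = 3;      // 0-2 are stdin, stdout, stderr
  }

  mount(prefix, fs) {
    this.mounts.push({ prefix, fs });
  }

  // Step 9: pick the filesystem with the longest matching mount prefix.
  resolve(path) {
    let best = null;
    for (const m of this.mounts) {
      if (path.startsWith(m.prefix) && (!best || m.prefix.length > best.prefix.length)) {
        best = m;
      }
    }
    return best;
  }

  // Steps 10-12: the return value is what would be written to rax.
  open(path) {
    const mount = this.resolve(path);
    if (!mount) return -ENOENT;
    const backendFd = mount.fs.open(path.slice(mount.prefix.length));
    if (backendFd < 0) return backendFd; // pass the error through unchanged
    const virtualFd = this.nextVirtualFd++;
    this.virtualFds.set(virtualFd, { fs: mount.fs, backendFd }); // step 11
    return virtualFd;
  }
}

const vfs = new Vfs();
vfs.mount('/data/', new MemoryFilesystem(new Map([['hello.txt', 'hi']])));
const okFd = vfs.open('/data/hello.txt');
const missing = vfs.open('/data/nope.txt');
console.log(okFd, missing); // prints "3 -2"
```

The subordinate only ever sees the virtual descriptor (3 here); the backend's own descriptor (100) stays inside the VFS table.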

In Node.js, a file is opened with fs.open(path, flags, [mode], callback). The function returns nothing because it follows the continuation-passing pattern: the caller supplies a callback that runs some time after the initial fs.open() call has returned.
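
A small demonstration of that ordering, using only Node's standard fs, os, and path modules. The temporary file name is an arbitrary choice for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a throwaway file so the open() below has something to succeed on.
const target = path.join(os.tmpdir(), `eventing-demo-${process.pid}.txt`);
fs.writeFileSync(target, 'demo');

const order = [];
const result = fs.open(target, 'r', (err, fd) => {
  if (err) throw err;
  // This runs only after the surrounding script has finished its synchronous work.
  order.push('callback');
  console.log('order:', order.join(' -> '));
  fs.closeSync(fd);
  fs.unlinkSync(target);
});

// fs.open() has already returned here, but the descriptor is not available yet.
order.push('after fs.open');
console.log('return value:', result); // prints "return value: undefined"
```

The descriptor is only ever delivered through the callback, which is why a synchronous "ask the backend and wait for the answer" step does not fit.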

As a result of this, step 10 must be modified:

  1. The selected filesystem is requested to generate a backend-specific file descriptor
  2. The selected backend notes the request and begins processing it in the background
  3. The backend later notifies the VFS layer when it has constructed a file descriptor and the sandbox is able to continue to step 11.

Eventing provides the mechanism to split step 10 into these sub-steps, so that the initiation and completion of "open a file" become distinct operations. In this eventing model, the sandbox would emit an "opening a file has been requested" event with some metadata to help the sandbox's user (in this case, the Node.js bindings) complete the operation asynchronously.
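
One way such an event could look from the Node.js side is sketched below with the standard EventEmitter. The Sandbox class, the 'open-requested' event name, and the complete(errno, backendFd) callback are hypothetical; the real bindings may be shaped quite differently.

```javascript
const { EventEmitter } = require('events');

// Hypothetical model; the class, event name, and callback shape are assumptions.
class Sandbox extends EventEmitter {
  constructor() {
    super();
    this.fdTable = new Map(); // virtual fd -> backend fd
    this.nextVirtualFd = 3;   // 0-2 are reserved for stdin, stdout, stderr
    this.resumed = [];        // values that would be written to rax
  }

  // Initiation: the subordinate has trapped on open() and stays stopped.
  handleOpen(path) {
    let finished = false;
    const complete = (errno, backendFd) => {
      if (finished) return; // ignore a second completion for the same request
      finished = true;
      if (errno) return this.resume(-errno);
      const virtualFd = this.nextVirtualFd++;
      this.fdTable.set(virtualFd, backendFd);
      this.resume(virtualFd);
    };
    this.emit('open-requested', { path }, complete);
  }

  // Stand-in for writing rax and calling ptrace(PTRACE_CONT, ...).
  resume(result) {
    this.resumed.push(result);
  }
}

const sandbox = new Sandbox();
sandbox.on('open-requested', (request, complete) => {
  // Completion: the backend answers later, from a different turn of the event loop.
  setImmediate(() => {
    if (request.path === '/data/hello.txt') complete(null, 100);
    else complete(2); // ENOENT
  });
});

sandbox.handleOpen('/data/hello.txt');
sandbox.handleOpen('/missing');
console.log('resumed immediately:', sandbox.resumed.length);
setImmediate(() => console.log('resumed later:', sandbox.resumed));
```

The request metadata and the completion callback travel together in the event, so the listener can finish the operation whenever its backend is ready, and the sandbox resumes the subordinate only at that point.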
