@domenic
Last active February 1, 2023 17:17
Generic zero-copy ArrayBuffer

Generic zero-copy ArrayBuffer usage

Most APIs which accept binary data need to ensure that the data is not modified while they read from it. (Without loss of generality, let's only analyze ArrayBuffer instances for now.) The risk arises when the API processes the data asynchronously, so the caller's code can run and write to the buffer before the API has finished reading it, or when the API processes the data on some other thread which runs in parallel to the main thread. (E.g., an OS API which reads from the provided ArrayBuffer and writes its contents to a file.)
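A minimal sketch of the hazard, assuming a hypothetical `sumBytesLater` API that keeps a reference to the caller's buffer and only reads it later. To keep the sketch deterministic, "later" is simulated with a hand-rolled job queue (`laterQueue` and `runLater`, also hypothetical) rather than real promises or threads.

```javascript
// A hand-rolled queue standing in for "later": work that an asynchronous
// API (or a native thread) would perform after the caller's code continues.
const laterQueue = [];
function runLater() {
  while (laterQueue.length > 0) laterQueue.shift()();
}

// Hypothetical API: keeps a reference to the caller's buffer and reads it later.
function sumBytesLater(arrayBuffer, onResult) {
  laterQueue.push(() => {
    let total = 0;
    for (const b of new Uint8Array(arrayBuffer)) total += b;
    onResult(total);
  });
}

const buffer = new ArrayBuffer(4);
new Uint8Array(buffer).set([1, 2, 3, 4]); // the sum is 10 at call time

let observed;
sumBytesLater(buffer, (total) => { observed = total; });

// The caller keeps going and overwrites the data before the API reads it.
new Uint8Array(buffer).fill(0);

runLater();
console.log(observed); // 0, not 10: the API saw the caller's later write
```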

On the web platform, APIs generally solve this by immediately making a copy of the incoming data. The code is essentially:

function someAPI(arrayBuffer) {
  arrayBuffer = arrayBuffer.slice(); // make a copy

  // Now we can use arrayBuffer, async or in another thread,
  // with a guarantee nobody will modify its contents.
}

But this is slower than it could be, since every byte is copied, and for as long as both buffers are alive it uses twice as much memory. Can we do a zero-copy version?

One solution is for such APIs to transfer the input:

function someAPI(arrayBuffer) {
  arrayBuffer = arrayBuffer.transfer(); // take ownership of the backing memory

  // Now we can use arrayBuffer, async or in another thread,
  // with a guarantee nobody will modify its contents.
}

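But this can be frustrating for callers, who don't know which APIs will do this, and thus don't know whether passing an ArrayBuffer to an API will give up their own ownership of it. (After a transfer, the caller's ArrayBuffer is detached: its byteLength becomes 0 and it can no longer be read or written.)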
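Here is a small sketch of what the caller experiences. `takesOwnership` is a hypothetical API shaped like someAPI above, and the `transferBuffer` helper is an assumption: it uses `transfer()` where the engine has it and otherwise falls back to `structuredClone` with a transfer list, which also detaches the source.

```javascript
// Prefer ArrayBuffer.prototype.transfer(); fall back to structuredClone's
// transfer list, which likewise detaches the source buffer.
function transferBuffer(ab) {
  return typeof ab.transfer === "function"
    ? ab.transfer()
    : structuredClone(ab, { transfer: [ab] });
}

// Hypothetical API that takes ownership of its input, like someAPI above.
function takesOwnership(arrayBuffer) {
  const owned = transferBuffer(arrayBuffer);
  return owned.byteLength;
}

const callerBuffer = new ArrayBuffer(1024);
const seenByCallee = takesOwnership(callerBuffer);

console.log(seenByCallee);            // 1024
console.log(callerBuffer.byteLength); // 0: the caller's buffer is now detached
```

Nothing at the call site `takesOwnership(callerBuffer)` signals that this will happen, which is exactly the caller-side surprise described above.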

This gist explores a solution which has the following properties:

  • Zero-copy happens only when the caller explicitly opts in at the call site, by doing a one-time transfer of the ArrayBuffer to the callee.
  • Callees do need to do a small amount of work to take advantage of this, but the code to do that work is generic and could be generated automatically. (E.g. by Web IDL bindings, on the web.)

In this world, the default path, where you just call someAPI(arrayBuffer), still does a copy. This means the caller doesn't have to worry about whether they're allowed to continue using arrayBuffer or not. I think this is the right default given how the ecosystem has grown so far.

What it looks like in practice

function someAPI(arrayBuffer) {
  // This line could be code-generated generically for all ArrayBuffer-taking APIs.
  arrayBuffer = ArrayBufferTaker.takeOrCopy(arrayBuffer);

  // Nobody else can modify arrayBuffer. Do stuff with it, possibly asynchronously
  // or in native code that reads from it in other threads.
}

const arrayBuffer = new ArrayBuffer(1024);
someAPI(arrayBuffer); // copies

const arrayBuffer2 = new ArrayBuffer(1024);
someAPI(new ArrayBufferTaker(arrayBuffer2)); // transfers

ArrayBufferTaker can be implemented today; the implementation is in the attached file.

Open questions

  • How to make this work ergonomically for cases where someAPI takes a typed array or DataView?
  • Probably arrayBuffer2.take(), or some better-named method, would be more ergonomic than new ArrayBufferTaker(arrayBuffer2).
  • Probably in general we should come up with better names. This is an important paradigm and using the right names and analogies is key.
  • Can we let someAPI release the memory back to the caller? That would require language support.
  • How does this interact with SharedArrayBuffers, resizable ArrayBuffers, and growable SharedArrayBuffers?
    • Probably this is just not applicable to SharedArrayBuffer cases. Those are explicitly racy.
    • Maybe it just works for resizable ArrayBuffers?
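On the first open question, one possible direction is sketched below. It is only a sketch under assumptions, not a worked-out design: `ViewTaker`, `rewrap`, and `transferWhole` are hypothetical names. The taker transfers the view's whole underlying buffer and rebuilds a view of the same kind (typed array or DataView) at the same offset; the default path copies only the bytes the view covers. One caveat: transferring the whole buffer also detaches any other views over it.

```javascript
// Transfer helper: transfer() when available, else structuredClone's
// transfer list. Either way the source buffer ends up detached.
function transferWhole(ab) {
  return typeof ab.transfer === "function"
    ? ab.transfer()
    : structuredClone(ab, { transfer: [ab] });
}

// Rebuild a view of the same kind over `buffer`. Geometry is passed in
// explicitly because a view over a detached buffer reports 0 for it.
function rewrap(view, buffer, byteOffset, byteLength) {
  if (view instanceof DataView) {
    return new DataView(buffer, byteOffset, byteLength);
  }
  const Ctor = view.constructor; // e.g. Uint8Array, Float64Array
  return new Ctor(buffer, byteOffset, byteLength / Ctor.BYTES_PER_ELEMENT);
}

// Hypothetical taker for typed arrays and DataViews. Caveat: it transfers
// the *whole* underlying buffer, so every other view over it is detached too.
class ViewTaker {
  #view;
  constructor(view) {
    const { byteOffset, byteLength } = view; // read before detaching
    const buffer = transferWhole(view.buffer);
    this.#view = rewrap(view, buffer, byteOffset, byteLength);
  }
  take() {
    const view = this.#view;
    if (!view) throw new TypeError("Cannot take twice");
    this.#view = null;
    return view;
  }
  static takeOrCopy(viewOrTaker) {
    if (#view in viewOrTaker) return viewOrTaker.take();
    // Default path: copy only the bytes the view covers, rebased to offset 0.
    const { buffer, byteOffset, byteLength } = viewOrTaker;
    const copy = buffer.slice(byteOffset, byteOffset + byteLength);
    return rewrap(viewOrTaker, copy, 0, byteLength);
  }
}

const backing = new ArrayBuffer(16);
const middle = new Uint16Array(backing, 4, 2); // covers bytes 4..7
middle.set([7, 9]);

const copied = ViewTaker.takeOrCopy(middle);
console.log(copied.byteOffset, copied.buffer.byteLength); // 0 4

const taken = ViewTaker.takeOrCopy(new ViewTaker(middle));
console.log(taken.byteOffset, taken.buffer.byteLength, backing.byteLength); // 4 16 0
```

Copying only the covered bytes keeps the default path cheap for small views over large buffers. The transfer path cannot do the same without a copy, which is one reason this question is still open.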

Acknowledgments

Thanks to @jasnell for inspiring this line of thought via whatwg/fetch#1560. Thanks to the members of the "TC39 General" Matrix channel for a conversation that spawned this idea, especially @mhofman, who provided the key insight: a two-step create-taker then take procedure, instead of attempting to do this in one step.

class ArrayBufferTaker {
  #ab;

  constructor(ab) {
    // Take ownership at the call site, so the caller's ArrayBuffer is
    // detached before the callee ever runs.
    // Using https://github.com/tc39/proposal-arraybuffer-transfer
    this.#ab = ab.transfer();
    // Or if you want something that works today:
    // this.#ab = structuredClone(ab, { transfer: [ab] });
  }

  take() {
    const ab = this.#ab;
    if (!ab) {
      throw new TypeError("Cannot take twice");
    }
    this.#ab = null;
    return ab;
  }

  static takeOrCopy(abOrTaker) {
    // Brand check: true only for ArrayBufferTaker instances.
    if (#ab in abOrTaker) {
      return abOrTaker.take();
    }
    // Plain ArrayBuffer: fall back to a defensive copy.
    return abOrTaker.slice();
  }
}
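To make the "could be code-generated generically" point concrete, here is a sketch of a wrapper that applies takeOrCopy before an API body runs, roughly what generated bindings might do. `withTakeOrCopy` is a hypothetical helper, not an existing Web IDL feature. The class is repeated from the attached file, with a `structuredClone` fallback, so the block runs on its own.

```javascript
// ArrayBufferTaker as in the attached file, plus a structuredClone fallback
// for engines without ArrayBuffer.prototype.transfer().
class ArrayBufferTaker {
  #ab;
  constructor(ab) {
    this.#ab = typeof ab.transfer === "function"
      ? ab.transfer()
      : structuredClone(ab, { transfer: [ab] });
  }
  take() {
    const ab = this.#ab;
    if (!ab) throw new TypeError("Cannot take twice");
    this.#ab = null;
    return ab;
  }
  static takeOrCopy(abOrTaker) {
    if (#ab in abOrTaker) return abOrTaker.take();
    return abOrTaker.slice();
  }
}

// Hypothetical helper standing in for generated bindings: it normalizes the
// first argument with takeOrCopy before the API body sees it.
function withTakeOrCopy(apiBody) {
  return (abOrTaker, ...rest) =>
    apiBody(ArrayBufferTaker.takeOrCopy(abOrTaker), ...rest);
}

// A toy API body that just hands back what it received.
const someAPI = withTakeOrCopy((arrayBuffer) => arrayBuffer);

const a = new ArrayBuffer(8);
const gotA = someAPI(a); // default path: copies
console.log(gotA !== a, a.byteLength); // true 8

const b = new ArrayBuffer(8);
const gotB = someAPI(new ArrayBufferTaker(b)); // opt-in path: transfers
console.log(gotB.byteLength, b.byteLength); // 8 0
```

The API body itself never needs to know which path was taken; it always receives an ArrayBuffer that nobody else can modify.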
@mhofman
mhofman commented Dec 9, 2022

@kentonv I agree. I suggested a copy-on-write optimization during plenary (notes not yet published) and argued for it again in the TC39 Matrix channel. However, implementers seem to have security concerns about copy-on-write, currently documented in the explainer of the transfer proposal.

We would definitely need an ab.detach() so as not to rely on GC behavior. The "view" would probably need to be a regular ArrayBuffer, since consumers usually need their own TypedArray view on top. Immutable / read-only clones would be a great addition, but are not strictly necessary for this. With the existing .slice(), the engine could create a new ArrayBuffer instance backed by the same memory as the original, with a copy-on-write guard added to both instances.

@kentonv
kentonv commented Dec 10, 2022

Hmm, I don't really understand the security risk argument. By the same argument, is detaching an ArrayBuffer not also a security risk, since the memory that the ArrayBuffer previously pointed to now no longer belongs to it?

@mhofman
mhofman commented Feb 1, 2023

@kentonv, I raised that argument in plenary. I think the current security analysis relies on pointers being fixed after allocation, and somehow detached-ness doesn't impact that, or has a known impact proven to result at worst in incorrect JS execution (not in a compromise of the sandbox). I still believe that CoW would present the same risks as a detached buffer in case of a bug in the implementation. It's possible v8/Chrome may be willing to re-evaluate the risk posed by transparent CoW if provided with a pull request and design doc, which I clearly don't have the time or expertise to produce (FYI, if anyone wants to take that on).

That said, the taker mechanism here is still valuable for explicit borrow semantics. And if the caller does not provide a taker but only an ArrayBuffer, the host API implementation would fall back to a copy, which may or may not be CoW in the JS engine implementation.
