Most APIs which accept binary data need to ensure that the data is not modified while they read from it. (Without loss of generality, let's only analyze ArrayBuffer instances for now.) Modifications can come about due to the API processing the data asynchronously, or due to the API processing the data on some other thread which runs in parallel to the main thread. (E.g., an OS API which reads from the provided ArrayBuffer and writes it to a file.)
On the web platform, APIs generally solve this by immediately making a copy of the incoming data. The code is essentially:
function someAPI(arrayBuffer) {
  arrayBuffer = arrayBuffer.slice(); // make a copy
  // Now we can use arrayBuffer, async or in another thread,
  // with a guarantee nobody will modify its contents.
}
But this is slower than it could be, and uses twice as much memory. Can we do a zero-copy version?
One solution is for such APIs to transfer the input:
function someAPI(arrayBuffer) {
  arrayBuffer = arrayBuffer.transfer(); // take ownership of the backing memory
  // Now we can use arrayBuffer, async or in another thread,
  // with a guarantee nobody will modify its contents.
}
But this can be frustrating for callers, who don't know which APIs will do this, and thus don't know whether passing in an ArrayBuffer to an API will give up their own ownership of it.
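To make the caller-side surprise concrete, here is a minimal sketch of what a caller observes after handing a buffer to a transferring API. The name transferringAPI is hypothetical. The sketch assumes a runtime that has ArrayBuffer.prototype.transfer(), and falls back to structuredClone() with a transfer list where that method is missing.

```javascript
// Hypothetical API that takes ownership of its input, as in the example above.
function transferringAPI(arrayBuffer) {
  // Assumption: prefer ArrayBuffer.prototype.transfer(); fall back to
  // structuredClone() with a transfer list on runtimes that lack it.
  const owned = typeof arrayBuffer.transfer === "function"
    ? arrayBuffer.transfer()
    : structuredClone(arrayBuffer, { transfer: [arrayBuffer] });
  return owned.byteLength; // stand-in for real work on the owned memory
}

const callerBuffer = new ArrayBuffer(1024);
new Uint8Array(callerBuffer)[0] = 42;

const bytesSeenByAPI = transferringAPI(callerBuffer);

// The caller's reference is now detached: its length reads as zero,
// and trying to create a view over it throws a TypeError.
const callerLengthAfter = callerBuffer.byteLength;
let viewCreationThrew = false;
try {
  new Uint8Array(callerBuffer);
} catch (e) {
  viewCreationThrew = e instanceof TypeError;
}

console.log(bytesSeenByAPI, callerLengthAfter, viewCreationThrew);
```

Nothing at the call site hints that callerBuffer becomes unusable, which is the ergonomic problem described above.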
This gist explores a solution which has the following properties:
- It requires the caller to do a one-time transfer of the ArrayBuffer to the callee, via explicit call-site opt-in.
- Callees do need to do a small amount of work to take advantage of this, but the code to do that work is generic and could be generated automatically. (E.g. by Web IDL bindings, on the web.)
In this world, the default path, where you just call someAPI(arrayBuffer), still does a copy. This means the caller doesn't have to worry about whether they're allowed to continue using arrayBuffer or not. I think this is the right default given how the ecosystem has grown so far.
function someAPI(arrayBuffer) {
  // This line could be code-generated generically for all ArrayBuffer-taking APIs.
  arrayBuffer = ArrayBufferTaker.takeOrCopy(arrayBuffer);
  // Nobody else can modify arrayBuffer. Do stuff with it, possibly asynchronously
  // or in native code that reads from it in other threads.
}

const arrayBuffer = new ArrayBuffer(1024);
someAPI(arrayBuffer); // copies

const arrayBuffer2 = new ArrayBuffer(1024);
someAPI(new ArrayBufferTaker(arrayBuffer2)); // transfers
The implementation of ArrayBufferTaker can be done today, and is in the attached file.
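The attached file is not reproduced here. What follows is an independent minimal sketch of how such a class could look, not the gist's actual implementation. It assumes that constructing the taker is the moment the caller's buffer is detached, that a taker can be consumed only once, and that a private field serves as the brand check. The helper detachAndOwn is a hypothetical name, and it falls back to structuredClone() where ArrayBuffer.prototype.transfer() is unavailable.

```javascript
// Hypothetical helper: detach `buffer` and return a new ArrayBuffer owning its bytes.
function detachAndOwn(buffer) {
  return typeof buffer.transfer === "function"
    ? buffer.transfer()
    : structuredClone(buffer, { transfer: [buffer] });
}

class ArrayBufferTaker {
  #buffer; // the owned ArrayBuffer, or null once it has been taken

  // Step 1 (caller side): creating a taker detaches the caller's buffer.
  // Assumption: same-realm ArrayBuffers only, so instanceof is good enough here.
  constructor(arrayBuffer) {
    if (!(arrayBuffer instanceof ArrayBuffer)) {
      throw new TypeError("ArrayBufferTaker requires an ArrayBuffer");
    }
    this.#buffer = detachAndOwn(arrayBuffer);
  }

  // Step 2 (callee side): take from a taker, or copy a plain ArrayBuffer.
  static takeOrCopy(input) {
    if (typeof input === "object" && input !== null && #buffer in input) {
      const taken = input.#buffer;
      if (taken === null) {
        throw new TypeError("This ArrayBufferTaker has already been used");
      }
      input.#buffer = null; // a taker can be consumed only once
      return taken;
    }
    return input.slice(); // default path: defensive copy
  }
}

// Usage mirroring the example above; this someAPI returns the buffer it now owns
// so the result can be inspected.
function someAPI(input) {
  return ArrayBufferTaker.takeOrCopy(input);
}

const copiedSource = new ArrayBuffer(1024);
const ownedCopy = someAPI(copiedSource); // copies; copiedSource stays usable

const movedSource = new ArrayBuffer(1024);
const taker = new ArrayBufferTaker(movedSource);
const ownedMoved = someAPI(taker); // transfers; movedSource is detached

console.log(copiedSource.byteLength, movedSource.byteLength, ownedMoved.byteLength);
```

Because the private field is only reachable from inside the class, other code cannot pull the buffer out of a taker or forge one around a buffer it still holds a reference to.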
- How to make this work ergonomically for cases where someAPI takes a typed array or DataView?
- Probably arrayBuffer2.take() or some better-named method would be more ergonomic than new ArrayBufferTaker(arrayBuffer2)
- Probably in general we should come up with better names. This is an important paradigm and using the right names and analogies is key.
- Can we let someAPI release the memory back to the caller? That would require language support.
- How does this interact with SharedArrayBuffers, resizable ArrayBuffers, and growable SharedArrayBuffers?
  - Probably this is just not applicable to SharedArrayBuffer cases. Those are explicitly racy.
  - Maybe it just works for resizable ArrayBuffers?
Thanks to @jasnell for inspiring this line of thought via whatwg/fetch#1560. Thanks to the members of the "TC39 General" Matrix channel for a conversation that spawned this idea, especially @mhofman who provided the key insight: a two-step create-taker then take procedure, instead of attempting to do this in one step.
@kentonv, I raised that argument in plenary. I think the current security analysis relies on fixed pointers after allocation, and somehow detached-ness doesn't impact that, or has a known impact proven to at worst result in incorrect JS execution (not in a compromise of the sandbox). I still believe that CoW would present the same risks as a detached buffer in case of a bug in the implementation. It's possible V8/Chrome may be willing to re-evaluate the risk posed by transparent CoW if provided with a pull request and design doc, which I clearly don't have the time or expertise to work on (FYI if anyone wants to take that on).
That said, the taker mechanism here is still valuable for explicit borrow semantics. And if the caller does not provide a taker but only an ArrayBuffer, the host API implementation would fall back to a copy, which may or may not be CoW in the JS engine implementation.