Most APIs which accept binary data need to ensure that the data is not modified while they read from it. (Without loss of generality, let's only analyze `ArrayBuffer` instances for now.) Modifications can come about due to the API processing the data asynchronously, or due to the API processing the data on some other thread which runs in parallel to the main thread. (E.g., an OS API which reads from the provided `ArrayBuffer` and writes it to a file.)
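To make the hazard concrete, here is a small illustration (a sketch, assuming a hypothetical `someAPI` that starts reading the buffer asynchronously without copying it):

```js
const arrayBuffer = new ArrayBuffer(1024);
someAPI(arrayBuffer); // starts reading asynchronously, no copy taken

// The caller can mutate the same memory before the read completes,
// so someAPI may observe a mix of old and new bytes.
new Uint8Array(arrayBuffer).fill(0xff);
```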
On the web platform, APIs generally solve this by immediately making a copy of the incoming data. The code is essentially:
```js
function someAPI(arrayBuffer) {
  arrayBuffer = arrayBuffer.slice(); // make a copy

  // Now we can use arrayBuffer, async or in another thread,
  // with a guarantee nobody will modify its contents.
}
```
But this is slower than it could be, and uses twice as much memory. Can we do a zero-copy version?
One solution is for such APIs to transfer the input:
```js
function someAPI(arrayBuffer) {
  arrayBuffer = arrayBuffer.transfer(); // take ownership of the backing memory

  // Now we can use arrayBuffer, async or in another thread,
  // with a guarantee nobody will modify its contents.
}
```
But this can be frustrating for callers, who don't know which APIs will do this, and thus don't know whether passing in an `ArrayBuffer` to an API will give up their own ownership of it.
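For example (a sketch; whether the transfer actually happens depends entirely on how `someAPI` is implemented, which the call site doesn't reveal):

```js
const arrayBuffer = new ArrayBuffer(1024);
someAPI(arrayBuffer); // did this transfer the buffer? The call site gives no clue.

// If someAPI transferred it, the caller's reference is now detached:
console.log(arrayBuffer.byteLength); // 0 if transferred, 1024 otherwise
```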
This gist explores a solution which has the following properties:
- It requires the caller to do a one-time transfer of the `ArrayBuffer` to the callee, via explicit call-site opt-in.
- Callees do need to do a small amount of work to take advantage of this, but the code to do that work is generic and could be generated automatically. (E.g. by Web IDL bindings, on the web.)
In this world, the default path, where you just call `someAPI(arrayBuffer)`, still does a copy. This means the caller doesn't have to worry about whether they're allowed to continue using `arrayBuffer` or not. I think this is the right default given how the ecosystem has grown so far.
```js
function someAPI(arrayBuffer) {
  // This line could be code-generated generically for all ArrayBuffer-taking APIs.
  arrayBuffer = ArrayBufferTaker.takeOrCopy(arrayBuffer);

  // Nobody else can modify arrayBuffer. Do stuff with it, possibly asynchronously
  // or in native code that reads from it in other threads.
}

const arrayBuffer = new ArrayBuffer(1024);
someAPI(arrayBuffer); // copies

const arrayBuffer2 = new ArrayBuffer(1024);
someAPI(new ArrayBufferTaker(arrayBuffer2)); // transfers
```
The implementation of `ArrayBufferTaker` can be done today, and is in the attached file.
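For orientation, here is a minimal sketch of one possible shape for `ArrayBufferTaker`; the attached file is the authoritative version and may differ in its details.

```js
// Sketch only, not the attached implementation. Assumes
// ArrayBuffer.prototype.transfer() (ES2024) is available.
class ArrayBufferTaker {
  #arrayBuffer;

  constructor(arrayBuffer) {
    this.#arrayBuffer = arrayBuffer;
  }

  // Callees pass along whatever argument they received:
  // takers are transferred (zero-copy), plain ArrayBuffers are copied.
  static takeOrCopy(arrayBufferOrTaker) {
    if (arrayBufferOrTaker instanceof ArrayBufferTaker) {
      return arrayBufferOrTaker.#arrayBuffer.transfer();
    }
    return arrayBufferOrTaker.slice();
  }
}
```

With a shape like this, the call-site opt-in is just wrapping the buffer in a taker, matching the usage example above, and the two-step create-taker-then-take structure is what lets the callee distinguish "transfer" from "copy".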
- How to make this work ergonomically for cases where `someAPI` takes a typed array or `DataView`? (See the sketch just after this list for an illustration of why this is tricky.)
- Probably `arrayBuffer2.take()` or some better-named method would be more ergonomic than `new ArrayBufferTaker(arrayBuffer2)`.
- Probably in general we should come up with better names. This is an important paradigm and using the right names and analogies is key.
- Can we let `someAPI` release the memory back to the caller? That would require language support.
- How does this interact with `SharedArrayBuffer`s, resizable `ArrayBuffer`s, and growable `SharedArrayBuffer`s?
  - Probably this is just not applicable to `SharedArrayBuffer` cases. Those are explicitly racey.
  - Maybe it just works for resizable `ArrayBuffer`s?
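As an illustration of the typed array / `DataView` question: taking ownership of a view's underlying buffer detaches every other view that shares it, which is part of what makes the ergonomics tricky.

```js
const buffer = new ArrayBuffer(1024);
const head = new Uint8Array(buffer, 0, 16);
const tail = new Uint8Array(buffer, 16);

// Transferring the underlying buffer (e.g. to take ownership of just the
// bytes `head` covers) detaches `buffer`, so both views go to zero length.
const taken = buffer.transfer();
console.log(head.length, tail.length); // 0 0
```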
Thanks to @jasnell for inspiring this line of thought via whatwg/fetch#1560. Thanks to the members of the "TC39 General" Matrix channel for a conversation that spawned this idea, especially @mhofman who provided the key insight: a two-step create-taker then take procedure, instead of attempting to do this in one step.
I guess this approach requires the caller to opt in, which means that only apps which are hyper-optimized will actually use the API. The vast majority probably won't bother. It's better than doing nothing, but is there a solution that can "just work" without any code changes on the caller side?
I think a copy-on-write mechanism could achieve this. Between `getImmutableView()` and `view.release()`, if anything else tries to modify the source buffer, it would trigger a copy.

Implementing this would require support from the JS engine, although a polyfill could fall back to always making a copy (which is the status quo today anyway).
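A minimal sketch of how a callee might use such a copy-on-write API, assuming `getImmutableView()` and `release()` are the hypothetical methods described above (they don't exist today), and `readAsynchronously` is a stand-in for whatever promise-returning work the API does:

```js
function someAPI(arrayBuffer) {
  // Hypothetical: pin an immutable, copy-on-write snapshot of the buffer.
  const view = arrayBuffer.getImmutableView();

  // Any write to arrayBuffer between these two calls would trigger a copy,
  // so `view` stays stable for async or cross-thread reads.
  readAsynchronously(view).finally(() => {
    view.release(); // done: the engine can drop the COW bookkeeping
  });
}
```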
I don't know much about JS engine internals. But, given that `ArrayBuffer` access already has to check for detachment today, I would guess that there wouldn't be any general performance penalty to check for COW at the same time -- I imagine it would be treated like a special case of detachment, where the buffer can be reconstructed to fulfill the request.