This gist describes a possible variation to Interface Types. It starts with a description and overview, and also contains a full working example of the approach.
This idea is specific to how modules using Interface Types are connected. It proposes no changes to the Interface Types themselves, and does not require any changes in the trusted role of the VM in Interface Types. The only core change proposed here is an alternative to adapter functions. Specifically, we propose not adding adapter modules, adapter functions, or adapter fusion, and instead connecting modules in a different way. For terminology, call the existing approach Fusion of Adapter Functions (FOAF), and as the new approach focuses on inlining, we call it Inlining Of Iterators (IOI).
Adapter fusion is an operation that generates code at link time. The idea behind IOI is to consider what things might look like if we instead add a more general way to generate code at link time, and build on that. This appears to end up with several interesting tradeoffs both ways.
There is a wide range of ideas around inlining, from a performance hint to Turing-complete macros that receive input from outside of the module. IOI requires the ability to inline at link time, which could be accomplished in various ways, perhaps the simplest of which is a performance hint, parallel to Branch Hinting: an optional note on a function that mentions it is worth inlining at link time. Interpreters and baseline compilers might simply ignore the hint, while optimizing compilers would be expected to inline and optimize during link.
In general, code generation in wasm looks like this:
1. Wasm VMs typically compile at compile time (and then the link stage is very fast).
2. Wasm VMs could also do some compilation at link time. At that point we know what we have linked to, which offers more optimization opportunities. This is what fusion does in FOAF, and it is the form of inlining that IOI depends on for speed.
3. Wasm VMs could in principle also do some compilation at run time, like the JVM and CLR do: seeing which functions are hot, and then doing further optimizations there such as inlining.
Both FOAF and IOI only move wasm from 1 to 2 in that space. Also, both FOAF and IOI clearly state what is to be inlined at link time. As a result, in either approach wasm VMs can still be a lot simpler than the runtime compilation and dynamic decisions that 3 entails. (Of course, there may be other good reasons for VMs to implement 3; in that case, IOI could benefit from that and might not even need a hint as to which functions to inline, although the hint might still improve predictability.)
IOI can be done in more than one way. The simplest to explain may be to start with the definition of adapter functions and work from there. For example, `list.lift` is defined as this:
list.lift $List $done $liftElem $destructor? : [ T:<valtype>* ] -> [ $List ]
where:
- $List refers to a (list E:<intertype>)
- $done : [ T* ] -> [ done:i32, U* ]
- $liftElem : [ U* ] -> [ E, T* ]
- $destructor : [ T* ] -> [], if present
That is, it receives a list, an operation to figure out if the iteration on the list's data is done, and an operation to get the next item. (There is also a destructor, which we implement in a different way; see later.)

Instead of using the adapter fusion algorithm to combine the lift operations from one side and the lowering operations on the other, `$done` etc. could be actual function references, called from actual wasm code. Then that wasm code can simply be run, and in each iteration it would get the next item in the list and provide it to the other side. This is functionally identical to what happens in FOAF. The question, of course, is how fast this can be. That is a difficult question to answer without data, but if the `$done` etc. methods are all inlined, then it is plausible that this could be quite fast. In particular, it definitely avoids the extra copy problem that FOAF also avoids (but that serialization does not). But even with that, FOAF provides some guarantees on performance that IOI cannot match. We will see later, however, that there are many tradeoffs in this area, and speed benefits on both sides.
Leaving speed for later, let us write up some more concrete code for the first variation of IOI, called bridge-driven IOI because there is a bridge between the two modules which drives the conversation (the call, including the passing of data) between them. Here is python-like pseudocode for two modules, and a bridge between them, for a call that sends an f64 and returns a string:
# Target module
module target:
  def startConversation():
    # As we iterate over the output string, we will use this index.
    target.currConversation.index = 0
    # Return function references to our operations.
    return (target.done, target.liftElem)

  # The "application code" / "business logic" in this module. It
  # knows nothing about interfacing with other modules - it just
  # understands its own types.
  def func(f64):
    # A silly operation that returns some string for a given number
    return str(f64 + 3.14159)

  # Wrap around 'func' for interfacing purposes.
  def func_wrapper(f64):
    # Call the actual function, and prepare the result for being iterated on.
    target.currConversation.str = func(f64)
    target.currConversation.maxIndex = len(target.currConversation.str)

  def done():
    return target.currConversation.index == target.currConversation.maxIndex

  def liftElem():
    return target.currConversation.str[target.currConversation.index++]

  def endConversation():
    target.currConversation.str = None

# Bridge module for the Interface Type "f64 => string". Note how
# the f64 can just be passed as a core wasm type, while the string
# is iterated on.
module bridge:
  def bridge(target_startConversation, target_func, target_endConversation, f64, source_lowerElem):
    # Start the conversation on the target side, getting its operations.
    target_done, target_liftElem = target_startConversation()
    # Call the target function.
    target_func(f64)
    # Iterate on the output string. Note how this is functionally identical to
    # FOAF, and how it only understands about Interface Types and not the
    # internal details of either side.
    while not target_done():
      source_lowerElem(target_liftElem())
    # End the conversation on the target.
    target_endConversation()

# Source module
module source:
  def func():
    # Call the target function through the bridge. We provide the f64
    # parameter and the operation for us to receive the output string's
    # elements, which is basically `list.lower`'s `$lowerElem`.
    source.startConversation()
    bridge.bridge(target.startConversation, target.func_wrapper, target.endConversation,
                  2.1828,
                  source.lowerElem)
    print(source.currConversation.str)
    source.endConversation()

  def startConversation():
    source.currConversation.str = ''

  def lowerElem(elem):
    source.currConversation.str.append(elem)

  def endConversation():
    # Free the string.
    source.currConversation.str = None
To understand how this works, it may be helpful to view these slides that show the control flow around the bridge in graph form. Another thing that can help is to see what the code looks like after inlining and constant propagation:
def func():
  source.currConversation.str = ''
  target.currConversation.index = 0
  target.currConversation.str = str(2.1828 + 3.14159)
  target.currConversation.maxIndex = len(target.currConversation.str)
  while target.currConversation.index != target.currConversation.maxIndex:
    source.currConversation.str.append(target.currConversation.str[target.currConversation.index++])
  target.currConversation.str = None
  print(source.currConversation.str)
  source.currConversation.str = None
The string is transferred by a simple loop that iterates on the data, avoiding an extra copy in the middle. In fact, if the two languages here use, say, linear memory and GC, then (in addition to the necessary incrementing) the loop could contain basically a single `i32.load` followed by a single `array.set`, the actual low-level operations for each language. (In the python-esque pseudocode, the `append` may look like a heavy operation, but it would just be a write and an increment; and the ABI should probably send the string size in advance, as mentioned in the Interface Types Explainer.)
It may not be necessary, but if the VM also performs escape analysis (if the code uses wasm GC) or LICM (if the code uses linear memory) then the loop can look something like this:
while target_index != target_maxIndex:
  source_str.append(target_str[target_index++])
That should in many cases run at optimal speed.
Note how we begin with a bridge that only understands the Interface Types involved, and two modules that each only understand their own internals, but after inlining and reasonable optimizations (as mentioned: constant propagation, and perhaps escape analysis or LICM) we end up with fast code. While we would not have a guarantee of such a result, if it is achieved in the common case then it may be worth it for the benefits that IOI brings, which we will see later.
(Note that the idea of depending on inlining to make iterators and getters run at full speed is of course not a new one, as it is - heavily! - relied upon in languages like C++ and Rust, for example.)
Before getting into more details, it is worth mentioning some possible extensions and variations to IOI:
- The example above was called "bridge-driven" because the caller calls into the "bridge", which then guides everything (in particular, the loop to iterate on the string is in there, as well as the rest of the conversation with the other module). A benefit of that model is that, if we implement the bridge in the VM, then it can be trusted by the calling modules to behave as expected. Another variation on IOI that is worth considering is caller-driven, in which `bridge.bridge` would be implemented inside the calling module. The interesting tradeoff there is that the toolchain can potentially perform more optimization, but it also means that the called module cannot depend on the VM as we did before, at least not without some additional work; more on both later.
- IOI can be generalized to a capability-based security approach, in which a call between two modules passes tokens that grant permission to access data. In the above example pseudocode we depended on the fact that there was one conversation active at a time; more generally, there would need to be a stack of conversation states (see the sketch after this list). But we can generalize that further to allow for long-running conversations, or ones that overlap in various ways (that may also be possible in FOAF, if an adapter function contains a stack switching instruction to suspend), or one module receiving a capability token that it passes through to a third module, etc. That is, to return a string, a module would provide a token that is used in calls to iterate on the string's contents. This basically extends capability-based security, already a foundation of WASI, from the system interface level to calls between individual modules.
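To make the "stack of conversation states" mentioned in the last bullet concrete, here is a minimal sketch in runnable Python (the names `Conversation`, `begin_conversation`, and `end_conversation` are hypothetical, purely for illustration):

# A minimal sketch of a stack of conversation states. The token handed to
# the other module is just the conversation's depth on the stack, which keeps
# it opaque to the receiver.
class Conversation:
  def __init__(self):
    self.str = None
    self.index = 0

conversation_stack = []

def begin_conversation():
  # Push a new conversation and return its token.
  conversation_stack.append(Conversation())
  return len(conversation_stack) - 1

def end_conversation(token):
  # Conversations must end in LIFO order, like a call stack.
  assert token == len(conversation_stack) - 1
  conversation_stack.pop()

# Usage: a nested call simply pushes another conversation on top.
outer = begin_conversation()
inner = begin_conversation()
end_conversation(inner)
end_conversation(outer)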
The rest of this gist contains more details about performance, simplicity, and verification, and then a full working example of the approach (specifically, in the caller-driven variation, which is a little shorter to write up), runnable in a VM today (without link-time inlining, it will be slow today, of course).
Call "glue code" the code whose only purpose is to convert types between the modules. Usually this is code emitted by the toolchain, and distinct from the "application code" that the user writes. The application code doesn't know anything about interfacing with other modules, which is the glue code's purpose. The next subsections consider inlining of different types of code in different locations.
Consider the case of one module calling another, and the called function is a tiny getter, that is, where the call overhead is far larger than the work done in the function. Something like this:
module foo:
  def simple_operation(x):
    return x + 1

module bar:
  def do_many_calls():
    sum = 0
    for i in range(1000000):
      sum += foo.simple_operation(i)
    return sum
IOI can inline application code across module boundaries. (In fact, in the example earlier we inlined `target.func`.) While in general it may not be easy to know which imports are worth inlining, at least in some cases that may be possible, such as `simple_operation` in this section. As a result, IOI can be fast on that example.
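As a rough sketch (not a guaranteed result), if the VM inlines `foo.simple_operation` at link time, the loop in `bar` would plausibly end up like this:

module bar:
  def do_many_calls():
    sum = 0
    for i in range(1000000):
      # foo.simple_operation has been inlined: the cross-module call is gone,
      # and the optimizer can now see the entire loop body.
      sum += i + 1
    return sum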
In the proposed component model in wasm we have shared-nothing linking, and calls between components therefore have two forms of overhead: copying overhead, where we need to copy data between them (and therefore it is important to focus on removing any extra copies, at least), and call overhead (the actual call itself, and missed opportunities in register allocation, constant propagation, etc.). The two forms mostly apply in different use cases: when the data is large, the call overhead may not matter much, while in other cases the call overhead may dominate.
If we want to make the component model as useful as possible then we should aim to reduce both forms of overhead as much as we can. Comparing to other ecosystems, copy overhead is not a problem in the JS, C++, Rust, Java, C#, etc. ecosystems, as memory is shared between libraries. Likewise, call overhead is mitigated in all of those, either because of dynamic inlining, or inlining from headers, or LTO, etc. The wasm component model does not have those benefits, and so both call and copy overhead may become an issue.
Using link-time inlining in IOI we can reduce both forms of overhead. Or, rather, IOI sees the problem of copying overhead as one that can be reduced to calls of getters/iterators across the module boundary, which is the problem of call overhead, and therefore if we solve one we solve both. (Of course, the approach of using link-time inlining is not guaranteed to be perfect, and we need to validate it on real-world data.) On the other hand, if all we have is FOAF, and it does not inline such application code, then it will only handle copying overhead but not call overhead, and will therefore be slower than IOI on some use cases. (Again, this would need to be investigated in real-world data.)
Let us consider ways in which FOAF could be competitive with IOI on the call overhead. We could imagine that we spec and ship both FOAF and a general feature to inline at link time, and use FOAF for glue code and the general feature for application code, but then we have specced and shipped two features to emit specialized code at link time.
Alternatively, as adapter functions can contain arbitrary wasm, they could contain application code as well as glue code. If we placed both `foo.simple_operation` and `bar.do_many_calls` in adapter functions (note that we need both sides) then they would get fused and the call overhead between them removed. We would need to handle things like getting that code access to the relevant global state it needs (as adapter functions appear in an adapter module, which might be separate from the main code, this might not be trivial), but it seems possible. It may, however, run a little counter to the spirit of the Interface Types Explainer, which says that "the role of adapter code is to be a thin layer between core modules", but it may be worth it for the speed.
If adapters only contain glue code, as perhaps implied in that last quote, then it would be an option to ignore them in debuggers, as only toolchain developers would really want to see them. (While we would complain, there are not that many of us...) But if we put application code in adapter functions, as suggested in the last paragraph, then issues arise with the layering of adapter functions on top of core wasm. While the core VM can care only about core wasm, which mitigates much of the security and other risks of adding adapter functions, application code in adapters would require interaction with tools that do care about the original form of the wasm binary, such as debuggers. The issues may include:
- Noting the original binary address of wasm instructions, and piping that through fusion, to be used by the debugger later. This would be necessary for DWARF and source maps.
- Debuggers would need to support the viewing of adapter function code, and to step through it and so forth. This would include support for the new instructions only possible in adapter functions. This would be necessary when debugging without DWARF or source maps.
- A debugger that looks at the call stack might need additional work, as the inlining done during fusion would change things.
- DWARF would need to be extended to support adapter modules and adapter functions. (E.g., DWARF offsets are into the code section, so this may be trivial depending on the binary format of adapter functions.)
- Any tool that scans the wasm binary for some reason might need additional work. This may not be the case with current wasm debuggers, but it has been done in things like sanitizer integration in Emscripten in the past (but not the present).
In principle such issues can also happen in IOI. However, in IOI we have the ability to do nothing: to not inline, when running in the debugger, and then all we have is regular wasm which already runs in debuggers. This may be the natural thing to do, for the same reason that some debuggers run wasm in the lowest tier, which avoids doing the work to make the optimizing tier debugging-friendly. That is, a VM might only run the baseline compiler when debugging, and the baseline compiler might ignore the inlining hint.
In the caller-driven variation of IOI, the bridge code is in the calling module, which means the toolchain has the opportunity to optimize it together with the application code (as both are simply regular code in the calling module). For example,
module caller:
  # Glue plus the bridge for a call that sends a list of integers to
  # the called function.
  def glue_plus_bridge(func, list):
    target_lowerElem = start_conversation()
    for i in range(len(list)):
      target_lowerElem(list[i])
    call(func)
    end_conversation()

  def application_code():
    glue_plus_bridge(other_module.foo, [1,2,3,4])
First, note how the loop that iterates over the list already contains `list[i]`, that is, it is specialized for the language of the caller module, which is possible as all this is emitted by the same toolchain for the same language. Second, note how we call the other module with an array of fixed size (4).
The toolchain may inline and then unroll the loop, arriving at this:
module caller:
  def application_code():
    target_lowerElem = start_conversation()
    target_lowerElem(1)
    target_lowerElem(2)
    target_lowerElem(3)
    target_lowerElem(4)
    call(other_module.foo)
    end_conversation()
It may be helpful to be more concrete. Here is some C++:
#include <vector>

extern "C" void lift(int x);

void lift_vec(const std::vector<int>& v) {
  for (auto x : v) {
    lift(x);
  }
}

int main() {
  lift_vec({424242, 1337, 21828});
}
That will lift a `std::vector` to another module. An easy way to get fairly readable code after running the LLVM optimizer is to let `emcc` compile it first to wasm and then to JS, using this:
emcc -O3 --profiling -s WASM=0 -s ERROR_ON_UNDEFINED_SYMBOLS=0
The result is this:
function main($0, $1) {
  $0 = $0 | 0;
  $1 = $1 | 0;
  lift(424242);
  lift(1337);
  lift(21828);
  return 0;
}
A perfect result!
To get there, LLVM did some nontrivial things, as the unoptimized code begins with stores to memory of those three interesting integers, and even if the loop is inlined into `main`, there are still dozens of lines in between it and those stores, lines which contain other reads and writes to memory (like setting the array length, etc.). LLVM is capable of figuring this out, but a wasm VM likely would not, both because LLVM simply does a lot more work, and because LLVM IR has a lot more information that helps with things like alias analysis.
(There are many more possibilities here, some of which would require ABI changes. For example, a function that receives a list of integers and only looks at the last one could actually only read that single value, instead of iterating to the end, if the ABI were random-access instead of iteration.)
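As a hedged sketch of what such a random-access ABI might look like (the names here are hypothetical, not part of any proposal), the lowering side could receive a length and a per-element getter instead of driving an iterator, and then read only what it needs:

# Hypothetical random-access ABI: the caller passes a length and a getter
# function reference instead of done/liftElem iteration operations.
def last_element(length, get_elem):
  # A single cross-module read, regardless of the list's size.
  return get_elem(length - 1)

# Usage: the other module exposes its list through a getter.
data = [10, 20, 30, 40]
print(last_element(len(data), lambda i: data[i]))  # prints 40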
While these are not trivial optimizations, this would be done at the toolchain level, where LLVM and others can in fact do such things. In addition to this, after the VM inlines the other side, the early toolchain optimizations may enable further benefits, if the toolchain made the resulting wasm easier for the VM's simpler optimizer to handle.
In comparison, in FOAF the adapters are special code with a special form and role (so that the VM can fuse them together), which makes toolchain inlining of glue code with application code in the same module more difficult. While adapters are not quite a different language, they are distinct from normal wasm due to the restrictions on them, like having affine types, loop-parameter restrictions, etc. Perhaps most importantly, adapter functions need to be emitted in a different way (as adapter functions in an adapter module), and so it is not clear how we could just run a general-purpose optimizer on both adapter functions and other code together.
A general principle in optimization is that doing everything in a single uniform language is simpler: it is best to avoid special modes (like strict mode in JS), and while combining multiple languages can be very productive, it does inhibit optimizations across the language boundary (e.g., asm.js and JS, wasm and JS, JS and the browser's native code, C++ and a scripting language in a game engine, etc.). This metaphor is not precise here, but the principle of having a single format in which to optimize does play in IOI's favor.
As mentioned earlier, FOAF gives a guarantee of good performance on fused adapter functions, as it inlines in a predictable way and avoids adding additional overhead there. This is an advantage compared to IOI, in which we depend more on compiler optimizations. While IOI will not do any extra copying of data, it may have other additional overhead, and in the worst case a significant amount.
(How hard IOI is to optimize depends on the details. For example, the pseudocode in the introduction used function references extensively, which is necessary in bridge-driven IOI given that the bridge must be independent, but in caller-driven IOI we can inline the caller's own operations at the toolchain level, and also call the other module's imports directly instead of using references, making the code a lot simpler. But bridge-driven IOI may be equally fast, as turning a `call_ref` of a constant function into a direct call is a simple optimization.)
The flip side of FOAF's advantage in predictability is that it is focused on a very specific task: copy overhead between modules. It does not help with call overhead, as we saw earlier, and it does not help with anything else. In comparison, IOI does help with call overhead, and in addition, the work to make IOI fast will make other things faster as well: when a VM invests in improving things like constant propagation and escape analysis, that work can be useful on a broad range of code.
It is impossible to evaluate this tradeoff without real-world data, as it depends on what the future ecosystem of modules will look like, and in particular, how large the different types of overhead will be.
The above discussion focuses on peak performance from code generated in the optimizing tier of the VM. Baseline compilation (or, similarly, an interpreter) has other tradeoffs, which impacts startup times.
We have already mentioned that in IOI a baseline compiler can simply ignore the inlining hint. That would allow optimal startup speed (no new work at the link stage) in a very simple way (no changes would be needed in the baseline compiler). The downside is that cases that heavily benefit from inlining may be slower - potentially significantly slower than the usual 2x slowdown common in baseline compilers today. That is, we have the tradeoff of optimal startup for slower speed while we wait for the optimizing tier.
(Note that while the baseline compiler does not change, the bridge is still needed in the case of bridge-driven IOI. If the bridge is a regular wasm module, then it can be baseline-compiled at compile time normally, and so there is no work at link time. If, on the other hand, the VM provides the bridge, then it needs to do the work to generate it for the necessary Interface Types. But in this case as well, the bridge can be generated at compile time, as the bridge only depends on the used Interface Types signatures, and not the internal details of the modules on both sides.)
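To illustrate why the bridge depends only on the Interface Types signature, here is a hedged sketch of generating the "f64 => string" bridge ahead of time (the helper name is hypothetical); note that its body mentions only the Interface Types operations, never linear memory or GC details:

# Hypothetical sketch: the bridge for a given signature is fixed code that can
# be generated (or baseline-compiled) before we know which modules will link.
def make_f64_to_string_bridge():
  def bridge(target_start, target_func, target_end, f64, source_lowerElem):
    target_done, target_liftElem = target_start()
    target_func(f64)
    while not target_done():
      source_lowerElem(target_liftElem())
    target_end()
  return bridge

# The same generated bridge can later be linked against any pair of modules
# using this signature, whatever their internal representation of "string".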
For comparison, in FOAF the VM must perform fusion at link time, as fusion can only be done when both sides are present. Note that FOAF's link-time overhead may not be linear. If we have a set of modules using L different languages, and T different Interface Type signatures for calls between them, then in FOAF we have T * L^2 total adapters to fuse. For example, for two languages Rust and Dart, and two signatures "send a string" and "send an int", we must fuse to get all the 8 combinations of "send a {string, int} from {Rust, Dart} to {Rust, Dart}".
(Note that with additional work FOAF could perhaps do something similar to IOI, that is, baseline compile each side's unfused adapter functions as well as a bridge, all at compile time, which would make FOAF behave very similarly to IOI. Operating on unfused code appears to be the only way to avoid non-linear work at link time.)
IOI requires less work in the VM than FOAF. The only strictly necessary new VM feature is link-time inlining; however, we may also want to add more to VMs eventually, such as the bridge, validation, etc. (Note that the bridge in bridge-driven IOI can be done in normal wasm code to begin with, with no performance downside or loss of functionality, while we consider how to spec it for the VM.) We may end up wanting to keep the role of the VM exactly the same as it is in FOAF - ensuring important properties - but we can do so with less work in the VM. That is because FOAF adds adapter modules, adapter functions, and fusion, which includes new entities, new instructions, and the fusion algorithm. In addition to the work to implement those things, in FOAF VMs will also need to decide how to handle interactions between adapter code and things like baseline compilation and the debugger, as mentioned earlier.
Doing less in VMs is generally preferable, all other things being equal, for security and other reasons, and is consistent with how wasm proposals have been done so far (i.e., when possible to implement something at the toolchain level we have usually opted to do so). Of course, we would need to actually see if "all other things being equal" holds here.
The simplicity of IOI can help Interface Types ship earlier and also be experimented with earlier. Once a wasm VM ships link-time inlining, IOI can be used on it with full speed. In FOAF, adapter fusion must be added to the VM for things to work at all. Polyfilling of FOAF is possible, but the result of the polyfilling is the fused code for two particular modules - those modules cannot be linked with other ones, which means we cannot gain the full benefits of an ecosystem of modules. With IOI once we have inlining we can link arbitrary modules (with the right Interface Types), and that may help avoid a situation where the component model is only truly experimented with on the server while the Web does not allow maximum reuse of components. IOI therefore enables an ecosystem of modules to appear earlier, which has the benefit of letting us learn from data and experience as we decide what to spec.
A final note: Simplicity does not only matter for VMs, but also for other tools in the ecosystem that parse and/or modify wasm, including static analyzers, instrumentation tools, optimizers, etc., such as wabt, wizer, binaryen, manticore, twiggy, wassail, wasabi, bloaty, etc., all of which would need to handle FOAF's new additions.
The declaration of Interface Types in the wasm binary can be the same in FOAF and IOI, in a new mandatory section. (Variations of IOI also allow polyfilling that for now in a custom section.)
Bridge-driven IOI can have stronger guarantees than caller-driven IOI. If the bridge is part of the VM, then the called module is only called by the VM, and it can trust the VM not to do anything confusing like interleave conversations. That is, there would be a stack of conversations, just like there is a call stack, and so the called module could store its conversation state in a simple stack as well.
Caller-driven IOI can achieve similar guarantees with additional work. We could still require calling the other module through the VM, even if the VM does not drive the conversation - this would be a bridge, but a bridge that does not drive things. The VM would see that the calls to the called module make sense, in particular, that the conversations form a stack, and so forth. That would require dynamic checks in the bridge code. It is natural to expect that those checks would vanish after inlining and optimizations, as things would look like this:
# Code inlined from begin_conversation
currConversation = nextConversation++
# Code inlined from where the other side receives the conversation
# token (which should be an opaque value as seen from there).
other.token = currConversation
# The actual conversation.
while ..
  # Code inlined from the lift and lower operations for a
  # single item in the list.
  # Before doing one lift/lower operation, check the token
  # we receive as belonging to this conversation.
  assert(other.token == currConversation)
  ...
Simple optimizations on that code will turn the assert into `assert(currConversation == currConversation)`, which can then be eliminated.
Similar assertions can be added to verify that the right order of calls occurs for the iteration functions, etc., thereby ensuring that the calls comply with the declared Interface Types.
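As an example of such an ordering check, here is a hedged sketch (hypothetical names, in the same style as the code above) in which the non-driving bridge tracks a tiny per-conversation state machine and asserts that lift operations only happen between the start and the end of a conversation:

# Hypothetical sketch: a small state machine the bridge could use to check
# that the calls follow the declared Interface Type.
STARTED, ITERATING, ENDED = 0, 1, 2
state = ENDED

def checked_start(start):
  global state
  assert state == ENDED                 # no overlapping conversation
  state = STARTED
  return start()

def checked_lift(liftElem):
  global state
  assert state in (STARTED, ITERATING)  # only lift inside a conversation
  state = ITERATING
  return liftElem()

def checked_end(end):
  global state
  assert state != ENDED                 # the conversation must have been started
  state = ENDED
  return end()

# As with the token check above, after inlining these asserts compare
# constants and can be optimized away.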
Optionally, more static type checking can be added by adding types to wasm and using them as the tokens passed between modules. For example, we could add a `stringref` type, which is an opaque reference (that each side would cast internally to their own type). Note that this is not related to adding an actual string type to wasm, which would include instructions to operate on it - here we just have a type for a reference to one, which is basically just a slightly more specific `externref`. The benefit of just having such a type is that normal wasm type checking would match that there is a `stringref` on both sides (another benefit could be when running in the debugger, as the type gives a clear hint as to how to represent it to the developer).
In a GC language there might not need to be any special syntax, and one could just call:
String str = foo(f);
The toolchain would then emit the glue code if `foo` is a function that is imported from another module, and expected to use interfacing (there might need to be some new syntax just for that). That is, the toolchain would emit the calls to start and end the conversation automatically, right around the call.
For C and related languages, there is no automatic process that can determine when the string should be freed (the next line? the end of the block? the end of the function?), and so some amount of special syntax may be needed. But it could be very minor. One possible sketch:
start_conversation();
char* str = foo(f);
// ..do stuff with the string..
end_conversation();
// using the string here would be UB!
Other options in this space could use RAII in C++, ownership in Rust, etc., to determine when the conversation ends using the lifetime of the returned values.
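As a loose analogy for the RAII/ownership options (not a proposal for any particular language), a Python context manager shows the shape of tying `end_conversation` to the end of a scope; `start_conversation` and `end_conversation` here are stand-ins for the glue the toolchain would emit:

from contextlib import contextmanager

# Stand-ins for the glue the toolchain would emit around the call.
def start_conversation(): print("start conversation")
def end_conversation():   print("end conversation")

@contextmanager
def conversation():
  start_conversation()
  try:
    yield
  finally:
    # Runs when the scope ends, playing the role of a C++ destructor or the
    # end of ownership in Rust.
    end_conversation()

# Usage: the returned string is only valid inside the scope.
with conversation():
  pass  # ..call foo(f) and use the string here..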
The specific example task shown in the runnable code in this gist is to interface using this function signature:
(f64)=>string
That is, a function is called that receives an f64 number and returns a string.
- The string must be available to the caller to use, but also freed at an appropriate time.
- The "glue" code in each module must not need to be rewritten when we change the other module. That is, each of the modules should be swappable with a different module, using a possibly very different meaning of "string". Or, in other words, a module in language A is written once, and can then call or be called from modules written in languages B or C with no changes.
Four wasm modules are defined:
| Name | Role | Language |
|---|---|---|
| calling-linear | Calls a module | Linear memory |
| calling-gc | Calls a module | GC |
| returning-linear | Is called, returns a value | Linear memory |
| returning-gc | Is called, returns a value | GC |
Each of the "calling" modules can call either of the "returning" modules, and vice versa. This shows that each module does not need to know about the internal representation of the data (linear memory or GC) in the module it is linked to at runtime.
Note: A bridge between the modules is not shown in this demo.
Build the example with
$ ./build.sh
and then run it with
$ d8 --experimental-wasm-gc test.js
The output is:
testing calling-linear.wasm => returning-linear.wasm
L little
testing calling-linear.wasm => returning-gc.wasm
L few
testing calling-gc.wasm => returning-linear.wasm
G little
testing calling-gc.wasm => returning-gc.wasm
G few
"L" is used to show that linear memory is the calling module, and "G" for GC. The linear returning module returns "little", and the GC one "few", so the output shows that we successfully call between the modules in all 4 modes. That is, each module is tested against two others, showing that it is 100% independent of the implementation in the module it is linked to.
You can edit the files to experiment with things. For example, the `f64` that is passed is currently `1.0`, and the called functions check if the input is less than `2.0` (and they return a different string based on that), so by increasing that number you will see different results, etc.
The most heavily annotated file is `calling-linear.wast`, so it might be a good starting point, followed by `returning-linear.wast`.
The JS to link modules together and call them is in `test.js`. You can read that to verify that each module is linked against two others, each with a very different internal implementation of "string", but we can still call across them.
Running `./test.sh` will build, run, and compare the output to the correct one.
As you read the code you may notice some details that are incidental to the example here, that is, that are not part of the IOI approach. IOI can be implemented in various ways, and choices had to be made to write a concrete example. This section clarifies some of those things.
- Wasm GC is used for the conversation tokens. That support is used in the glue code, but of course not in the actual linear memory code itself - that is, a linear memory string is still in linear memory, as it has to be. Alternatively, we could use `i32` as the token type everywhere, and then we'd need a table to match those IDs to the conversation info, etc. There are tradeoffs either way (do we want to depend on wasm GC, in return for the convenience and safety?), and IOI could support either. As it is shorter to write an example using wasm GC in the glue, this example does so.
- Note that using a GC object as the conversation token is efficient, as it avoids the need for a mapping. In a sense the token is really a closure state, which is opaque outside of the current module.
- The particular ABI of a string here is as a sequence of i8s terminated by a 0. Of course that is not what we want more generally for unicode - I am not a unicode expert, so I didn't even try to make something "realistic". Perhaps it is simpler to think of this example as implementing the type "array of i8". Note also that this example could have provided a length with the string/array as opposed to null-terminating it. To keep the example code brief, the shortest approach was written.
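For example, here is a hedged sketch (in the same pseudocode style as earlier) of how the `done` operation might differ between the null-terminated ABI used in this example and a length-carrying alternative:

# Null-terminated ABI, as in this example: iteration is done when we see a 0.
def done_null_terminated():
  return memory[currConversation.ptr + currConversation.index] == 0

# Length-carrying alternative: the length is sent up front, so done is a
# simple index comparison (and the receiver can preallocate).
def done_with_length():
  return currConversation.index == currConversation.length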