A gist by @kripken, last active July 9, 2021.

Interface Types Using Inlining Of Iterators (IOI)

This gist describes a possible variation to Interface Types. It starts with a description and overview, and also contains a full working example of the approach.

This idea is specific to how modules using Interface Types are connected. It proposes no changes to the Interface Types themselves, and does not require any changes in the trusted role of the VM in Interface Types. The only core change proposed here is an alternative to adapter functions. Specifically, we propose not adding adapter modules, adapter functions, or adapter fusion, and instead connecting modules in a different way. For terminology, call the existing approach Fusion of Adapter Functions (FOAF), and as the new approach focuses on inlining, we call it Inlining Of Iterators (IOI).

Adapter fusion is an operation that generates code at link time. The idea behind IOI is to consider what things might look like if we instead add a more general way to generate code at link time, and build on that. This appears to end up with several interesting tradeoffs both ways.

There is a wide range of ideas around inlining, from a performance hint to Turing-complete macros that receive input from outside of the module. IOI requires the ability to inline at link time, which could be accomplished in various ways; perhaps the simplest is a performance hint, parallel to Branch Hinting: an optional annotation on a function indicating that it is worth inlining at link time. Interpreters and baseline compilers might simply ignore the hint, while optimizing compilers would be expected to inline and optimize during linking.

In general, code generation in wasm looks like this:

  1. Wasm VMs typically compile at compile time (and then the link stage is very fast).
  2. Wasm VMs could also do some compilation at link time. At that time we know what we have linked to, which offers more optimization opportunities. This is when fusion happens in FOAF, and it is also the point at which the inlining that IOI depends on for speed would happen.
  3. Wasm VMs could in principle also do some compilation at run time, like the JVM and CLR do: seeing which functions are hot, and then doing further optimizations there such as inlining.

Both FOAF and IOI only move wasm from 1 to 2 in that space. Also, both FOAF and IOI clearly state what is to be inlined at link time. As a result, in either approach wasm VMs can still be a lot simpler than the runtime compilation and dynamic decisions that 3 entails. (Of course, there may be other good reasons for VMs to implement 3; in that case, IOI could benefit from that and might not even need a hint as to which functions to inline, although the hint might still improve predictability.)

IOI can be done in more than one way. The simplest to explain may be to start with the definition of adapter functions and work from there. For example, list.lift is defined as follows:

list.lift $List $done $liftElem $destructor? : [ T:<valtype>* ] -> [ $List ]
where:
  - $List refers to a (list E:<intertype>)
  - $done : [ T* ] -> [ done:i32, U* ]
  - $liftElem : [ U* ] -> [ E, T* ]
  - $destructor : [ T* ] -> [], if present

That is, lifting a list takes the list's underlying data (the T* values), an operation to figure out whether the iteration over that data is done, and an operation to get the next item. (There is also a destructor, which we implement in a different way; see later.) Instead of using the adapter fusion algorithm to combine the lift operations from one side and the lowering operations on the other, $done etc. could be actual function references, called from actual wasm code. Then that wasm code can simply be run, and in each iteration it would get the next item in the list and provide it to the other side. This is functionally identical to what happens in FOAF. The question, of course, is how fast this can be. That is a difficult question to answer without data, but if the $done etc. functions are all inlined, then it is plausible that this could be quite fast. In particular, it definitely avoids the extra copy problem that FOAF also avoids (but that serialization does not). But even with that, FOAF provides some guarantees on performance that IOI cannot match. We will see later, however, that there are many tradeoffs in this area, and speed benefits on both sides.

Leaving speed for later, let us write up some more concrete code for the first variation of IOI, called bridge-driven IOI because there is a bridge between the two modules which drives the conversation (the call, including the passing of data) between them. Here is python-like pseudocode for two modules, and a bridge between them, for a call that sends an f64 and returns a string:

# Target module

module target:
  def startConversation():
    # As we iterate over the output string, we will use this index.
    target.currConversation.index = 0

    # Return function references to our operations.
    return (target.done, target.liftElem)

  # The "application code" / "business logic" in this module. It
  # knows nothing about interfacing with other modules - it just
  # understands its own types.
  def func(f64):
    # A silly operation that returns some string for a given number
    return str(f64 + 3.14159)

  # Wrap around 'func' for interfacing purposes.
  def func_wrapper(f64):
    # Call the actual function, and prepare the result for being iterated on.
    target.currConversation.str = func(f64)
    target.currConversation.maxIndex = len(target.currConversation.str)

  def done():
    return target.currConversation.index == target.currConversation.maxIndex

  def liftElem():
    return target.currConversation.str[target.currConversation.index++]

  def endConversation():
    target.currConversation.str = None

# Bridge module for the Interface Type "f64 => string". Note how
# the f64 can just be passed as a core wasm type, while the string
# is iterated on.

module bridge:
  def bridge(target_startConversation, target_func, target_endConversation, f64, source_lowerElem):
    # Start the conversation on the target side, getting its operations.
    target_done, target_liftElem = target_startConversation()

    # Call the target function.
    target_func(f64)

    # Iterate on the output string. Note how this is functionally identical to
    # FOAF, and how it only understands about Interface Types and not the
    # internal details of either side.
    while not target_done():
      source_lowerElem(target_liftElem())

    # End the conversation on the target.
    target_endConversation()

# Source module

module source:
  def func():
    # Call the target function through the bridge. We provide the f64
    # parameter and the operation for us to receive the output string's
    # elements, which is basically `list.lower`'s `$lowerElem`.
    source.startConversation()
    bridge.bridge(target.startConversation, target.func_wrapper, target.endConversation,
                  2.1828,
                  source.lowerElem)
    print(source.currConversation.str)
    source.endConversation()

  def startConversation():
    source.currConversation.str = ''

  def lowerElem(elem):
    source.currConversation.str.append(elem)

  def endConversation():
    # Free the string.
    source.currConversation.str = None

To understand how this works, it may be helpful to view these slides that show the control flow around the bridge in graph form. Another thing that can help is to see what the code looks like after inlining and constant propagation:

def func():
  source.currConversation.str = ''
  target.currConversation.index = 0
  target.currConversation.str = str(2.1828 + 3.14159)
  target.currConversation.maxIndex = len(target.currConversation.str)
  while target.currConversation.index != target.currConversation.maxIndex:
    source.currConversation.str.append(target.currConversation.str[target.currConversation.index++])
  target.currConversation.str = None
  print(source.currConversation.str)
  source.currConversation.str = None

The string is transferred by a simple loop that iterates on the data, avoiding an extra copy in the middle. In fact, if the two languages here use, say, linear memory and GC, then (in addition to the necessary incrementing) the loop could contain basically a single i32.load followed by a single array.set, the actual low-level operations for each language. (In the python-esque pseudocode, the append may look like a heavy operation, but it would just be a write and an increment; and the ABI should probably send the string size in advance, as mentioned in the Interface Types Explainer.)

It may not be necessary, but if the VM also performs escape analysis (if the code uses wasm GC) or LICM (if the code uses linear memory) then the loop can look something like this:

  while target_index != target_maxIndex:
    source_str.append(target_str[target_index++])

That should in many cases run at optimal speed.

Note how we begin with a bridge that only understands the Interface Types involved, and two modules that each only understand their own internals, but after inlining and reasonable optimizations (as mentioned: constant propagation, and perhaps escape analysis or LICM) we end up with fast code. While we would not have a guarantee of such a result, if it is achieved in the common case then it may be worth it for the benefits that IOI brings, which we will see later.

(Note that the idea of depending on inlining to make iterators and getters run at full speed is of course not a new one, as it is - heavily! - relied upon in languages like C++ and Rust, for example.)

Before getting into more details, it is worth mentioning some possible extensions and variations to IOI:

  • The example above was called "bridge-driven" because the caller calls into the "bridge", which then guides everything (in particular, the loop to iterate on the string is in there, as well as the rest of the conversation with the other module). A benefit to that model is that, if we implement the bridge in the VM, then it can be trusted by the calling modules to behave as expected. Another variation on IOI that is worth considering is caller-driven, in which bridge.bridge would be implemented inside the calling module. The interesting tradeoff there is that the toolchain can potentially perform more optimization, but it also means that the called module cannot depend on the VM as we did before, at least not without some additional work; more on both later.
  • IOI can be generalized to a capability-based security approach, in which a call between two modules passes tokens that grant permission to access data. In the above example pseudocode we depended on the fact that there was one conversation active at a time; more generally, there would need to be a stack of conversation states. But we can generalize that further to allow for long-running conversations, or ones that overlap in various ways (that may also be possible in FOAF, if an adapter function contains a stack switching instruction to suspend), or one module receiving a capability token that it passes through to a third module, and so on. That is, to return a string, a module would provide a token that is used in calls to iterate on the string's contents. This basically extends capability-based security, already a foundation of WASI, from the system interface level to calls between individual modules.

The rest of this gist contains more details about performance, simplicity, and verification, and then a full working example of the approach (specifically, in the caller-driven variation, which is a little shorter to write up), runnable in a VM today (though without link-time inlining it will of course be slow).

Performance

Call "glue code" the code whose only purpose is to convert types between the modules. Usually this is code emitted by the toolchain, and distinct from the "application code" that the user writes. The application code doesn't know anything about interfacing with other modules, which is the glue code's purpose. The next subsections consider inlining of different types of code in different locations.

Inlining of application code between modules

Consider the case of one module calling another, where the called function is a tiny getter, that is, one where the call overhead is far larger than the work done in the function. Something like this:

module foo:
  def simple_operation(x):
    return x + 1

module bar:
  def do_many_calls():
    sum = 0
    for i in range(1000000):
      sum += foo.simple_operation(i)
    return sum

IOI can inline application code across module boundaries. (In fact, in the example earlier we inlined target.func.) While in general it may not be easy to know which imports are worth inlining, at least in some cases that may be possible, such as simple_operation in this section. As a result, IOI can be fast on that example.
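For instance, after link-time inlining the cross-module call disappears entirely; a sketch of the optimized result, in the same pseudocode:

module bar:
  def do_many_calls():
    sum = 0
    for i in range(1000000):
      sum += i + 1  # foo.simple_operation inlined at link time
    return sum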

In the proposed component model in wasm we have shared-nothing linking, and calls between components therefore have two forms of overhead: copying overhead, where we need to copy data between them (and therefore it is important to focus on removing any extra copies, at least), and call overhead (the actual call itself, plus missed opportunities in register allocation, constant propagation, etc.). The two forms mostly matter in different use cases: if the data is large, the call overhead may not matter, while at other times the call overhead may dominate.

If we want to make the component model as useful as possible then we should aim to reduce both forms of overhead as much as we can. Comparing to other ecosystems, copy overhead is not a problem in the JS, C++, Rust, Java, C#, etc. ecosystems, as memory is shared between libraries. Likewise, call overhead is mitigated in all of those, either because of dynamic inlining, or inlining from headers, or LTO, etc. The wasm component model does not have those benefits, and so both call and copy overhead may become an issue.

Using link-time inlining in IOI we can reduce both forms of overhead. Or, rather, IOI sees the problem of copying overhead as one that can be reduced to calls of getters/iterators across the module boundary, which is the problem of call overhead, and therefore if we solve one we solve both. (Of course, the approach of using link-time inlining is not guaranteed to be perfect, and we need to validate it on real-world data.) On the other hand, if all we have is FOAF, and it does not inline such application code, then it will only handle copying overhead but not call overhead, and will therefore be slower than IOI on some use cases. (Again, this would need to be investigated in real-world data.)

Let us consider ways in which FOAF could be competitive with IOI on the call overhead. We could imagine that we spec and ship both FOAF and a general feature to inline at link time, and use FOAF for glue code and the general feature for application code, but then we have specced and shipped two features to emit specialized code at link time.

Alternatively, as adapter functions can contain arbitrary wasm, they could contain application code as well as glue code. If we placed both foo.simple_operation and bar.do_many_calls in adapter functions (note that we need both sides) then they would get fused and the call overhead between them removed. We would need to handle things like getting that code access to the relevant global state they need (as adapter functions appear in an adapter module, which might be separate from the main code, that might not be trivial), but this seems possible. It may, however, run a little counter to the spirit of the Interface Types Explainer that says that "the role of adapter code is to be a thin layer between core modules", but it may be worth it for the speed.

If adapters only contain glue code, as perhaps implied in that last quote, then it would be an option to ignore them in debuggers, as only toolchain developers would really want to see them. (While we would complain, there are not that many of us...) But if we put application code in adapter functions, as suggested in the last paragraph, then issues arise with the layering of adapter functions on top of core wasm. While the core VM can only care about core wasm, which mitigates much of the security and other risks of adding adapter functions, that will require interaction with places that do care about the original form of the wasm binary, such as debugging. The issues may include:

  • Noting the original binary address of wasm instructions, and piping that through fusion, to be used by the debugger later. This would be necessary for DWARF and source maps.
  • Debuggers would need to support the viewing of adapter function code, and to step through it and so forth. This would include support for the new instructions only possible in adapter functions. This would be necessary when debugging without DWARF or source maps.
  • A debugger that looks at the call stack might need additional work, as the inlining done during fusion would change things.
  • DWARF would need to be extended to support adapter modules and adapter functions. (E.g., DWARF offsets are into the code section, so this may be trivial depending on the binary format of adapter functions.)
  • Any tool that scans the wasm binary for some reason might need additional work. This may not apply to current wasm debuggers, but such scanning has been done in things like Emscripten's sanitizer integration in the past (though not at present).

In principle such issues can also happen in IOI. However, in IOI we have the ability to do nothing: to not inline, when running in the debugger, and then all we have is regular wasm which already runs in debuggers. This may be the natural thing to do, for the same reason that some debuggers run wasm in the lowest tier, which avoids doing the work to make the optimizing tier debugging-friendly. That is, a VM might only run the baseline compiler when debugging, and the baseline compiler might ignore the inlining hint.

Inlining of glue and application code within a module

In the caller-driven variation of IOI, the bridge code is in the calling module, which means the toolchain has the opportunity to optimize it together with the application code (as both are simply regular code in the calling module). For example,

module caller:
  # Glue plus the bridge for a call that sends a list of integers to
  # the called function.
  def glue_plus_bridge(func, list):
    target_lowerElem = start_conversation()
    for i in range(len(list)):
      target_lowerElem(list[i])
    call(func)
    end_conversation()

  def application_code():
    glue_plus_bridge(other_module.foo, [1,2,3,4])

First, note how the loop that iterates over the list already contains list[i], that is, is specialized for the language in the caller module, which is possible as all this is emitted by the same toolchain for the same language. Second, note how we call the other module with an array of fixed size (4). The toolchain may inline and then unroll the loop, arriving at this:

module caller:
  def application_code():
    target_lowerElem = start_conversation()
    target_lowerElem(1)
    target_lowerElem(2)
    target_lowerElem(3)
    target_lowerElem(4)
    call(other_module.foo)
    end_conversation()

It may be helpful to be more concrete. Here is some C++:

#include <vector>

extern "C" void lift(int x);

void lift_vec(const std::vector<int>& v) {
  for (auto x : v) {
    lift(x);
  }
}

int main() {
  lift_vec({424242, 1337, 21828});
}

That will lift a std::vector to another module. An easy way to get fairly readable code after running the LLVM optimizer is to let emcc compile it first to wasm and then to JS, using this:

emcc -O3 --profiling -s WASM=0 -s ERROR_ON_UNDEFINED_SYMBOLS=0

The result is this:

function main($0, $1) {
 $0 = $0 | 0;
 $1 = $1 | 0;
 lift(424242);
 lift(1337);
 lift(21828);
 return 0;
}

A perfect result!

To get there, LLVM did some nontrivial things, as the unoptimized code begins with stores to memory of those three interesting integers, and even if the loop is inlined into main, there are still dozens of lines in between it and those stores, lines which contain other reads and writes to memory (like setting the array length, etc.). LLVM is capable of figuring this out, but a wasm VM likely would not, both because LLVM simply does a lot more work, but also because LLVM IR has a lot more information that helps with things like alias analysis.

(There are many more possibilities here, some of which would require ABI changes. For example, a function that receives a list of integers and only looks at the last one could actually only read that single value, instead of iterating to the end, if the ABI were random-access instead of iteration.)
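As a small sketch of that last point (hypothetical helper names, in pseudocode): with an iteration ABI the callee must pull every element to reach the last one, while a random-access ABI lets it read just the element it needs:

# Iteration-based ABI: pop every element to reach the last one.
def last_element_iterating(pop_elem, length):
    elem = None
    for _ in range(length):
        elem = pop_elem()
    return elem

# Random-access ABI: read the one element directly.
def last_element_random_access(get_elem, length):
    return get_elem(length - 1)

elems = [1, 2, 3, 4]
it = iter(elems)
print(last_element_iterating(lambda: next(it), len(elems)))        # 4
print(last_element_random_access(lambda i: elems[i], len(elems)))  # 4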

While these are not trivial optimizations, this would be done at the toolchain level, where LLVM and others can in fact do such things. In addition to this, after the VM inlines the other side, the early toolchain optimizations may enable further benefits, if the toolchain made the resulting wasm easier for the VM's simpler optimizer to handle.

In comparison, in FOAF the adapters are special code with a special form and role (so that the VM can fuse them together), which makes toolchain inlining of glue code with application code in the same module more difficult. While adapters are not quite a different language, they are distinct from normal wasm due to the restrictions on them, like having affine types, loop-parameter restrictions, etc. Perhaps most importantly, adapter functions need to be emitted in a different way (as adapter functions in an adapter module), and so it is not clear how we could just run a general-purpose optimizer on both adapter functions and other code together.

A general principle in optimization is that doing everything in a single uniform language is simpler, and that it is best to avoid special modes (like strict mode in JS). Combining multiple languages can be very productive, but it does inhibit optimizations across the language boundary (e.g., asm.js and JS, wasm and JS, JS and the browser's native code, C++ and a scripting language in a game engine, etc.). The metaphor is not precise here, but the principle of having a single format in which to optimize does play in IOI's favor.

Predictability and generality

As mentioned earlier, FOAF gives a guarantee of good performance on fused adapter functions, as it inlines in a predictable way and avoids adding additional overhead there. This is an advantage compared to IOI, in which we depend more on compiler optimizations. While IOI will not do any extra copying of data, it may have other additional overhead, and in the worst case a significant amount.

(How hard IOI is to optimize depends on the details. For example, the pseudocode in the introduction used function references extensively, which is necessary in bridge-driven IOI given that the bridge must be independent, but in caller-driven IOI we can inline the caller's own operations at the toolchain level, and also call the other module's imports directly instead of using references, making the code a lot simpler. But bridge-driven IOI may be equally fast, as turning a call_ref of a constant function into a direct call is a simple optimization.)

The flip side of FOAF's advantage in predictability is that it is focused on a very specific task: copy overhead between modules. It does not help with call overhead, as we saw earlier, and it does not help with anything else. In comparison, IOI does help with call overhead, and in addition, the work to make IOI fast will make other things faster as well: when a VM invests in improving things like constant propagation and escape analysis, that work is useful on a broad range of code.

It is impossible to evaluate this tradeoff without real-world data, as it depends on what the future ecosystem of modules will look like, and in particular, how large the different types of overhead will be.

Startup time / baseline compilation

The above discussion focuses on peak performance from code generated in the optimizing tier of the VM. Baseline compilation (or, similarly, an interpreter) has other tradeoffs, which impact startup time.

We have already mentioned that in IOI a baseline compiler can simply ignore the inlining hint. That would allow optimal startup speed (no new work at the link stage) in a very simple way (no changes would be needed in the baseline compiler). The downside is that cases that heavily benefit from inlining may be slower - potentially significantly slower than the usual 2x slowdown common in baseline compilers today. That is, we trade optimal startup for slower execution while we wait for the optimizing tier.

(Note that while the baseline compiler does not change, the bridge is still needed in the case of bridge-driven IOI. If the bridge is a regular wasm module, then it can be baseline-compiled at compile time normally, and so there is no work at link time. If, on the other hand, the VM provides the bridge, then it needs to do the work to generate it for the necessary Interface Types. But in this case as well, the bridge can be generated at compile time, as the bridge only depends on the used Interface Types signatures, and not the internal details of the modules on both sides.)

For comparison, in FOAF the VM must perform fusion at link time, as fusion can only be done when both sides are present. Note that FOAF's link-time overhead may not be linear. If we have a set of modules using L different languages, and T different Interface Type signatures for calls between them, then in FOAF we have T L^2 total adapters to fuse. For example, for two languages Rust and Dart, and two signatures "send a string" and "send an int", we must fuse to get all the 8 combinations of "send a {string, int} from {Rust, Dart} to {Rust, Dart}".
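To make that count concrete, here is a tiny illustration of the quadratic factor (plain Python, not part of the proposal):

from itertools import product

languages = ["Rust", "Dart"]                   # L = 2
signatures = ["send a string", "send an int"]  # T = 2

# One fused adapter per (signature, source language, target language).
adapters = list(product(signatures, languages, languages))
print(len(adapters))  # 8 == T * L^2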

(Note that with additional work FOAF could perhaps do something similar to IOI, that is, baseline compile each side's unfused adapter functions as well as a bridge, all at compile time, which would make FOAF behave very similarly to IOI. Operating on unfused code appears to be the only way to avoid non-linear work at link time.)

Simplicity

IOI requires less work in the VM than FOAF. The only strictly necessary new VM feature is link-time inlining; however, we may also want to add more to VMs eventually, such as the bridge, validation, etc. (Note that the bridge in bridge-driven IOI can be done in normal wasm code to begin with, with no performance downside or loss of functionality, while we consider how to spec it for the VM.) We may end up wanting to keep the role of the VM exactly the same as it is in FOAF - ensuring important properties - but we can do so with less work in the VM. That is because FOAF adds adapter modules, adapter functions, and fusion, which includes new entities, new instructions, and the fusion algorithm. In addition to the work to implement those things, in FOAF VMs will also need to decide how to handle interactions between adapter code and things like baseline compilation and the debugger, as mentioned earlier.

Doing less in VMs is generally preferable, all other things being equal, for security and other reasons, and is consistent with how wasm proposals have been done so far (i.e., when possible to implement something at the toolchain level we have usually opted to do so). Of course, we would need to actually see if "all other things being equal" holds here.

The simplicity of IOI can help Interface Types ship earlier and also be experimented with earlier. Once a wasm VM ships link-time inlining, IOI can be used on it at full speed. In FOAF, adapter fusion must be added to the VM for things to work at all. Polyfilling of FOAF is possible, but the result of the polyfilling is the fused code for two particular modules - those modules cannot be linked with other ones, which means we cannot gain the full benefits of an ecosystem of modules. With IOI, once we have inlining, we can link arbitrary modules (with the right Interface Types), and that may help avoid a situation where the component model is only truly experimented with on the server while the Web does not allow maximum reuse of components. IOI therefore enables an ecosystem of modules to appear earlier, which has the benefit of letting us learn from data and experience as we decide what to spec.

A final note: Simplicity does not only matter for VMs, but also for other tools in the ecosystem that parse and/or modify wasm, including static analyzers, instrumentation tools, and optimizers (wabt, wizer, binaryen, manticore, twiggy, wassail, wasabi, bloaty, etc.), all of which would need to handle FOAF's new additions.

Verification

The declaration of Interface Types in the wasm binary can be the same in FOAF and IOI, in a new mandatory section. (Variations of IOI also allow polyfilling that, for now, in a custom section.)

Bridge-driven IOI can have stronger guarantees than caller-driven IOI. If the bridge is part of the VM, then the called module is only ever called by the VM, and it can trust the VM not to do anything confusing like interleaving conversations. That is, there would be a stack of conversations, just like there is a call stack, and so the called module could store its conversation state in a simple stack as well.

Caller-driven IOI can achieve similar guarantees with additional work. We could still require calling the other module through the VM, even if the VM does not drive the conversation - this would be a bridge, but a bridge that does not drive things. The VM would check that the calls to the called module make sense, in particular that the conversations form a stack, and so forth. That would require dynamic checks in the bridge code. It is natural to expect that those checks would vanish after inlining and optimizations, as things would look like this:

# Code inlined from begin_conversation
currConversation = nextConversation++

# Code inlined from where the other side receives the conversation
# token (which should be an opaque value as seen from there).
other.token = currConversation

# The actual conversation.
while ..
  # Code inlined from the lift and lower operations for a
  # single item in the list.
  # Before doing one lift/lower operation, check the token
  # we receive as belonging to this conversation.
  assert(other.token == currConversation)
  ...

Simple optimizations on that code will turn the assert into assert(currConversation == currConversation) which can then be eliminated.
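That is, after inlining the check reduces to a tautology (continuing the pseudocode above):

while ..
  assert(currConversation == currConversation)  # trivially true, removable
  ...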

Similar assertions can be added to verify that the right order of calls occurs for the iteration functions, etc., thereby ensuring that the calls comply with the declared Interface Types.

Optionally, more static type checking can be added by adding types to wasm and using them as the tokens passed between modules. For example, we could add a stringref type, which is an opaque reference (that each side would cast internally to their own type). Note that this is not related to adding an actual string type to wasm, which would include instructions to operate on it - here we just have a type for a reference to one, which is basically just a slightly more specific externref. The benefit of just having such a type is that normal wasm type checking would match that there is a stringref on both sides (another benefit could be when running in the debugger, as the type gives a clear hint as to how to represent it to the developer).

Sketch of what the human-readable code might look like

In a GC language there might not need to be any special syntax, and one could just call:

String str = foo(f);

The toolchain would then emit the glue code if foo is a function that is imported from another module, and expected to use interfacing (there might need to be some new syntax just for that). That is, the toolchain would emit the calls to start and end the conversation automatically, right around the call.
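For example, the emitted code might look roughly like this (a sketch; the conversation calls stand in for whatever glue operations the toolchain uses):

startConversation()
str = foo(f)  # the bridged call, with lifting/lowering inside the glue
endConversation()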

For C and related languages, there is no automatic process that can determine when the string should be freed (the next line? the end of the block? the end of the function?), and so some amount of special syntax may be needed. But it could be very minor. One possible sketch:

start_conversation();

char* str = foo(f);

// ..do stuff with the string..

end_conversation();

// using the string here would be UB!

Other options in this space could use RAII in C++, ownership in Rust, etc., to determine when the conversation ends using the lifetime of the returned values.
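As a sketch of that scope-based option, in runnable Python (a context manager plays the role RAII would play in C++; all names here are hypothetical):

from contextlib import contextmanager

# Hypothetical glue operations; in real glue these would be the emitted
# start/end conversation calls around the bridged call.
def start_conversation(): pass
def end_conversation(): pass
def foo(f): return "a string from the other module"

@contextmanager
def conversation():
    start_conversation()
    try:
        yield
    finally:
        end_conversation()

with conversation():
    s = foo(2.1828)
    print(s)  # the string is valid only inside this block
# using the string after the block would be a bug - the conversation has ended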

The Demo Code

The specific example task shown in the runnable code in this gist is to interface using this function signature:

    (f64)=>string

That is, a function is called that receives an f64 number and returns a string.

  • The string must be available to the caller to use, but also freed at an appropriate time.
  • The "glue" code in each module must not need to be rewritten when we change the other module. That is, each of the modules should be swappable with a different module, using a possibly very different meaning of "string". Or, in other words, a module in language A is written once, and can then call or be called from modules written in languages B or C with no changes.

Demo Implementation Overview

Four wasm modules are defined:

Name             | Role                       | Language
calling-linear   | Calls a module             | Linear memory
calling-gc       | Calls a module             | GC
returning-linear | Is called, returns a value | Linear memory
returning-gc     | Is called, returns a value | GC

Each of the "calling" modules can call either of the "returning" modules, and vice versa. This shows that each module does not need to know about the internal representation of the data (linear memory or GC) in the module it is linked to at runtime.

Note: A separate bridge module between the two sides is not shown in this demo; it uses the caller-driven variation, where the bridging logic lives in the calling module.

Running the Demo

$ ./build.sh

and then

$ d8 --experimental-wasm-gc test.js
testing calling-linear.wasm => returning-linear.wasm
L little
testing calling-linear.wasm => returning-gc.wasm
L few
testing calling-gc.wasm => returning-linear.wasm
G little
testing calling-gc.wasm => returning-gc.wasm
G few

"L" is used to show that linear memory is the calling module, and "G" for GC. The linear returning module returns "little", and the GC one "few", so the output shows that we successfully call between the modules in all 4 modes. That is, each module is tested against two others, showing that it is 100% independent of the implementation in the module it is linked to.

You can edit the files to experiment with things. For example, the f64 that is passed is currently 1.0, and the called functions check whether the input is at most 2.0 (and they return a different string based on that), so by increasing that number you will see different results, etc.

Reading the Code

The most heavily annotated file is calling-linear.wast, so it might be a good starting point, followed by returning-linear.wast.

The JS to link modules together and call them is in test.js. You can read that to verify that each module is linked against two others, each with a very different internal implementation of "string", but we can still call across them.

Testing

Running ./test.sh will build, run, and compare the output to the correct one.

Incidental details in the example

As you read the code you may notice some details that are incidental to the example here, that is, that are not part of the IOI approach. IOI can be implemented in various ways, and choices had to be made to write a concrete example. This section clarifies some of those things.

  • Wasm GC is used for the conversation tokens. That support is used in the glue code, but of course not in the actual linear memory code itself - that is, a linear memory string is still in linear memory, as it has to be. Alternatively, we could use i32 as the token type everywhere, and then we'd need a table to match those IDs to the conversation info, etc. There are tradeoffs either way (do we want to depend on wasm GC, in return for the convenience and safety?), and IOI could support either. As it is shorter to write an example using wasm GC in the glue, this example does so.
    • Note that using a GC object as the conversation token is efficient, as it avoids the need for a mapping. In a sense the token is really a closure state, which is opaque outside of the current module.
  • The particular ABI of a string here is a sequence of i8s terminated by a 0. Of course that is not what we want more generally for Unicode - I am not a Unicode expert, so I didn't even try to make something "realistic". Perhaps it is simpler to think of this example as implementing the type "array of i8". Note also that this example could have provided a length with the string/array as opposed to null-terminating it. To keep the example code brief, the shortest approach was used.
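The demo's files follow, starting with build.sh: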
#!/bin/bash
set -e
echo "calling-linear"
~/Dev/binaryen/bin/wasm-opt calling-linear.wast -all -o calling-linear.wasm
echo "returning-linear"
~/Dev/binaryen/bin/wasm-opt returning-linear.wast -all -o returning-linear.wasm
echo "calling-gc"
~/Dev/binaryen/bin/wasm-opt calling-gc.wast -all -o calling-gc.wasm
echo "returning-gc"
~/Dev/binaryen/bin/wasm-opt returning-gc.wast -all -o returning-gc.wasm
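calling-gc.wast: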
;; (Comments that would be the same in this code and in calling-linear are
;; omitted - see them there.)
(module
(type $func (func (param (ref null data))))
;; A string of characters. For simplicity in this example, null-termination is
;; used in the array (and characters after that are ignored).
(type $string (array (mut i8)))
(import "js" "print-char" (func $print-char (param i32)))
(import "returning" "start-conversation"
(func $returning-start-conversation (result (ref null data))))
(import "returning" "push-f64"
(func $returning-push-f64 (param (ref null data)) (param f64)))
(import "returning" "pop-char"
(func $returning-pop-char (param (ref null data)) (result i32)))
(import "returning" "end-conversation"
(func $returning-end-conversation (param (ref null data))))
(import "returning" "func" (func $returning-func (param (ref null data))))
(func "main"
;; Note that we do not need a token at all for our half of the
;; conversation. We just need one for the other side. That is possible
;; because, unlike calling-linear, we don't have any special cleanup
;; to do at the end (thanks to GC).
(local $returning-token (ref null data))
;; Comparing to calling-linear, note how here we use a GC string while
;; there we used one in linear memory.
(local $string (ref null $string))
;; Print "G " to identify ourselves as using GC.
(call $print-char (i32.const 71))
(call $print-char (i32.const 32))
(local.set $returning-token
(call $returning-start-conversation)
)
(local.set $string
(call $bridge-conversation
(ref.func $returning-func)
(f64.const 1.0)
(local.get $returning-token)
)
)
(call $print-string
(local.get $string)
)
(call $returning-end-conversation
(local.get $returning-token)
)
)
(func $bridge-conversation
(param $func (ref $func))
(param $param f64)
(param $returning-token (ref null data))
(result (ref null $string))
(local $char i32)
;; The string we build up from received characters.
(local $string (ref null $string))
;; The next position to write a character to.
(local $string-index i32)
(call $returning-push-f64
(local.get $returning-token)
(local.get $param))
(call_ref
(local.get $returning-token)
(local.get $func)
)
;; Allocate a string to write to.
(local.set $string
(array.new_default_with_rtt $string
;; To avoid handling string appending, just allocate a huge string for
;; example purposes.
(i32.const 1000)
(rtt.canon $string)
)
)
(loop $loop
(local.set $char
(call $returning-pop-char
(local.get $returning-token)
)
)
(array.set $string
(local.get $string)
(local.get $string-index)
(local.get $char)
)
(if
(local.get $char)
(block
(local.set $string-index
(i32.add
(local.get $string-index)
(i32.const 1)
)
)
(br $loop)
)
)
)
(local.get $string)
)
(func $print-string (param $string (ref null $string))
(local $char i32)
(local $index i32)
(loop $loop
(local.set $char
(array.get $string
(local.get $string)
(local.get $index)
)
)
(call $print-char (local.get $char))
(if
(local.get $char)
(block
(local.set $index
(i32.add
(local.get $index)
(i32.const 1)))
(br $loop)
)
)
)
)
)
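calling-linear.wast: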
(module
;; The type of the function we will call. This takes a conversation token,
;; which is a (ref null data). Note that it does not include the actual
;; parameters or return values, because their types may not agree on both
;; sides. That is, we can't write "i32" for the type of a string here -
;; that would work for C but not a GC language. (Note, however, that if
;; we added relevant types to wasm then we could use them here.)
;;
;; Note that we could use (ref data) here, that is, we do not need to
;; actually allow null references. However, in the current state of wasm
;; GC there are no non-nullable locals, so we'd need to clutter the code
;; with ref.as_non_null instructions.
(type $func (func (param (ref null data))))
;; The information necessary during a conversation. Specifically, we need
;; to store the string that we allocate and write to on our side, so that
;; we can free it later.
(type $conversation (struct
;; The allocated string that contains the characters returned to us from
;; the call to the other module.
(field $string (mut i32))
))
(memory 1 1)
;; A JS import so that we can do some printing to show that things work.
(import "js" "print-char" (func $print-char (param i32)))
;; Start a conversation with the "returning" module. Returns a token that
;; identifies the (half of the) conversation on that side.
(import "returning" "start-conversation"
(func $returning-start-conversation (result (ref null data))))
;; Push an f64 parameter in a conversation. The function takes a conversation
;; token (as more than one conversation may be occurring), and then the parameter.
;; In this example, f64 is assumed to be a type that is universally agreed
;; upon, and so we just use that type both here and on the other side.
(import "returning" "push-f64"
(func $returning-push-f64 (param (ref null data)) (param f64)))
;; Get the next character from the output string. 0 indicates the end. (Another
;; approach could send the string length.)
(import "returning" "pop-char"
(func $returning-pop-char (param (ref null data)) (result i32)))
;; End the conversation on the other module. This does any necessary cleanup
;; there (which only that module understands and knows about).
(import "returning" "end-conversation"
(func $returning-end-conversation (param (ref null data))))
;; The function in the returning module that we will call. This has the
;; type $func which was declared earlier, see notes there about its signature.
(import "returning" "func" (func $returning-func (param (ref null data))))
;; Internal state for malloc. (Ignore this.)
(global $malloc-state (mut i32) (i32.const 0))
;; A function that will do a call to another module, using the bridging
;; approach.
(func "main"
;; We will store the conversation tokens in locals, and the string
;; result as well.
;; Note that we use the specific type of our conversation token,
;; $conversation, because we understand it, while we use the
;; generic (ref null data) for the other side's conversation token,
;; since we know nothing about their internal implementation details.
(local $token (ref null $conversation))
(local $returning-token (ref null data))
(local $string i32)
;; Print "L " to identify ourselves as using linear memory.
(call $print-char (i32.const 76))
(call $print-char (i32.const 32))
;; Start the two sides of the conversation, getting tokens for each.
(local.set $token
(call $start-conversation)
)
(local.set $returning-token
(call $returning-start-conversation)
)
;; Do the actual call. Instead of calling $returning-func directly,
;; we call the "glue" method $bridge-conversation. We call that method
;; in an almost normal way, passing it an f64 and receiving an i32 that
;; represents a string, that is, we use the types that make sense in
;; the language this module was compiled from (linear memory); we leave
;; all the interfacing to $bridge-conversation. The only difference in
;; our call to that method compared to a normal direct call is that we
;; also send it the conversation tokens, and a reference to the function
;; we wish to call.
(local.set $string
(call $bridge-conversation
(ref.func $returning-func)
(f64.const 1.0)
(local.get $token)
(local.get $returning-token)
)
)
;; Use the string result. It's just a normal linear memory string.
(call $print-string
(local.get $string)
)
;; End the conversation after we are done with it.
(call $returning-end-conversation
(local.get $returning-token)
)
(call $end-conversation
(local.get $token)
)
)
(func $start-conversation (result (ref null $conversation))
;; Allocate a new conversation info structure.
(struct.new_default_with_rtt $conversation
(rtt.canon $conversation)
)
)
(func $end-conversation (param $token (ref null $conversation))
;; Free the allocated string. Note that this works even if we did not
;; end up allocating a string, as free(0) is a no-op (both in C and in the
;; simple malloc/free here), so this actually already supports conditional
;; allocation in that sense. More generally, we could store a little state
;; on the $conversation data structure "things to free" that would be
;; added to only if we allocated, etc. (However, in this simple example,
;; we always allocate.)
(call $free
(struct.get $conversation $string
(local.get $token)
)
)
)
;; The bridging logic.
;;
;; Note that this method could also be in a third module "in between"
;; this module and the returning module. There are tradeoffs both
;; ways.
;;
;; Note also that this method could be factored into smaller pieces, like
;; a separate function to convert a string, etc.
(func $bridge-conversation
;; The function to be called. We could instead bake in the call target in
;; this function, but using a reference shows how a single bridging method
;; could be used for multiple target functions with the same signature.
(param $func (ref $func))
;; The f64 parameter to the function being called.
(param $param f64)
;; The conversation token in this module.
(param $token (ref null $conversation))
;; The conversation token in the returning module.
(param $returning-token (ref null data))
;; The resulting string (which is usable while the conversation is ongoing).
(result i32)
(local $char i32)
(local $string-ptr i32)
(local $string-pos i32)
;; Push the parameter.
(call $returning-push-f64
(local.get $returning-token)
(local.get $param)
)
;; Call the target function. By giving it the proper conversation token,
;; it can read the parameter we sent, and set the result where we can
;; get to it later.
(call_ref
(local.get $returning-token)
(local.get $func)
)
;; Allocate room for our string. For simplicity we allocate a big-enough
;; region here in a simple way (actual production code would be more
;; clever obviously).
(struct.set $conversation $string
(local.get $token)
;; string-ptr is the pointer to the string. string-pos is the
;; current position in the string as we write to it, which will
;; be incremented as we go in the loop below.
(local.tee $string-ptr
(local.tee $string-pos
(call $malloc
(i32.const 1000)
)
)
)
)
;; Pop resulting characters and append them to our string.
(loop $loop
;; Get a character.
(local.set $char
;; See notes in README.markdown about performance.
(call $returning-pop-char
(local.get $returning-token)
)
)
;; Write it
(i32.store8
(local.get $string-pos)
(local.get $char)
)
;; If it is not the null-terminator, then increment and loop.
(if
(local.get $char)
(block
(local.set $string-pos
(i32.add
(local.get $string-pos)
(i32.const 1)
)
)
(br $loop)
)
)
)
;; Return the linear-memory string.
(local.get $string-ptr)
)
;; Print a linear memory string, character by character. (You can ignore
;; this.)
(func $print-string (param $x i32)
(local $char i32)
(loop $loop
;; Load the next character.
(local.set $char
(i32.load8_u
(local.get $x)
)
)
;; Print it.
(call $print-char (local.get $char))
;; If it is not the null-terminator, then increment and loop.
(if
(local.get $char)
(block
(local.set $x
(i32.add
(local.get $x)
(i32.const 1)
)
)
(br $loop)
)
)
)
)
(func $malloc (param $size i32) (result i32)
;; Do a trivial bump allocation in this example code. In production we would
;; have a real malloc here. (You can ignore this, and $free.)
(global.set $malloc-state
(i32.add
(global.get $malloc-state)
(local.get $size)
)
)
(i32.sub
(global.get $malloc-state)
(local.get $size)
)
)
(func $free (param $ptr i32)
;; Do nothing. In production we would have a real malloc here.
)
)
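reference.txt (the expected output):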
testing calling-linear.wasm => returning-linear.wasm
L little
testing calling-linear.wasm => returning-gc.wasm
L few
testing calling-gc.wasm => returning-linear.wasm
G little
testing calling-gc.wasm => returning-gc.wasm
G few
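returning-gc.wast: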
;; (Comments that would be the same in this code and in calling-linear or
;; calling-gc are omitted - see them there.)
(module
(type $string (array (mut i8)))
(type $conversation (struct
;; The f64 parameter.
(field $param (mut f64))
;; The string output.
(field $string (mut (ref null $string)))
;; The current position in the string output, as it is read character-by-
;; character.
(field $string-index (mut i32))
))
(func "start-conversation" (result (ref null data))
(struct.new_default_with_rtt $conversation
(rtt.canon $conversation)
)
)
(func "end-conversation" (param $token (ref null data))
)
(func "push-f64" (param $token (ref null data)) (param $param f64)
(struct.set $conversation $param
(ref.cast
(local.get $token)
(rtt.canon $conversation)
)
(local.get $param)
)
)
(func "pop-char" (param $token (ref null data)) (result i32)
(local $conversation (ref null $conversation))
(local $char i32)
(local.set $conversation
(ref.cast
(local.get $token)
(rtt.canon $conversation)
)
)
;; Read the next char.
(local.set $char
(array.get_u $string
(struct.get $conversation $string
(local.get $conversation)
)
(struct.get $conversation $string-index
(local.get $conversation)
)
)
)
;; Increment the string index.
(struct.set $conversation $string-index
(local.get $conversation)
(i32.add
(struct.get $conversation $string-index
(local.get $conversation)
)
(i32.const 1)
)
)
(local.get $char)
)
(func "func" (param $token (ref null data))
(local $conversation (ref null $conversation))
(local $string (ref null $string))
(local.set $conversation
(ref.cast
(local.get $token)
(rtt.canon $conversation)
)
)
;; Pick which string to return based on the input.
(if
;; Check if the input is at most 2 (f64.le).
(f64.le
(struct.get $conversation $param
(local.get $conversation)
)
(f64.const 2)
)
(block
;; Write out "few\0".
(local.set $string
(array.new_default_with_rtt $string
(i32.const 4)
(rtt.canon $string)
)
)
(array.set $string
(local.get $string)
(i32.const 0)
(i32.const 102)
)
(array.set $string
(local.get $string)
(i32.const 1)
(i32.const 101)
)
(array.set $string
(local.get $string)
(i32.const 2)
(i32.const 119)
)
(array.set $string
(local.get $string)
(i32.const 3)
(i32.const 0)
)
)
(block
;; Write out "many\0".
(local.set $string
(array.new_default_with_rtt $string
(i32.const 5)
(rtt.canon $string)
)
)
(array.set $string
(local.get $string)
(i32.const 0)
(i32.const 109)
)
(array.set $string
(local.get $string)
(i32.const 1)
(i32.const 97)
)
(array.set $string
(local.get $string)
(i32.const 2)
(i32.const 110)
)
(array.set $string
(local.get $string)
(i32.const 3)
(i32.const 121)
)
(array.set $string
(local.get $string)
(i32.const 4)
(i32.const 0)
)
)
)
;; Set the string in the conversation.
(struct.set $conversation $string
(local.get $conversation)
(local.get $string)
)
)
)
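returning-linear.wast: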
;; (Comments that would be the same in this code and in calling-linear are
;; omitted - see them there.)
(module
;; The information necessary during a conversation.
(type $conversation (struct
;; The f64 parameter.
(field $param (mut f64))
;; The string output. This is incremented as we read it character-by-
;; character.
(field $string (mut i32))
))
(memory 1 1)
;; We will return one of these two strings. (A more elaborate demo could also
;; allocate or even conditionally allocate here; conditional allocation would
;; require us to store state on the conversation to know whether to free or
;; not, etc.)
(data (i32.const 10) "little")
(data (i32.const 20) "much")
(func "start-conversation" (result (ref null data))
(struct.new_default_with_rtt $conversation
(rtt.canon $conversation)
)
)
(func "push-f64" (param $token (ref null data)) (param $param f64)
(struct.set $conversation $param
;; Note that after inlining these casts could be removed using simple
;; optimizations (which Binaryen already does, for example).
(ref.cast
(local.get $token)
(rtt.canon $conversation)
)
(local.get $param)
)
)
(func "pop-char" (param $token (ref null data)) (result i32)
(local $conversation (ref null $conversation))
(local $string i32)
(local $char i32)
(local.set $conversation
(ref.cast
(local.get $token)
(rtt.canon $conversation)
)
)
(local.set $string
(struct.get $conversation $string
(local.get $conversation)
)
)
;; Read the next char.
(local.set $char
(i32.load8_u
(local.get $string)
)
)
;; Increment the pointer to the string.
(struct.set $conversation $string
(local.get $conversation)
(i32.add
(local.get $string)
(i32.const 1)
)
)
(local.get $char)
)
(func "end-conversation" (param $token (ref null data))
;; In this example we have nothing to do, as the strings are static.
;; However, if we allocated a string in "func" then we could note that fact
;; on the conversation info struct, and free it here, etc.
)
;; The function that is called from the calling module. It does something with
;; the f64 it is given, then returns a string.
;;
;; Note that this code contains both the "business logic" of input processing
;; and the code to read things from the conversation. The business logic - the
;; if statement - could be in a separate function, if a toolchain prefers to
;; emit things that way.
(func "func" (param $token (ref null data))
(local $conversation (ref null $conversation))
(local.set $conversation
(ref.cast
(local.get $token)
(rtt.canon $conversation)
)
)
;; Pick which string to return based on the input.
(struct.set $conversation $string
(local.get $conversation)
(if (result i32)
;; Check if the input is at most 2 (f64.le).
(f64.le
(struct.get $conversation $param
(local.get $conversation)
)
(f64.const 2)
)
(i32.const 10) ;; "little"
(i32.const 20) ;; "much"
)
)
)
)
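test.js: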
//
// Test script for d8. This requires new d8 with GC support. Run it with
// something like:
//
// d8 --experimental-wasm-gc test.js
//
const JSSupport = {
_buffer: [],
"print-char": function(char) {
if (char) {
JSSupport._buffer.push(char);
} else {
var string = '';
for (let char of JSSupport._buffer) {
string += String.fromCharCode(char);
}
console.log(string);
JSSupport._buffer.length = 0;
}
}
};
// Receive the names of two modules, one that does the call and one that
// returns a value when it is called, and test them.
function testModules(callingName, returningName) {
console.log('testing', callingName, '=>', returningName);
// Create the modules.
let callingModule = new WebAssembly.Module(readbuffer(callingName));
let returningModule = new WebAssembly.Module(readbuffer(returningName));
// Create the instances. The calling module receives the returning one's
// exports.
let returningInstance = new WebAssembly.Instance(returningModule, {});
let callingInstance = new WebAssembly.Instance(callingModule, {
js: JSSupport,
returning: returningInstance.exports
});
// Run!
callingInstance.exports.main();
}
// Test all combinations of the modules, in order to show that they are all
// independent of each other. That is, the calling module does not need to be
// modified at all even if we swap the returning module from linear memory to
// GC.
for (let calling of ['linear', 'gc']) {
for (let returning of ['linear', 'gc']) {
testModules('calling-' + calling + '.wasm',
'returning-' + returning + '.wasm');
}
}
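test.sh: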
#!/bin/bash
set -e
echo "Building..."
./build.sh
echo
echo "Running..."
d8 --experimental-wasm-gc test.js &> temp
diff reference.txt temp
echo
echo "Success!"
@fgmccabe commented:

This clearly leaves interface types completely as a toolchain phenomenon. TBH, I can't see any role for them in your scenario.

@kripken (author) commented Jun 24, 2021:

While a previous version of this document leaned very heavily on the toolchain side, the current text no longer does, and the toolchain is just one part here - in some of the options, not even the most important one. For example, in some of the options the VM is 100% responsible for validating Interface Types APIs as well as the data flowing through them (see the "Verification" section).

So I am confused by your comment. But let's discuss this in the meeting tomorrow! We'll have higher bandwidth there.

@kripken (author) commented Jun 28, 2021:

Following the meeting, I also made some edits to the text here to make things clearer, such as this:

    Interface Types can be declared in the wasm module in a section. (This could either be a new type of section, or a custom section, depending on whether we want it to be mandatory or not.) The VM can use the section to verify that a call between two modules has the proper types.
