Created
July 29, 2022 11:39
-
-
Save coolbutuseless/6b9298d438f53c315151d642c5440dcd to your computer and use it in GitHub Desktop.
Toy WASM interpreter in R
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "Toy WASM interpreter in base R" | |
author: "mikefc" | |
date: 2022-07-29T18:05:00+10:00 | |
categories: ['Rstats'] | |
tags: ["R"] | |
--- | |
```{r, include = FALSE} | |
knitr::opts_chunk$set( | |
collapse = TRUE, | |
comment = "#>", | |
# fig.path = "/img/visvalingam/README-", | |
out.width = "100%" | |
) | |
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
# Ensure that images are rendered using a device which understands patterns | |
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
knitr::opts_chunk$set(dev.args = list(png = list(type = "cairo"))) | |
library(eventloop) | |
library(nara) | |
``` | |
## WASM | |
One of the big buzzes at #RStudioConf2022 was the whiff of serverless "Shiny" | |
in both python and R by harnessing [WASM](https://en.wikipedia.org/wiki/WebAssembly). | |
WASM (a.k.a. WebAssembly) is a portable binary-code format initially targetted | |
at running code in a web browser at near-native speeds. | |
This post is a small exploration into what a WASM binary actually looks like | |
and how it is structured. | |
I also create a toy interpreter for WASM in R. Note: You would never do this | |
in real-life - it's just for a bit of fun (maybe because I had so much fun with the | |
virtual machine interpreter for [AnotherWorld](http://localhost:4321/2022/07/29/anotherworld-game-written-playable-in-r-with-nara-and-eventloop/)) | |
## C function to calculate `factorial(n)` | |
This is a classic bit of C code for calculating `n!` i.e. `n * (n-1) * (n-2) * ... * 1` | |
```{c eval=FALSE} | |
int factorial(int n) { | |
if (n == 0) | |
return 1; | |
else | |
return n * factorial(n-1); | |
} | |
``` | |
## `factorial(n)` in WAT (human readable WASM text) | |
There is a human-readable version of WASM called "WAT" which has a structure | |
similar to S-expressions. | |
The above C code would be compiled to WASM/WAT as follows: | |
```{wat} | |
(func (param i64) (result i64) | |
local.get 0 # put arg[0] on stack | |
i64.eqz # compare top of stack to zero | |
if (result i64) # if it is zero | |
i64.const 1 # put 1 on stack | |
else | |
local.get 0 # put arg[0] on stack | |
local.get 0 # put arg[0] on stack | |
i64.const 1 # put 1 on stack | |
i64.sub # subtract the top 2 values in stack | |
call 0 # Call function #0 (return value is on the stack) | |
i64.mul # Multiply | |
end)' | |
``` | |
## Compile WAT to WASM | |
The [WebAssembly Binary Toolkit (WABT)](https://github.com/WebAssembly/wabt) | |
can be used to compile the WAT file into a binary WASM file | |
```{sh eval=FALSE} | |
brew install wabt | |
wat2wasm helloworld.wat # Creates 'helloworld.wasm' | |
``` | |
``` | |
# helloworld.wasm | |
00 61 73 6d 01 00 00 00 01 06 01 60 01 7e 01 7e 03 02 01 00 0a 17 01 15 00 20 00 50 04 7e 42 01 05 20 00 20 00 42 01 7d 10 00 7e 0b 0b | |
``` | |
## WASM raw bytes | |
With some googling and using the reference list of [WASM bytecodes](https://webassembly.github.io/spec/core/appendix/index-instructions.html) | |
it is possible to deconstruct and understand the bytes in the `*.wasm` file for | |
the `factorial(n)` function | |
```{wasm} | |
0x00, 0x61, 0x73, 0x6D, # magick = \0asm | |
0x01, 0x00, 0x00, 0x00, # version = 1 | |
# The 'type' section contains function signatures. | |
0x01, 0x06 , # section "type". 6 bytes | |
0x01, # Num types = 1 i.e. just a single 'function' type defined here | |
0x60, # Function (begin definition) | |
0x01, # Number of parameters = 1 | |
0x7e, # param types: i64 | |
0x01, # Number of results = 1 | |
0x7e, # result types: i64 | |
# The 'function' section stores indexes of the function signature | |
0x03, 0x02, # section "function". section size = 2 bytes | |
0x01, # Number of functions | |
0x00, # Index of function | |
# The 'code' section contains the actual function code | |
0x0A, 0x17, # Section "code". section size = 0x17 = 23 bytes | |
0x01, # Number of functions = 1 | |
0x15, # Function body size = 0x15 = 21 bytes | |
0x00, # Number of local declarations = 0 | |
0x20, 0x00, # local.get 0 | |
0x50, # i64.eqz | |
0x04, 0x7E, # if i64 | |
0x42, 0x01, # i64.const 1 | |
0x05, # else | |
0x20, 0x00, # local.get 0 | |
0x20, 0x00, # local get 0 | |
0x42, 0x01, # i64.const 1 | |
0x7D, # i64.sub | |
0x10, 0x00, # call 0 | |
0x7E, # i64.mul | |
0x0B, # end | |
0x0B # end | |
``` | |
## WASM virtual stack machine | |
WASM bytecode is intended to be run on a virtual stack machine. | |
A simple way of implementing a stack machine in R is to have a giant switch statement | |
and a stack implementation. The `switch` statement chooses what | |
operations to do based upon the bytecode. | |
I'm cheating a little bit here and have hand-parsed the code split at the | |
if-else statement into two blocks. | |
#### Other cheats: | |
* you would normally need 64bit integers (i.e. `i64` type in WASM) to interpret the `factorial()` code | |
in WASM, but as long as you don't give it a large number it will be fine. | |
* integers in WASM are encoded as [LEB128 variable length integers](https://en.wikipedia.org/wiki/LEB128) | |
but since the code sizes are all small, every integer fits in 7-bits which | |
means that no encoding/decoding is necessary. | |
#### Warnings | |
* This unoptimized interpreter is going to be terribly slow. | |
#### Stack | |
Stack implementations don't come much more naive than this. | |
The `stack` variable itself is just going to exist in the global environment. | |
```{r} | |
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
# Naive stack | |
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
stack <- c() | |
# Pop a value of the stack | |
stack_pop <- function() { | |
res <- stack[1] | |
stack <<- stack[-1] | |
res | |
} | |
# Push a value on the stack | |
stack_push <- function(val) { | |
stack <<- c(val, stack) | |
} | |
``` | |
#### Define the functions | |
The major cheat here is that i've pre-parsed the function into initial code | |
in the function and the two code blocks that are executed depending on | |
the `if/else` statement. | |
There may be multiple functions in general, so `all_funcs` is a list of | |
all the functions in the code, and the correct function is extracted | |
during a `call` opcode (i.e. `0x10`) | |
```{r} | |
all_funcs <- list( | |
func <- list( | |
args = c(), | |
start = c( | |
0x20, 0x00, # local.get 0 | |
0x50, # i64.eqz | |
0x04, 0x7E # if i64 | |
), | |
if_branch = c( | |
0x42, 0x01 # i64.const 1 | |
), | |
else_branch = c( | |
0x20, 0x00, # local.get 0 | |
0x20, 0x00, # local get 0 | |
0x42, 0x01, # i64.const 1 | |
0x7D, # i64.sub | |
0x10, 0x00, # call 0 | |
0x7E # i64.mul | |
) | |
) | |
) | |
``` | |
```{r} | |
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
#" Recursively interpret the given function | |
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
interp_wasm <- function(func) { | |
# Set the bytecode to be the starting code of the function | |
byte_idx <- 1 | |
bytecode <- func$start | |
# Helper function to read 'n' bytes from the bytecode | |
read_bytes <- function(n) { | |
res <- bytecode[byte_idx + seq(n) - 1] | |
byte_idx <<- byte_idx + n | |
res | |
} | |
# For each opcode, determine what to do within a giant `switch` statement | |
while(byte_idx <= length(bytecode)) { | |
opcode <- read_bytes(1) | |
opcode <- sprintf("0x%02x", opcode) | |
switch( | |
opcode, | |
'0x04' = { # if | |
type = read_bytes(1) # value type that if() operates on. ignored | |
value = stack_pop() # get the value. treat as boolean. | |
if (value == 1) { | |
# Jump the bytecode into the respective TRUE/FALSE branch | |
bytecode <- func$if_branch | |
byte_idx <- 1 | |
} else { | |
bytecode <- func$else_branch | |
byte_idx <- 1 | |
} | |
}, | |
'0x10' = { # call | |
# Select the function to call, setup the arguments and call it | |
func_idx <- read_bytes(1) | |
next_func <- all_funcs[[func_idx + 1]] | |
arg <- stack_pop() | |
next_func$args <- arg | |
if (verbose) message("factorial(", arg, ")") | |
interp_wasm(next_func) | |
}, | |
'0x20' = { # local.get idx | |
# Push a local variable onto the stack. | |
# In this simple case all local vars are just the args to the function | |
idx <- read_bytes(1) | |
value <- func$args[idx + 1] | |
stack_push(value) | |
}, | |
'0x42' = { # i64 const | |
# Push a constant onto the stack | |
value <- read_bytes(1) | |
stack_push(value) | |
}, | |
'0x50' = { # i64.eqz | |
# Is value at top of stack == 0? No/Yes = 0/1 | |
value <- stack_pop() | |
result <- value == 0 | |
stack_push(result) | |
}, | |
'0x7d' = { # i64.sub | |
# Subtract the two values at the top of the stack | |
val2 <- stack_pop() | |
val1 <- stack_pop() | |
result <- val1 - val2 | |
stack_push(result) | |
}, | |
'0x7e' = { # i64.mul | |
# Multiply the 2 values at the top of the stack | |
val2 <- stack_pop() | |
val1 <- stack_pop() | |
result <- val1 * val2 | |
stack_push(result) | |
} | |
) | |
if (verbose) { | |
message(opcode, " -> stack: ", deparse(stack)) | |
} | |
} | |
return(stack[1]) | |
} | |
``` | |
## `factorial(3)` running in WASM | |
```{r} | |
# Select the factorial function | |
func <- all_funcs[[1]] | |
# Set the argument to be '3' | |
func$args <- 3 | |
# interpret the code | |
verbose <- FALSE | |
interp_wasm(func) | |
``` | |
## `factorial(3)` running in WASM (verbose) | |
```{r} | |
func$args <- 4 | |
verbose <- TRUE | |
interp_wasm(func) | |
``` | |
# What's next? | |
Nothing really. | |
An actual WASM stack machine would need signed & unsigned 64 bit integers. | |
You could use [`{bit64}`](https://cran.rstudio.com/web/packages/bit64/index.html) to support this, if you really wanted to. | |
There are ~250 standard opcodes, and another ~250 SSE opcodes. I have no | |
appetite for implementing all these - but it could be fun at some future date. | |
Please steal/adapt the code if you're interested. | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment