This is a stress test of the maximum possible throughput of the HVM. It is a very simple program that just loops in a "best case" scenario: i.e., maximum parallelism, and maximum compiled speed (since it compiles to a C loop). I like it because it highlights each major generational breakthrough:
-
Optlam/Absal were my first JS toy implementations of the theory. It was exponentially faster than GHC in higher-order computations, but veeeery slow in "normal" programs.
-
Formality-Net was an early attempt at implementing inets efficiently back when I worked on Ethereum, but it was memory inefficient, and had a lower throughput than I wished.
-
HVM1 was the first implementation of our memory-efficient architecture. It was much faster than Absal, and "just" ~3x slower than Haskell/GHC in a single core. It also had a preview of a parallel evaluator and a compiler, but both were experimental and buggy.
-
HVM2 marks the first correct version of our mass parallelizer. It achieves near-ideal speedup per core. In 16 CPU cores, it is ~12x faster than in one CPU core. In ~32384 GPU cores, it is ~20000 faster than in one GPU core.
-
HVM3 marks the first long-term release of our low-level compiler. It identifies fragments that can be compiled to efficient low-level code, and uses C on these. This results in a 10x-100x speedup in some cases, surpassing Haskell single-core and approaching C and Rust.
The combination of all of these breakthroughs - a memory-efficient architecture (HVM1), a mass parallelizer (HVM2), and a low-level compiler (HVM3) will form the backbone of Bend, which I believe is heading towards becoming the fastest programming language in the world, period. And all of that while being asymptotically optimal, i.e., exponentially (in a complexity theory sense) faster than all alternatives for higher order and symbolic computations.
(performance estimated based on interactions per second - Absal had no native ints)
(performance also estimated based on interactions per second)
- Initial architecture, with efficient memory format
// Run with compiled mode, 1-thread:
// $ hvm1 compile stress.hvm1
// $ cd stress
// $ cargo build
// $ cd target/release
// $ hvm1 run "Main" -t 1 -c
(Loop 0) = 0
(Loop n) = (Loop (- n 1))
(Fun 0) = (Loop 65536)
(Fun n) = (+ (Fun (- n 1)) (Fun (- n 1)))
(Main) = (Fun 16)
- Automatic parallelism introduced
// Run compiled to C:
// $ hvm2 gen-c stress.hvm2 > stress.c
// $ gcc stress.c -O2 stress
// $ ./stress
@fun = (?((@fun__C0 @fun__C1) a) a)
@fun__C0 = a
& @loop ~ (65536 a)
@fun__C1 = ({a b} d)
&! @fun ~ (a $([+] $(c d)))
&! @fun ~ (b c)
@loop = (?((0 @loop__C0) a) a)
@loop__C0 = a
& @loop ~ a
@main = a
& @fun ~ (16 a)
- Low-level compiler and codegen (coming soon!)
// Run with C mode:
// $ hvm3 run stress.hvm3 -c -s
@loop(n) = ~n {
0: 0
p: @loop(p)
}
@fun(n) = ~n{
0: @loop(65536)
p: !&0{p0 p1}=p (+ @fun(p0) @fun(p1))
}
@main = @fun(16)
Note, again, that this is a very simple test to highlight the key milestones. We aim to launch a full benchmark suite after the release of the compiler, and I believe the results will speak for themselves, and make everyone see why interaction nets can be so efficient: they're the leanest model of computation, and, surprisingly enough, the less things a computer do, the faster it runs.
In this test, we implemented a prime number factorizer with a catch: it uses no machine ints, only algebraic datatypes, pattern-matching and recursion. Of course, this is a terrible way to factor numbers; the goal here is to test each runtime as a symbolic computing engine. It is a very simple test on which HVM already shows a ~20x speedup against GHC, which is one of the fastest functional compilers in the world. Note this speedup isn't related to parallelism nor λ-encodings. It is merely due to optimal looping (via superposition nodes). We believe larger speedups will be seen in more complex algorithms.
# COMMAND: python main.py
# ADT constructors using classes for faster pattern matching
class O:
def __init__(self, p):
self.p = p
self.tag = 0 # Using integers for faster comparison
class I:
def __init__(self, p):
self.p = p
self.tag = 1
class E:
def __init__(self):
self.tag = 2
E = E() # Single instance for empty
def u32(b):
if b.tag == 0: # O
return 2 * u32(b.p) + 0
elif b.tag == 1: # I
return 2 * u32(b.p) + 1
else: # E
return 0
def bin(s, n):
if s == 0:
return E
return O(bin(s-1, n//2)) if n % 2 == 0 else I(bin(s-1, n//2))
def eq(a, b):
if a.tag == 2 and b.tag == 2:
return True
if a.tag == 0 and b.tag == 0:
return eq(a.p, b.p)
if a.tag == 1 and b.tag == 1:
return eq(a.p, b.p)
return False
def inc(b):
if b.tag == 0:
return I(b.p)
elif b.tag == 1:
return O(inc(b.p))
else:
return E
def add(a, b):
if a.tag == 2 or b.tag == 2:
return E
if a.tag == 0 and b.tag == 0:
return O(add(a.p, b.p))
if a.tag == 0 and b.tag == 1:
return I(add(a.p, b.p))
if a.tag == 1 and b.tag == 0:
return I(add(a.p, b.p))
if a.tag == 1 and b.tag == 1:
return O(inc(add(a.p, b.p)))
def mul(a, b):
if b.tag == 2:
return E
if b.tag == 0:
return O(mul(a, b.p))
return add(a, O(mul(a, b.p)))
def cat(a, b):
if a.tag == 0:
return O(cat(a.p, b))
elif a.tag == 1:
return I(cat(a.p, b))
else:
return b
#k = 12
#h = 13
#s = 26
#p = I(O(I(I(O(O(O(O(I(I(I(O(O(O(I(O(I(I(I(I(I(I(O(I(O(I(E))))))))))))))))))))))))))
k = 13
h = 14
s = 28
p = I(I(O(I(O(I(I(O(I(I(I(O(O(I(I(O(I(O(O(O(I(O(O(I(I(O(O(I(E))))))))))))))))))))))))))))
import time
def main():
start = time.time()
for a in range(2 ** h):
for b in range(2 ** h):
binA = cat(bin(h, a), bin(h, 0))
binB = cat(bin(h, b), bin(h, 0))
if eq(mul(binA, binB), p):
duration = time.time() - start
print(f"FACT: {a} {b}")
print(f"TIME: {duration:.7f} seconds")
return
if __name__ == "__main__":
main()
// COMMAND: bun run prime.js
// ADT constructors
const O = p => ({$:"O", p})
const I = p => ({$:"I", p})
const E = {$:"E"}
const u32 = b => {
switch(b.$) {
case "O": return 2 * u32(b.p) + 0
case "I": return 2 * u32(b.p) + 1
case "E": return 0
}
}
const bin = (s, n) => {
if (s === 0) return E
return n % 2 === 0
? O(bin(s-1, Math.floor(n/2)))
: I(bin(s-1, Math.floor(n/2)))
}
const eq = (a, b) => {
if (a.$ === "E" && b.$ === "E") return true
if (a.$ === "O" && b.$ === "O") return eq(a.p, b.p)
if (a.$ === "I" && b.$ === "I") return eq(a.p, b.p)
return false
}
const inc = b => {
switch(b.$) {
case "O": return I(b.p)
case "I": return O(inc(b.p))
case "E": return E
}
}
const add = (a, b) => {
if (a.$ === "E" || b.$ === "E") return E
if (a.$ === "O" && b.$ === "O") return O(add(a.p, b.p))
if (a.$ === "O" && b.$ === "I") return I(add(a.p, b.p))
if (a.$ === "I" && b.$ === "O") return I(add(a.p, b.p))
if (a.$ === "I" && b.$ === "I") return O(inc(add(a.p, b.p)))
}
const mul = (a, b) => {
if (b.$ === "E") return E
if (b.$ === "O") return O(mul(a, b.p))
if (b.$ === "I") return add(a, O(mul(a, b.p)))
}
const cat = (a, b) => {
switch(a.$) {
case "O": return O(cat(a.p, b))
case "I": return I(cat(a.p, b))
case "E": return b
}
}
//const k = 12
//const h = 13
//const s = 26
//const p = I(O(I(I(O(O(O(O(I(I(I(O(O(O(I(O(I(I(I(I(I(I(O(I(O(I(E))))))))))))))))))))))))))
const k = 13
const h = 14
const s = 28
const p = I(I(O(I(O(I(I(O(I(I(I(O(O(I(I(O(I(O(O(O(I(O(O(I(I(O(O(I(E))))))))))))))))))))))))))))
async function main() {
const start = Date.now()
for (let a = 0; a < Math.pow(2, h); a++) {
for (let b = 0; b < Math.pow(2, h); b++) {
const binA = cat(bin(h, a), bin(h, 0))
const binB = cat(bin(h, b), bin(h, 0))
if (eq(mul(binA, binB), p)) {
const duration = (Date.now() - start) / 1000
console.log(`FACT: ${a} ${b}`)
console.log(`TIME: ${duration.toFixed(7)} seconds`)
process.exit(0)
}
}
}
}
main()
// COMMAND: ghc -O2 prime.hs -o prime; ./prime
import Control.Monad (forM_, when)
import Data.Time.Clock (getCurrentTime, diffUTCTime)
import System.Exit (exitSuccess)
import Text.Printf (printf)
data Bin = O Bin | I Bin | E deriving (Show, Eq)
u32 :: Bin -> Int
u32 (O p) = 2 * u32 p + 0
u32 (I p) = 2 * u32 p + 1
u32 E = 0
bin :: Int -> Int -> Bin
bin 0 _ = E
bin s n = case n `mod` 2 of
0 -> O (bin (s-1) (n `div` 2))
_ -> I (bin (s-1) (n `div` 2))
eq :: Bin -> Bin -> Bool
eq E E = True
eq (O a) (O b) = eq a b
eq (I a) (I b) = eq a b
eq _ _ = False
inc :: Bin -> Bin
inc (O p) = I p
inc (I p) = O (inc p)
inc E = E
add :: Bin -> Bin -> Bin
add (O a) (O b) = O (add a b)
add (O a) (I b) = I (add a b)
add (I a) (O b) = I (add a b)
add (I a) (I b) = O (inc (add a b))
add E b = E
add a E = E
mul :: Bin -> Bin -> Bin
mul _ E = E
mul a (O b) = O (mul a b)
mul a (I b) = add a (O (mul a b))
cat :: Bin -> Bin -> Bin
cat (O a) b = O (cat a b)
cat (I a) b = I (cat a b)
cat E b = b
-- k = 12
-- h = 13
-- s = 26
-- p = I(O(I(I(O(O(O(O(I(I(I(O(O(O(I(O(I(I(I(I(I(I(O(I(O(I(E))))))))))))))))))))))))))
k = 13
h = 14
s = 28
p = I(I(O(I(O(I(I(O(I(I(I(O(O(I(I(O(I(O(O(O(I(O(O(I(I(O(O(I(E))))))))))))))))))))))))))))
main :: IO ()
main = do
start <- getCurrentTime
forM_ [0..2^h-1] $ \a -> do
forM_ [0..2^h-1] $ \b -> do
let binA = cat (bin h a) (bin h 0)
let binB = cat (bin h b) (bin h 0)
when (eq (mul binA binB) p) $ do
end <- getCurrentTime
let duration = diffUTCTime end start
putStrLn $ "FACT: " ++ show a ++ " " ++ show b
putStrLn $ "TIME: " ++ printf "%.7f seconds" (realToFrac duration :: Double)
exitSuccess
// COMMAND: hvml run main.hvml -s -S -c
// Bitstrings
data Bin { #O{pred} #I{pred} #E }
// If-Then-Else
@if(b t f) = ~b {
0: f
k: t
}
// Converts a Bin to an U32
@u32(b) = ~b{
#O{p}: (+ (* 2 @u32(p)) 0)
#I{p}: (+ (* 2 @u32(p)) 1)
#E: 0
}
// Converts an U32 to a Bin of given size
@bin(s n) = ~s{
0: #E
p: !&0{n0 n1}=n ~(% n0 2) !p !n1 {
0: #O{@bin(p (/ n1 2))}
k: #I{@bin(p (/ n1 2))}
}
}
// Bin Equality
@eq(a b) = ~a !b {
#E: ~b {
#O{bp}: 0
#I{bp}: 0
#E: 1
}
#O{ap}: ~b{
#O{bp}: @eq(ap bp)
#I{bp}: 0
#E: 0
}
#I{ap}: ~b{
#O{bp}: 0
#I{bp}: @eq(ap bp)
#E: 0
}
}
// Increments a Bin
@inc(a) = ~a{
#O{p}: #I{p}
#I{p}: #O{@inc(p)}
#E: #E
}
// Decrements a Bin
@dec(a) = ~a{
#O{p}: #O{@dec(p)}
#I{p}: #I{p}
#E: #E
}
// Adds two Bins
@add(a b) = ~a !b {
#O{ap}: ~b !ap {
#O{bp}: #O{@add(ap bp)}
#I{bp}: #I{@add(ap bp)}
#E: #E
}
#I{ap}: ~b !ap {
#O{bp}: #I{@add(ap bp)}
#I{bp}: #O{@inc(@add(ap bp))}
#E: #E
}
#E: #E
}
// Muls two Bins
@mul(a b) = ~b !a {
#O{bp}: #O{@mul(a bp)}
#I{bp}: !&0{a0 a1}=a @add(a0 #O{@mul(a1 bp)})
#E: #E
}
// Concatenates two Bins
@cat(a b) = ~a !b {
#O{ap}: #O{@cat(ap b)}
#I{ap}: #I{@cat(ap b)}
#E: b
}
// Enums all Bins of given size (label 1)
@all1(s) = ~s{
0: #E
p: !&1{p0 p1}=p &1{ #O{@all1(p0)} #I{@all1(p1)} }
}
// Enums all Bins of given size (label 2)
@all2(s) = ~s{
0: #E
p: !&2{p0 p1}=p &2{ #O{@all2(p0)} #I{@all2(p1)} }
}
//@K = 12
//@H = 13
//@S = 26
//@X = @cat(@all1(@H) @bin(@H 0))
//@Y = @cat(@all2(@H) @bin(@H 0))
//@P = #1{#0{#1{#1{#0{#0{#0{#0{#1{#1{#1{#0{#0{#0{#1{#0{#1{#1{#1{#1{#1{#1{#0{#1{#0{#1{#2}}}}}}}}}}}}}}}}}}}}}}}}}}
@K = 13
@H = 14
@S = 28
@X = @cat(@all1(@H) @bin(@H 0))
@Y = @cat(@all2(@H) @bin(@H 0))
@P = #1{#1{#0{#1{#0{#1{#1{#0{#1{#1{#1{#0{#0{#1{#1{#0{#1{#0{#0{#0{#1{#0{#0{#1{#1{#0{#0{#1{#2}}}}}}}}}}}}}}}}}}}}}}}}}}}}
@main = @if(@eq(@mul(@X @Y) @P) λt(t @u32(@X) @u32(@Y)) *)
This is a new test that I just implemented today (Nov 28, 2024). The goal here was to solve a simple higher-order equation:
(?X λt(t 1 2)) == λt(t 2 1)
I implemented it in 3 different ways:
-
- Enumerate BLC strings, parse and test (Haskell): 0.992s
-
- Superpose BLC strings, parse and test (HVM): 0.026s (38x speedup)
-
- Superpose λ-Terms directly and test (HVM): 0.0011s (862x speedup)
The 3rd approach is implemented and described in this file.
@VictorTaelin The HVM links are broken, please consider merging https://gist.github.com/ashtonsix/a173ca92c45784cb10b4d199d885abb6, which include the following URI changes: