This is a stress test of the maximum possible throughput of the HVM. It is a very simple program that just loops in a "best case" scenario: maximum parallelism and maximum compiled speed (since it compiles to a C loop). I like it because it highlights each major generational breakthrough:
- Optlam/Absal were my first JS toy implementations of the theory. They were exponentially faster than GHC in higher-order computations, but veeeery slow in "normal" programs.
- Formality-Net was an early attempt at implementing inets efficiently, back when I worked on Ethereum, but it was memory-inefficient and had a lower throughput than I wished.
- HVM1 was the first implementation of our memory-efficient architecture. It was much faster than Absal, and "just" ~3x slower than Haskell/GHC on a single core. It also included previews of a parallel evaluator and a compiler, but both were experimental and buggy.
- HVM2 marks the first correct version of our mass parallelizer. It achieves near-ideal speedup per core: on 16 CPU cores, it is ~12x faster than on one CPU core; on ~32384 GPU cores, it is ~20000x faster than on one GPU core.
- HVM3 marks the first long-term release of our low-level compiler. It identifies fragments that can be compiled to efficient low-level code and emits C for them. This results in a 10x-100x speedup in some cases, surpassing single-core Haskell and approaching C and Rust.
The combination of all of these breakthroughs - a memory-efficient architecture (HVM1), a mass parallelizer (HVM2), and a low-level compiler (HVM3) - will form the backbone of Bend, which I believe is heading towards becoming the fastest programming language in the world, period. And all of that while being asymptotically optimal, i.e., exponentially faster (in a complexity-theory sense) than all alternatives for higher-order and symbolic computations.
(Performance of the earlier runtimes is estimated from interactions per second; Absal had no native ints.)
- Initial architecture, with efficient memory format
// Run with compiled mode, 1-thread:
// $ hvm1 compile stress.hvm1
// $ cd stress
// $ cargo build
// $ cd target/release
// $ hvm1 run "Main" -t 1 -c
(Loop 0) = 0
(Loop n) = (Loop (- n 1))
(Fun 0) = (Loop 65536)
(Fun n) = (+ (Fun (- n 1)) (Fun (- n 1)))
(Main) = (Fun 16)
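For reference, here is the same workload in plain Haskell (my own transliteration, not part of the original benchmark). It makes clear why the program parallelizes so well: `Fun 16` fans out into 2^16 fully independent calls to `Loop 65536`.
-- A direct transliteration of the stress program above, handy as a
-- single-core GHC baseline. The binary recursion in `fun` spawns
-- 2^16 independent loops of 65536 decrements each.
loop :: Int -> Int
loop 0 = 0
loop n = loop (n - 1)

fun :: Int -> Int
fun 0 = loop 65536
fun n = fun (n - 1) + fun (n - 1)

main :: IO ()
main = print (fun 16)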
- Automatic parallelism introduced
// Run compiled to C:
// $ hvm2 gen-c stress.hvm2 > stress.c
// $ gcc stress.c -O2 -o stress
// $ ./stress
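// (Informal reading of the net below, under my understanding of the
// syntax: ?(...) matches a number against a zero branch and a successor
// branch; {a b} duplicates the predecessor; $([+] ...) is a native add
// node; each "&" line is a redex, and the "&!" redexes are the two
// recursive calls, which the parallel evaluator is free to fork.)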
@fun = (?((@fun__C0 @fun__C1) a) a)
@fun__C0 = a
& @loop ~ (65536 a)
@fun__C1 = ({a b} d)
&! @fun ~ (a $([+] $(c d)))
&! @fun ~ (b c)
@loop = (?((0 @loop__C0) a) a)
@loop__C0 = a
& @loop ~ a
@main = a
& @fun ~ (16 a)
- Low-level compiler and codegen (coming soon!)
// Run with C mode:
// $ hvm3 run stress.hvm3 -c -s
@loop(n) = ~n {
0: 0
p: @loop(p)
}
@fun(n) = ~n{
0: @loop(65536)
p: !&0{p0 p1}=p (+ @fun(p0) @fun(p1))
}
@main = @fun(16)
Note, again, that this is a very simple test to highlight the key milestones. We aim to launch a full benchmark suite after the release of the compiler, and I believe the results will speak for themselves and make everyone see why interaction nets can be so efficient: they're the leanest model of computation, and, surprisingly enough, the fewer things a computer does, the faster it runs.
In this test, we implemented a prime number factorizer with a catch: it uses no machine ints, only algebraic datatypes, pattern-matching, and recursion. Of course, this is a terrible way to factor numbers; the goal here is to test each runtime as a symbolic computing engine. It is a very simple test on which HVM already shows a ~20x speedup over GHC, one of the fastest functional compilers in the world. Note that this speedup comes neither from parallelism nor from λ-encodings; it is due purely to optimal looping (via superposition nodes). We believe larger speedups will be seen in more complex algorithms.
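All four versions below share the same bit-level representation: a Bin stores its bits least-significant first. A tiny self-contained Haskell illustration of the encoding (mirroring the `bin`/`u32` helpers each version defines):
-- Worked example of the LSB-first encoding: 6 = 0b110
-- encodes as O (I (I (O E))).
data Bin = O Bin | I Bin | E deriving Show

bin :: Int -> Int -> Bin
bin 0 _ = E
bin s n = (if even n then O else I) (bin (s - 1) (n `div` 2))

u32 :: Bin -> Int
u32 (O p) = 2 * u32 p
u32 (I p) = 2 * u32 p + 1
u32 E     = 0

main :: IO ()
main = do
  print (bin 4 6)        -- O (I (I (O E)))
  print (u32 (bin 4 6))  -- 6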
# COMMAND: python main.py
# ADT constructors using classes for faster pattern matching
class O:
def __init__(self, p):
self.p = p
self.tag = 0 # Using integers for faster comparison
class I:
def __init__(self, p):
self.p = p
self.tag = 1
class E:
def __init__(self):
self.tag = 2
E = E() # Single instance for empty
def u32(b):
if b.tag == 0: # O
return 2 * u32(b.p) + 0
elif b.tag == 1: # I
return 2 * u32(b.p) + 1
else: # E
return 0
def bin(s, n):
if s == 0:
return E
return O(bin(s-1, n//2)) if n % 2 == 0 else I(bin(s-1, n//2))
def eq(a, b):
if a.tag == 2 and b.tag == 2:
return True
if a.tag == 0 and b.tag == 0:
return eq(a.p, b.p)
if a.tag == 1 and b.tag == 1:
return eq(a.p, b.p)
return False
def inc(b):
if b.tag == 0:
return I(b.p)
elif b.tag == 1:
return O(inc(b.p))
else:
return E
def add(a, b):
if a.tag == 2 or b.tag == 2:
return E
if a.tag == 0 and b.tag == 0:
return O(add(a.p, b.p))
if a.tag == 0 and b.tag == 1:
return I(add(a.p, b.p))
if a.tag == 1 and b.tag == 0:
return I(add(a.p, b.p))
if a.tag == 1 and b.tag == 1:
return O(inc(add(a.p, b.p)))
def mul(a, b):
if b.tag == 2:
return E
if b.tag == 0:
return O(mul(a, b.p))
return add(a, O(mul(a, b.p)))
def cat(a, b):
if a.tag == 0:
return O(cat(a.p, b))
elif a.tag == 1:
return I(cat(a.p, b))
else:
return b
#k = 12
#h = 13
#s = 26
#p = I(O(I(I(O(O(O(O(I(I(I(O(O(O(I(O(I(I(I(I(I(I(O(I(O(I(E))))))))))))))))))))))))))
k = 13
h = 14
s = 28
p = I(I(O(I(O(I(I(O(I(I(I(O(O(I(I(O(I(O(O(O(I(O(O(I(I(O(O(I(E))))))))))))))))))))))))))))
import time
def main():
start = time.time()
for a in range(2 ** h):
for b in range(2 ** h):
binA = cat(bin(h, a), bin(h, 0))
binB = cat(bin(h, b), bin(h, 0))
if eq(mul(binA, binB), p):
duration = time.time() - start
print(f"FACT: {a} {b}")
print(f"TIME: {duration:.7f} seconds")
return
if __name__ == "__main__":
main()
// COMMAND: bun run prime.js
// ADT constructors
const O = p => ({$:"O", p})
const I = p => ({$:"I", p})
const E = {$:"E"}
const u32 = b => {
switch(b.$) {
case "O": return 2 * u32(b.p) + 0
case "I": return 2 * u32(b.p) + 1
case "E": return 0
}
}
const bin = (s, n) => {
if (s === 0) return E
return n % 2 === 0
? O(bin(s-1, Math.floor(n/2)))
: I(bin(s-1, Math.floor(n/2)))
}
const eq = (a, b) => {
if (a.$ === "E" && b.$ === "E") return true
if (a.$ === "O" && b.$ === "O") return eq(a.p, b.p)
if (a.$ === "I" && b.$ === "I") return eq(a.p, b.p)
return false
}
const inc = b => {
switch(b.$) {
case "O": return I(b.p)
case "I": return O(inc(b.p))
case "E": return E
}
}
const add = (a, b) => {
if (a.$ === "E" || b.$ === "E") return E
if (a.$ === "O" && b.$ === "O") return O(add(a.p, b.p))
if (a.$ === "O" && b.$ === "I") return I(add(a.p, b.p))
if (a.$ === "I" && b.$ === "O") return I(add(a.p, b.p))
if (a.$ === "I" && b.$ === "I") return O(inc(add(a.p, b.p)))
}
const mul = (a, b) => {
if (b.$ === "E") return E
if (b.$ === "O") return O(mul(a, b.p))
if (b.$ === "I") return add(a, O(mul(a, b.p)))
}
const cat = (a, b) => {
switch(a.$) {
case "O": return O(cat(a.p, b))
case "I": return I(cat(a.p, b))
case "E": return b
}
}
//const k = 12
//const h = 13
//const s = 26
//const p = I(O(I(I(O(O(O(O(I(I(I(O(O(O(I(O(I(I(I(I(I(I(O(I(O(I(E))))))))))))))))))))))))))
const k = 13
const h = 14
const s = 28
const p = I(I(O(I(O(I(I(O(I(I(I(O(O(I(I(O(I(O(O(O(I(O(O(I(I(O(O(I(E))))))))))))))))))))))))))))
async function main() {
const start = Date.now()
for (let a = 0; a < Math.pow(2, h); a++) {
for (let b = 0; b < Math.pow(2, h); b++) {
const binA = cat(bin(h, a), bin(h, 0))
const binB = cat(bin(h, b), bin(h, 0))
if (eq(mul(binA, binB), p)) {
const duration = (Date.now() - start) / 1000
console.log(`FACT: ${a} ${b}`)
console.log(`TIME: ${duration.toFixed(7)} seconds`)
process.exit(0)
}
}
}
}
main()
// COMMAND: ghc -O2 prime.hs -o prime; ./prime
import Control.Monad (forM_, when)
import Data.Time.Clock (getCurrentTime, diffUTCTime)
import System.Exit (exitSuccess)
import Text.Printf (printf)
data Bin = O Bin | I Bin | E deriving (Show, Eq)
u32 :: Bin -> Int
u32 (O p) = 2 * u32 p + 0
u32 (I p) = 2 * u32 p + 1
u32 E = 0
bin :: Int -> Int -> Bin
bin 0 _ = E
bin s n = case n `mod` 2 of
0 -> O (bin (s-1) (n `div` 2))
_ -> I (bin (s-1) (n `div` 2))
eq :: Bin -> Bin -> Bool
eq E E = True
eq (O a) (O b) = eq a b
eq (I a) (I b) = eq a b
eq _ _ = False
inc :: Bin -> Bin
inc (O p) = I p
inc (I p) = O (inc p)
inc E = E
add :: Bin -> Bin -> Bin
add (O a) (O b) = O (add a b)
add (O a) (I b) = I (add a b)
add (I a) (O b) = I (add a b)
add (I a) (I b) = O (inc (add a b))
add E _ = E
add _ E = E
mul :: Bin -> Bin -> Bin
mul _ E = E
mul a (O b) = O (mul a b)
mul a (I b) = add a (O (mul a b))
cat :: Bin -> Bin -> Bin
cat (O a) b = O (cat a b)
cat (I a) b = I (cat a b)
cat E b = b
-- k = 12
-- h = 13
-- s = 26
-- p = I(O(I(I(O(O(O(O(I(I(I(O(O(O(I(O(I(I(I(I(I(I(O(I(O(I(E))))))))))))))))))))))))))
k = 13
h = 14
s = 28
p = I(I(O(I(O(I(I(O(I(I(I(O(O(I(I(O(I(O(O(O(I(O(O(I(I(O(O(I(E))))))))))))))))))))))))))))
main :: IO ()
main = do
start <- getCurrentTime
forM_ [0..2^h-1] $ \a -> do
forM_ [0..2^h-1] $ \b -> do
let binA = cat (bin h a) (bin h 0)
let binB = cat (bin h b) (bin h 0)
when (eq (mul binA binB) p) $ do
end <- getCurrentTime
let duration = diffUTCTime end start
putStrLn $ "FACT: " ++ show a ++ " " ++ show b
putStrLn $ "TIME: " ++ printf "%.7f seconds" (realToFrac duration :: Double)
exitSuccess
// COMMAND: hvml run main.hvml -s -S -c
// Bitstrings
data Bin { #O{pred} #I{pred} #E }
// If-Then-Else
@if(b t f) = ~b {
0: f
k: t
}
// Converts a Bin to a U32
@u32(b) = ~b{
#O{p}: (+ (* 2 @u32(p)) 0)
#I{p}: (+ (* 2 @u32(p)) 1)
#E: 0
}
// Converts a U32 to a Bin of given size
@bin(s n) = ~s{
0: #E
p: !&0{n0 n1}=n ~(% n0 2) !p !n1 {
0: #O{@bin(p (/ n1 2))}
k: #I{@bin(p (/ n1 2))}
}
}
// Bin Equality
@eq(a b) = ~a !b {
#E: ~b {
#O{bp}: 0
#I{bp}: 0
#E: 1
}
#O{ap}: ~b{
#O{bp}: @eq(ap bp)
#I{bp}: 0
#E: 0
}
#I{ap}: ~b{
#O{bp}: 0
#I{bp}: @eq(ap bp)
#E: 0
}
}
// Increments a Bin
@inc(a) = ~a{
#O{p}: #I{p}
#I{p}: #O{@inc(p)}
#E: #E
}
// Decrements a Bin
@dec(a) = ~a{
#O{p}: #O{@dec(p)}
#I{p}: #I{p}
#E: #E
}
// Adds two Bins
@add(a b) = ~a !b {
#O{ap}: ~b !ap {
#O{bp}: #O{@add(ap bp)}
#I{bp}: #I{@add(ap bp)}
#E: #E
}
#I{ap}: ~b !ap {
#O{bp}: #I{@add(ap bp)}
#I{bp}: #O{@inc(@add(ap bp))}
#E: #E
}
#E: #E
}
// Multiplies two Bins
@mul(a b) = ~b !a {
#O{bp}: #O{@mul(a bp)}
#I{bp}: !&0{a0 a1}=a @add(a0 #O{@mul(a1 bp)})
#E: #E
}
// Concatenates two Bins
@cat(a b) = ~a !b {
#O{ap}: #O{@cat(ap b)}
#I{ap}: #I{@cat(ap b)}
#E: b
}
// Enumerates all Bins of a given size (label 1)
@all1(s) = ~s{
0: #E
p: !&1{p0 p1}=p &1{ #O{@all1(p0)} #I{@all1(p1)} }
}
// Enumerates all Bins of a given size (label 2)
@all2(s) = ~s{
0: #E
p: !&2{p0 p1}=p &2{ #O{@all2(p0)} #I{@all2(p1)} }
}
//@K = 12
//@H = 13
//@S = 26
//@X = @cat(@all1(@H) @bin(@H 0))
//@Y = @cat(@all2(@H) @bin(@H 0))
//@P = #1{#0{#1{#1{#0{#0{#0{#0{#1{#1{#1{#0{#0{#0{#1{#0{#1{#1{#1{#1{#1{#1{#0{#1{#0{#1{#2}}}}}}}}}}}}}}}}}}}}}}}}}}
@K = 13
@H = 14
@S = 28
@X = @cat(@all1(@H) @bin(@H 0))
@Y = @cat(@all2(@H) @bin(@H 0))
@P = #1{#1{#0{#1{#0{#1{#1{#0{#1{#1{#1{#0{#0{#1{#1{#0{#1{#0{#0{#0{#1{#0{#0{#1{#1{#0{#0{#1{#2}}}}}}}}}}}}}}}}}}}}}}}}}}}}
@main = @if(@eq(@mul(@X @Y) @P) λt(t @u32(@X) @u32(@Y)) *)
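For intuition on where the speedup comes from: @all1 and @all2 build superpositions of every bitstring of a given size, and the distinct labels (&1 vs &2) keep @X's choices independent from @Y's, so @main effectively tests all 2^H x 2^H candidate pairs. As a rough analogy only (HVM shares work across branches in one superposed structure rather than materializing each value), @all1 denotes what this Haskell enumerator produces:
data Bin = O Bin | I Bin | E

-- What @all1/@all2 denote: every Bin of the given size. HVM keeps
-- these superposed in one shared structure instead of building a
-- list of 2^s separate values like this does.
allBins :: Int -> [Bin]
allBins 0 = [E]
allBins s = [c rest | c <- [O, I], rest <- allBins (s - 1)]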
This is a new test that I just implemented today (Nov 28, 2024). The goal here was to solve a simple higher-order equation:
(?X λt(t 1 2)) == λt(t 2 1)
I implemented it in 3 different ways:
- Enumerate BLC strings, parse and test (Haskell): 0.992s
- Superpose BLC strings, parse and test (HVM): 0.026s (38x speedup)
- Superpose λ-Terms directly and test (HVM): 0.0011s (862x speedup)
The 3rd approach is implemented and described in this file.
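For concreteness, one solution to the equation is the pair-swap function λp. λt. (p λa. λb. (t b a)). Below is a minimal Haskell sketch (my own reconstruction, not the code from this file, with Church numerals standing in for the literals 1 and 2) that verifies this candidate by normalization; the three approaches above search for such an X instead of checking a known one:
-- De Bruijn λ-terms with a normalizer, used to check that applying
-- `swap` to the pair λt(t 1 2) normalizes to λt(t 2 1).
data Term = Var Int | Lam Term | App Term Term deriving Eq

shift :: Int -> Int -> Term -> Term
shift c k (Var i)   = Var (if i >= c then i + k else i)
shift c k (Lam b)   = Lam (shift (c + 1) k b)
shift c k (App f a) = App (shift c k f) (shift c k a)

subst :: Int -> Term -> Term -> Term
subst j s (Var i)
  | i == j    = s
  | i >  j    = Var (i - 1)
  | otherwise = Var i
subst j s (Lam b)   = Lam (subst (j + 1) (shift 0 1 s) b)
subst j s (App f a) = App (subst j s f) (subst j s a)

-- Full normalization (fine here: every term involved terminates).
nf :: Term -> Term
nf (Var i)   = Var i
nf (Lam b)   = Lam (nf b)
nf (App f a) = case nf f of
  Lam b -> nf (subst 0 a b)
  f'    -> App f' (nf a)

church :: Int -> Term  -- Church numeral, standing in for 1 and 2
church n = Lam (Lam (iterate (App (Var 1)) (Var 0) !! n))

pair :: Term -> Term -> Term  -- λt.(t a b)
pair a b = Lam (App (App (Var 0) (shift 0 1 a)) (shift 0 1 b))

swap :: Term  -- λp. λt. (p λa. λb. (t b a))
swap = Lam (Lam (App (Var 1) (Lam (Lam (App (App (Var 2) (Var 0)) (Var 1))))))

main :: IO ()
main = print (nf (App swap (pair (church 1) (church 2)))
           == nf (pair (church 2) (church 1)))  -- True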
Ah gotcha, I updated the fork.
I'm currently trying to learn just enough about lambda calculus and interaction nets to see how well they might be able to navigate low-level optimisation constraints such as cache line alignment, cache misses and branch prediction, and, of course, make appropriate use of SIMD intrinsics (there are over 30,000 between x86 and ARM combined). Between naive and optimal C++ you can often realise an order-of-magnitude performance gain by accounting for these, but it takes a lot of time and thinking from a skilled human to pull it off. AI really struggles at it, and so do compilers like LLVM.
SIMD parallelism on the CPU compares very favourably to GPU parallelism for fewer than 1000 elements, since such loops can usually complete within the time it takes to schedule a GPU kernel (allocating SMs, memory, etc.). With non-unified memory, where there's additional latency introduced by the RAM/VRAM interconnect, the balance tilts further towards the CPU. So for optimal performance you'd likely need to explore hybrids of CPU/GPU compute. Oh, and I totally forgot about the low-level constraints to satisfy on the GPU: thread warps, bank conflicts, and such.