Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save VictorTaelin/1634ef66382801a26eaac3757989c4e3 to your computer and use it in GitHub Desktop.

Select an option

Save VictorTaelin/1634ef66382801a26eaac3757989c4e3 to your computer and use it in GitHub Desktop.
consider the text below:
```
CGen Language:
Type ::=
Emp ::= "⊥"
Uni ::= "⊤"
Eit ::= Type "|" Type
Par ::= Type "&" Type
Fun ::= Type "→" Type
Func ::=
Lam ::= "λ" Name "." Func
EFQ ::= "λ" "."
Mat ::= "λ" "{"? "#0" ":" Func ";"? "#1" ":" Func ";"? "}"?
Get ::= "λ" "!" Func
Del ::= "λ" "-" Func
Dup ::= "λ" "+" Func
Voi ::= "&{}"
Frk ::= "&" Label "{" Func "," Func "}"
Ret ::= "~" Term
Expr ::=
Var ::= Name
One ::= "()"
Inl ::= "#0" "{"? Expr "}"?
Inr ::= "#1" "{"? Expr "}"?
Tup ::= "(" Expr "," Expr ")"
Ref ::= "@" Name "(" [Expr ","?] ")"
Era ::= "&" "{" "}"
Sup ::= "&" Label "{" Expr "," Expr "}"
Book ::=
Def ::= "@" Name ":" Type "=" Func ";"
End ::= ";"
Interaction Rules:
λx.f v
-------- app-lam
f[x ← v]
λ{#0:f;#1:g} #0{x}
------------------ app-mat-inl
f x
λ{#0:f;#1:g} #1{x}
------------------ app-mat-inr
g x
λ!f (x,y)
--------- app-get-tup
f x y
Structural Rules:
λ-f v
----- app-del
f
λ+f v
----- app-dup
f v v
Erasure Rules:
λ{#0:f;#1:g} &{}
---------------- app-mat-era
&{}
λ!f &{}
------- app-get-era
&{}
Superposition Rules:
λ{#0:f;#1:g} &L{x,y}
-------------------- app-mat-sup
&{}
λ!f &L{x,y}
----------- app-get-sup
&{}
&{} x
----- app-voi
&{}
&L{f,g} x
------------ app-frk
&L{f x, g x}
Evaluator (CEK machine, Felleisen 1986):
Three tail-recursive states: app (apply Func to args), eval
(evaluate Term), cont (continue through frames with a value).
C = Func | Term, E = Ctx (de Bruijn levels), K = [Frame].
Frame ::=
F_DIS [Term] Ctx Label Func Term
F_FRK [Term] Ctx Label Func
F_INL
F_INR
F_TUP Term Ctx
F_SUP Label Term Ctx
F_TUP2 Term
F_SUP2 Label Term
F_REF Func [Term] [Term] Ctx
norm t = eval t [] []
app (Lam b) (x : s) fs c = app b s fs (x:c)
app EFQ _ fs _ = cont Era fs
app (Mat a b) (Inl x : s) fs c = app a (x:s) fs c
app (Mat a b) (Inr x : s) fs c = app b (x:s) fs c
app (Mat a b) (Era : _) fs _ = cont Era fs
app (Mat a b) (Sup l x y : s) fs c = app (Mat a b) (x:s) (F_DIS s c l (Mat a b) y:fs) c
app (Get b) (Tup x y : s) fs c = app b (x:y:s) fs c
app (Get b) (Era : _) fs _ = cont Era fs
app (Get b) (Sup l x y : s) fs c = app (Get b) (x:s) (F_DIS s c l (Get b) y:fs) c
app (Del b) (_ : s) fs c = app b s fs c
app (Dup b) (x : s) fs c = app b (x:x:s) fs c
app Voi _ fs _ = cont Era fs
app (Frk l a b) s fs c = app a s (F_FRK s c l b:fs) c
app (Ret x) _ fs c = eval x fs c
app f (Var i : s) fs c = app f (c[i]:s) fs c
eval (Var i) fs c = eval c[i] fs c
eval (Inl x) fs c = eval x (F_INL:fs) c
eval (Inr x) fs c = eval x (F_INR:fs) c
eval (Tup x y) fs c = eval x (F_TUP y c:fs) c
eval (Sup l x y) fs c = eval x (F_SUP l y c:fs) c
eval (Ref n (t:ts)) fs c = eval t (F_REF bk[n] [] ts c:fs) c
eval (Ref n []) fs _ = app bk[n] [] fs []
eval x fs _ = cont x fs
cont x (F_DIS s c l f y : fs) = app f (y:s) (F_SUP2 l x:fs) c
cont x (F_FRK s c l g : fs) = app g s (F_SUP2 l x:fs) c
cont x (F_TUP y c : fs) = eval y (F_TUP2 x:fs) c
cont x (F_SUP l y c : fs) = eval y (F_SUP2 l x:fs) c
cont x (F_INL : fs) = cont (Inl x) fs
cont x (F_INR : fs) = cont (Inr x) fs
cont x (F_TUP2 a : fs) = cont (Tup a x) fs
cont x (F_SUP2 l a : fs) = cont (Sup l a x) fs
cont x (F_REF f dn (t:ts) c : fs) = eval t (F_REF f (x:dn) ts c:fs) c
cont x (F_REF f dn [] _ : fs) = app f (rev (x:dn)) fs []
cont x [] = x
Invariant: cont never re-enters eval with a term it just wrapped.
Wrapping frames (F_INL, F_INR, F_TUP2, F_SUP2) stay in cont.
```
this text was the spec for a very old iteration of HVM, before we understood how
DUP's (the dual of SUP's worked) and why they're necessary to make superposition
interactions sound. since them, we built HVM, which is a very fast evaluator
that performs ~100m interactions/s in all gen/ hard problems. yet, even though
HVM is remarkably efficient, it is nowhere even near the old CGen interpreter,
which performed *1.3 billion* (!!!) interactions/s, and was 10x+ faster than
./supgenvm.c in all benchmarks. that is mostly attributable to the CEK like
architecture, which allowed it to match eliminators like mat and get against
introducers like inl, inr and tup directly, without intermediate allocations of
λ's and λ-matches, without ALO nodes, without any of the heavy machinery that
slown down HVM. now, in this repo, we have a file named ./supgenvm.c, which is a
distilled version of HVM with just the functionality needed to run supgen.hvm.
that gives us a small playground which we can edit to explore HVM optimizations,
without having to mess with the full repo. your goal is to, taking huge
inspiration from how CGen worked, *reimplement a brand new HVM, from scratch,
that evaluates supgen.hvm correctly, outputting the same results for all gen/
problems, except *much* faster in terms of raw interactions/s; expectedly, 10x
faster or so. this goal might take a long time, and you CANNOT stop working on
it until you succeed. this is very important! the first and most difficult part
of this is to define the evaluator; i.e., the formulas above, except redesigned
to support the full HVM and all its computationally relevant interactions, in
particular, the most challenging interaction is the DUP-LAM rule:
! F &L = λx.f
---------------- dup-lam
! G &L = f
F₀ ← λ$x0. G₀
F₁ ← λ$x1. G₁
x ← &L{$x0,$x1}
the reason this makes it so difficult is that this rule substitutes F₀ and F₁
with two "globally scoped" lambdas, `λ$x0` and `λ$x1`. the problem is that this
rule is triggered when we enter the λx.f lambda via either F₀ or F₁, which may
occur in distant parts of the program, yet, whether we access via F₀ or F₁, this
rule affects the other variable, because whenever we enter the other variable,
we will NOT repeat this interaction again, but merely access the duplicated
lambda that the first visitor allocated. this lambda allocation seems
unavoidable, and this is what makes CEK-style evaluation difficult, as the CEK
machine never actually allocs or copies λ-matches or λ's; instead, these are
just static *code* associated with a top-level function. the lambdas match
against live terms like inl, inr, tup, but they're never allocated. so, how
can we implement full HVM in this style?
an interesting idea would be to re-implement dup-lam as:
! F &L = λx.f
---------------- dup-lam (via F₀)
! G &L = f
F₀ ← λ$x0. G₀
x ← &L{$x0,&{}}
i.e., if we access it via F₀, we substitute just F₀ (i.e., immediatelly
returning it), and extend the CEK context with &L{$x0,&{}}. this is technically
sound per interaction rules; sup / dup paths will annihilate as expected, the
final result will match; but this destroys all sharing enjoyed by interaction
net evaluation, resulting in asymptotic complexity explosion. so, it isn't
really viable. something else is needed.
as you can tell, I don't have a solution, just rough ideas. perhaps what we want
isn't even a CEK machine, but the fact a CEK evaluator for CGen achieved 1.3
billion i/s can serve as an inspiration that, perhaps, the way HVM is currently
constructed is not ideal, and perhaps we could redesign it in a way that
achieves a much higher throughput; possibly ways I'm not even thinking of.
this is your goal: redesign HVM and achieve a much higher performance.
do not stop working until you have a true breakthrough (at least 3x speedup in
raw intearctions/s on all hard gen/ tests).
important: do NOT edit any file in this repository. always work in your own
isolated worktree branch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment