-- Gist by @solomon-b, created August 7, 2019
{-# LANGUAGE InstanceSigs #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeSynonymInstances #-}
module Main where
{-
When writing programs in a statically typed language such as Haskell,
there are two concepts we often think about and try to reconcile:

Generic Programming - broadly extensible and repurposable code. (This
can be achieved through a variety of strategies.)

Algebraic Datatypes as DSLs - encoding the semantics of your program
into ADTs. In this view we see the data types as a DSL and the
processing of the data types as the interpretation of the DSL.

These two strategies seek to resolve different problems:

Generic Programming has the goal of more universally applicable code.
Embedded DSLs seek to constrain, via the type system, the space of
possible programs in order to eliminate meaningless but otherwise valid
expressions.

So the question Oleg's paper wishes to discuss is: how can we write
programs that are both highly extensible and generic, but which are
also able to express invariants and validity constraints?

Oleg names the solution "Typed Extensible Interpreters (of Typed
Languages)" and he offers two implementation strategies, which he calls
the Initial and Final encodings.

The Initial encoding represents the terms of your object language as
values of an ADT in the metalanguage, and interpreters as functions
that recursively traverse and deconstruct the ADT through pattern
matching.

The Final encoding represents the object language by expressions in the
metalanguage. "An object term is hence represented not by its abstract
syntax but by its denotation in a semantic domain. Abstracting over the
domain gives us a family of interpretations."

Now that I've got that out of the way, I'm going to work through
examples of both encodings for a really simple untyped arithmetic
language. This example is taken directly from Oleg's paper.
-}
-- A unityped Initial encoding of an object language:
data Exp
  = Lit Int
  | Neg Exp
  | Add Exp Exp
-- An expression of the object language:
ti1 :: Exp
ti1 = Add (Lit 8) (Neg (Add (Lit 1) (Lit 2)))
-- Interpreter One: An Evaluator
eval' :: Exp -> Int
eval' (Lit n) = n
eval' (Neg e) = - eval' e
eval' (Add e1 e2) = eval' e1 + eval' e2
-- eval' ti1 == 5
view' :: Exp -> String
view' (Lit n) = show n
view' (Neg e) = "(- " ++ view' e ++ ")"
view' (Add e1 e2) = "(" ++ view' e1 ++ " + " ++ view' e2 ++ ")"
-- view' ti1 == "(8 + (- (1 + 2)))"
{-
An intuition towards the Final encoding.

Alternatively, we can embed our language in Haskell by representing
our terms by their untagged values, or by a Haskell expression that
computes that value.

We use a type alias called Repr to indicate that a given Haskell value
is actually a value in our embedded language.
-}
type Repr = Int
lit' :: Int -> Repr
lit' n = n
neg' :: Repr -> Repr
neg' e = -e
add' :: Repr -> Repr -> Repr
add' e1 e2 = e1 + e2
-- An expression of the object language:
tf1' :: Repr
tf1' = add' (lit' 8) (neg' (add' (lit' 1) (lit' 2)))
{-
Note that tf1' is an ordinary Haskell expression which evaluates to 5.

This isn't the actual Final encoding, but it gives a bit of an
intuition about it. The comparison between this encoding and the
Initial encoding also outlines the problem at the heart of the question
posed in the introduction:

The Expression Problem:
How can we write a program that is extensible over both the grammar
(the datatype cases) and its interpretations (the functions over the ADT)
without needing to rewrite code (or recompile it)?
-}
{-
So let's look at the differences:

The Initial encoding is not extensible over the grammar.
The Sorta Final encoding is not extensible over the interpretations.

Given that I called it 'Sorta Final', the actual Final encoding is
probably a variation on it.

The problem with Sorta Final is that its interpreter is hard coded
into its grammar. We want something with the benefits of Sorta Final
(an extensible and compositional grammar) but which is polymorphic
over its interpretation.

Typeclasses to the rescue..
-}
class ExpSYM repr where
  lit :: Int -> repr
  neg :: repr -> repr
  add :: repr -> repr -> repr

instance ExpSYM Int where
  lit :: Int -> Int
  lit n = n
  neg :: Int -> Int
  neg e = -e
  add :: Int -> Int -> Int
  add e1 e2 = e1 + e2

instance ExpSYM String where
  lit :: Int -> String
  lit n = show n
  neg :: String -> String
  neg e = "(-" ++ e ++ ")"
  add :: String -> String -> String
  add e1 e2 = "(" ++ e1 ++ " + " ++ e2 ++ ")"
tf1 :: ExpSYM repr => repr
tf1 = add (lit 8) (neg (add (lit 1) (lit 2)))
eval :: Int -> Int
eval = id
view :: String -> String
view = id
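-- A quick sanity check (not from the paper), picking the interpretation by type:
-- eval tf1 == 5
-- view tf1 == "(8 + (-(1 + 2)))"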
{-
These interpreter functions seem strange. How can `id` be an interpretation?
Well, the actual interpreter is encoded in the typeclass
instance. `eval` and `view` are simply asserting the type of `repr`.
Think of them as interpretation selectors.

But what about extensibility over the grammar?
-}
class ExpSYM repr => MulSYM repr where
  mul :: repr -> repr -> repr

instance MulSYM Int where
  mul :: Int -> Int -> Int
  mul e1 e2 = e1 * e2

instance MulSYM String where
  mul :: String -> String -> String
  mul e1 e2 = "(" ++ e1 ++ " * " ++ e2 ++ ")"
tf2 :: MulSYM repr => repr
tf2 = mul (lit 7) tf1
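-- A quick sanity check (not from the paper):
-- eval tf2 == 35
-- view tf2 == "(7 * (8 + (-(1 + 2))))"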
{-
So now we have recovered extensibility over the grammar.
But there are still some open questions:
1. Can we recover Pattern Matching?
2. Can we calculate equality?
3. Can we apply this to a typed Object Language?
The answer to all 3 is yes, but I'm just going to get into pattern matching today.
-}
{-
Let's talk about Compositionality. The Principle of Compositionality states:

"The meaning of a complex expression is determined by its structure
and the meanings of its constituents."

This means that in order to understand the meaning, the value, of some
expression:

  eval (Add e1 e2) = eval e1 + eval e2

we need only know the meanings of the components `e1` and `e2`, which we
determine independently. We don't need to know anything about what is
nested inside `e1` or `e2`. Additionally, the meanings of `e1` and
`e2` are determined in isolation from one another, that is to say
irrespective of their context.

Both the Initial and Final interpreters are compositional.

Oleg states:
"The compositional interpretation of a term is epitomized in a `fold`"

In the Initial encoding, we write an explicit fold function (see the
foldExp sketch after this comment block).
In the Final encoding we also use a fold, however the fold is wired
into the metalanguage expressions that denote our object language.
There's no recursion in our final interpreter; it is, rather, an
(f-)algebra from the object language into its semantic domain in the
metalanguage.

So we know that the evaluation of a fold is context free, and that the
act of pattern matching is inherently a contextual transformation.
Because the fold is wired into the interpreter in the Final encoding, we
have no way to make contextual transformations. All we can do is fold the
expression and arrive at an interpretation (a semantic domain).

But Oleg says we do have a means of pattern matching, an inherently
contextual transformation. He achieves this by showing that
"contextual" transformations become "context-free" when we interpret
expressions as functions parameterized by a context.

That's a mouthful. He gives two specific examples of how this can be
done ad hoc and then shows an isomorphism between the Initial and Final
encodings which lets you apply pattern matching to finally encoded
terms in general.

"With `Denotational Semantics` you are describing the meanings of pure,
composable expressions in mathematically pure forms such that the
_meanings_ themselves can be composed together to create meanings for
composite expressions?"
-}
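-- To make "the compositional interpretation of a term is epitomized in a
-- `fold`" concrete, here is a sketch of the explicit fold over Exp that the
-- initial interpreters above are instances of (foldExp is my name for it,
-- not something from the paper):
foldExp :: (Int -> a) -> (a -> a) -> (a -> a -> a) -> Exp -> a
foldExp l _ _ (Lit n)     = l n
foldExp l n a (Neg e)     = n (foldExp l n a e)
foldExp l n a (Add e1 e2) = a (foldExp l n a e1) (foldExp l n a e2)
-- Both eval' and view' are such folds, e.g.:
-- foldExp id negate (+) ti1 == 5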
{-
Pushing Down Negation.
The first contextual transformation is factoring out negations:
Our sample expression in the object language:
"(8 + (- (1 + 2)))"
Factoring out the negation would yield:
"(8 + ((-1) + (-2)))"
This is a contextual transformation because when we encounter a
negation we must look inside its subexpression in order to decide how
to push the negation down.
In order to do this we want to transform our grammar from:
e := int | neg e | add e e
To the more restricted form:
e := factor | add e e
factor := int | neg int
In this second form `neg` is not recursive and only allows an integer
to be negated once.
-}
-- Initial Encoding
pushNeg :: Exp -> Exp
pushNeg e@(Lit _) = e
pushNeg e@(Neg (Lit _)) = e
pushNeg (Neg (Neg e)) = pushNeg e
pushNeg (Neg (Add e1 e2)) = Add (pushNeg (Neg e1)) (pushNeg (Neg e2))
pushNeg (Add e1 e2) = Add (pushNeg e1) (pushNeg e2)
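-- A quick sanity check (not from the paper):
-- view' (pushNeg ti1) == "(8 + ((- 1) + (- 2)))"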
{-
The nested Pattern Matching is the giveaway that this transformation
must be contextual.
We cannot pattern match per se in the Final Encoding, but we can turn
this contextual transformation into a context free one by treating
expressions as functions parameterized by a context.
-}
data Ctx = Pos' | Neg'

instance ExpSYM repr => ExpSYM (Ctx -> repr) where
  lit :: Int -> (Ctx -> repr)
  lit n Pos' = lit n -- i.e. lit n = \ctx -> case ctx of Pos' -> lit n; Neg' -> neg (lit n)
  lit n Neg' = neg (lit n)
  neg :: (Ctx -> repr) -> (Ctx -> repr)
  neg e Pos' = e Neg'
  neg e Neg' = e Pos'
  add :: (Ctx -> repr) -> (Ctx -> repr) -> (Ctx -> repr)
  add e1 e2 ctx = add (e1 ctx) (e2 ctx)
{-
In the Final encoding, pushing down negation is now structurally
inductive, that is to say it is clearly a fold that terminates.
-}
pushNegFinal :: (Ctx -> repr) -> repr
pushNegFinal e = e Pos'
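-- A quick sanity check (not from the paper): interpret tf1 at type
-- (Ctx -> String), push down the negations, and view the result:
-- view (pushNegFinal tf1) == "(8 + ((-1) + (-2)))"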
-- This contextual transformation is easily extensible as well:
instance MulSYM repr => MulSYM (Ctx -> repr) where
  mul e1 e2 Pos' = mul (e1 Pos') (e2 Pos')
  mul e1 e2 Neg' = mul (e1 Pos') (e2 Neg')
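-- And extended to multiplication (again, not from the paper):
-- view (pushNegFinal tf2) == "(7 * (8 + ((-1) + (-2))))"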
{-
A more general solution:
We can achieve a more general pattern matching solution by implementing
an isomorphism between Initial and Final Encodings:
-}
instance ExpSYM Exp where
  lit = Lit
  neg = Neg
  add = Add
initialize :: Exp -> Exp
initialize = id
finalize :: ExpSYM repr => Exp -> repr
finalize (Lit n) = lit n
finalize (Neg e) = neg (finalize e)
finalize (Add e1 e2) = add (finalize e1) (finalize e2)
-- Then we compose the iso with our initially encoded transformation:
-- initialize fixes a final term to its Exp interpretation, pushNeg
-- transforms the resulting initial term, and finalize re-encodes the
-- result as a final term.
pushNeg' :: ExpSYM repr => Exp -> repr
pushNeg' = finalize . pushNeg . initialize
-- This allows us to do any sort of contextual transformation on a
-- finally encoded term, so long as we are willing to lose
-- extensibility in the process.
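-- A quick sanity check (not from the paper):
-- view (pushNeg' tf1) == "(8 + ((-1) + (-2)))"
-- eval (pushNeg' tf1) == 5

-- A minimal main (my addition, not part of the paper) so the Main module
-- builds as an executable; it just prints a few of the interpretations above.
main :: IO ()
main = do
  print (eval tf1)
  putStrLn (view tf1)
  putStrLn (view (pushNegFinal tf1))
  putStrLn (view (pushNeg' tf1))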