Created
July 8, 2018 06:33
-
-
Save CodaFi/07cff3df42318e2de28d5dd9ca387d02 to your computer and use it in GitHub Desktop.
Macro-by-Example: Deriving Syntactic Transformations from their Specifications, Re-Typeset
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
//: Foreword | |
//: | |
//: This playground is the culmination of a week of digging around | |
//: for the oldest papers I could find about (Lisp) macros and their evolution. | |
//: I found this paper particularly enlightening as it is the place where Scheme | |
//: and Rust-style "Macro-by-Example" macros first came into their own. In | |
//: addition, this paper discusses the implementation challenges faced by | |
//: previous macro systems and Kohlbecker's first go at MBEs. | |
//: | |
//: I have re-typeset this paper because [the one hosted by IU](ftp://www.cs.indiana.edu/pub/techreports/TR206.pdf) | |
//: is annoying to read. I also translated some of the more interesting | |
//: definitions to Swift to see what would happen. | |
//: # Introduction | |
//: Modern programming languages offer powerful facilities for procedural and | |
//: data abstraction which encapsulate commonly-used patterns of procedure | |
//: invocation and of data usage, respectively. Often, however, one encounters | |
//: typical patterns of linguistic usage that do not fall neatly into either | |
//: category. An example is the let expression, which encapsulates an often-used | |
//: pattern of function creation and call: | |
//: | |
//: (let ((i e) ...) b ...) | |
//: ==> ((lambda (i ...) b ...) e ...) | |
//: | |
//: This is not a procedural abstraction; rather, it is a syntactic abstraction: | |
//: an avstraction of a typical pattern of syntax. | |
//: | |
//: Most modern languages do not provide tools for the creation of syntactic | |
//: abstractions comparable to those available for procedural or data | |
//: abstractions. Even in languages such as Lisp that allow syntactic | |
//: abstractions, the process of defining them is notoriously difficult and | |
//: error-prone. To define let as a macro, we must write a procedure that | |
//: transforms any expression which matches the left-hand pattern into a | |
//: corresponding instance of the right-hand pattern. The code for this looks | |
//: like: | |
//: | |
//: (lambda (s) | |
//: (cons | |
//: (cons ’lambda | |
//: (cons (map car (cadr s)) | |
//: (cddr s))) | |
//: (map cadr (cadr s)))) | |
//: | |
//: This code can hardly be considered transparent. It bears no obvious relation | |
//: to the transformation it engenders. It is difficult to see the shape of the | |
//: structure it is building or the shape of the structure it is trying to take | |
//: apart. Furthermore, it does no error checking on its input. | |
//: | |
//: Modern Lisps supply some tools to help the macro-writer, most notably | |
//: backquote and defmacro (e.g. [Foderaro, Sklower, & Layer 83](http://www.softwarepreservation.org/projects/LISP/franz/Franz_Lisp_July_1983.pdf)). | |
//: Backquote makes the code for building the output look more like the output | |
//: itself, and defmacro includes a pattern-matching facility. Using these | |
//: tools, the let macro might be defined as: | |
//: | |
//: (defmacro let (decls . body) | |
//: ‘((lambda ,(map car decls) . ,body) | |
//: . ,(map cadr decls))) | |
//: | |
//: This is considerably better, but the mapping functions are still mysterious. | |
//: Furthermore, the backquote mechanism is itself error-prone: one always | |
//: leaves out at least one comma on the first try! | |
//: | |
//: In our facility, one defines let as follows: | |
//: | |
//: (declare-syntax let | |
//: [(let ((i e) ...) b ...) | |
//: ((lambda (i ...) b ...) e ...)]) | |
//: | |
//: This is close to the language used for specifying syntactic extensions in | |
//: the revised Scheme report [Steele and Sussman 78]() and the 1986 Scheme | |
//: report [Rees, Clinger, et al. 86], except that it is executable. The | |
//: specification language has the following features: | |
//: | |
//: 1. Pattern-matching, including error-checking, is done on the input. | |
//: 2. The output specification matches the form of the output. No commas or | |
//: other symbols are needed. | |
//: 3. Repetitive elements are specified naturally on both input and output. | |
//: | |
//: This specification mechanism has been used in various versions of Scheme | |
//: since 1982, most recently in Chez Scheme [Dybvig 87](https://www.scheme.com/tspl4/), | |
//: and has proved to be a robust and highly useful feature. Only recently, | |
//: however, did the need for formal documentation of the mechanism lead us to | |
//: develop the formal semantics and the semantically derived compiler presented | |
//: here. | |
//: | |
//: We call this mechanism macro-by-example, or MBE. It has allowed us to embed | |
//: a number of interesting languages in Scheme, such as: | |
//: | |
//: - a type-checked Scheme variant called SPS [Wand 84](https://dl.acm.org/citation.cfm?id=502895) | |
//: - a coroutine mechanism [Friedman, Haynes, & Wand 86](https://www.sciencedirect.com/science/article/pii/009605518690007X) | |
//: - an import/export mechanism for modules [Felleisen and Friedman 86](https://www.sciencedirect.com/science/article/pii/0096055186900159) | |
//: - two quite different semantically-derived subsets of Prolog [Felleisen 85](ftp://ftp.extreme.indiana.edu/pub/techreports/TR182.pdf), [Wand 85] | |
//: | |
//: The presence of this convenient syntactic extension tool provides an | |
//: important design dimension: given any proposed language extension, we get to | |
//: decide how much of it should be treated as procedural abstraction, how much | |
//: as data abstraction, and how much as syntactic abstraction. The presence of | |
//: a good syntactic abstraction mechanism means that a language extension which | |
//: might be highly cumbersome expressed in a purely procedural way can be made | |
//: far more usable by a propitious choice of syntax. (Indeed, the | |
//: `declare-syntax` facility itself may be viewed as such a propitious choice | |
//: of syntax). Conversely, given a complex syntactic abstraction, we can try to | |
//: convert some of it to a procedural or a data abstraction. | |
//: # The Specification Language | |
//: Consider the following example of MBE: | |
//: | |
//: (declare-syntax and | |
//: [(and) true] | |
//: [(and e) e] | |
//: [(and e1 e2 ...) (if e1 (and e2 ...) false)]) | |
//: | |
//: It illustrates some of the important features of MBE: | |
//: | |
//: 1) An MBE specification consists of a series of input-output patterns, which | |
//: are tried in order. If a call matches none of the input patterns, then an | |
//: error is signalled. | |
//: 2) Backquotes and commas in the output patterns are avoided by a simple | |
//: convention: identifiers which appear in the input pattern are treated as | |
//: pattern variables; all other identifiers are constants. No special | |
//: treatment is necessary for lambda or quote in the transcription proper, | |
//: but the result of the transcription can be processed using the | |
//: α-converting expander of [Kohlbecker, et al. 86](ftp://www.cs.indiana.edu/pub/techreports/TR194.pdf) | |
//: to avoid capture of bound variables in macros such as: | |
//: | |
//: | |
//: ``` | |
//: (declare-syntax or2 | |
//: [(or2 x y) (let ((v x)) (if v v y))]) | |
//: ``` | |
//: | |
//: which would otherwise capture any occurrences of v in its second argument. | |
//: (The production version of MBE also allows the user to specify the capture | |
//: of particular variables). | |
//: | |
//: 3) Whenever the ellipsis (indicated by the atom "...") is used, it must be | |
//: preceded by a pattern, and it must be the last element of the pattern | |
//: list or sub-list. (Again, the production version is somewhat more | |
//: generous; we do not illustrate this here). Ellipses may be nested, so | |
//: pattern variables may denote S-expressions, lists of S-expressions, lists | |
//: of lists of S-expressions, etc. | |
//: | |
//: As an illustration of MBE as a language-embedding tool, consider the | |
//: following macros, which are taken from [Wand 85]. These macros define two | |
//: new special forms whose translation produces (through details that are of | |
//: no concern here) an interface to an underlying Prolog semantics. | |
//: | |
//: (declare-syntax clauses | |
//: ((clauses (var1 ...) cl1 ...) | |
//: (lambda (z) | |
//: (let ((var1 (genvar)) ...) | |
//: (call/kf | |
//: (lambda (kf) | |
//: (ff-set-ref! **cut-point** kf) | |
//: (choose (clause cl1) ...))))))) | |
//: | |
//: (declare-syntax clause | |
//: ((clause (pat act1 ...)) | |
//: (begin | |
//: (unify (pattern pat) z) | |
//: (action act1) ...))) | |
//: | |
//: (declare-syntax action | |
//: ((action (fn e1 ...)) | |
//: (fn (list (pattern e1) ...)))) | |
//: | |
//: Without the ability to adapt the syntax, the interface would have been | |
//: impossible to use (indeed, an earlier version of our Prolog semantics | |
//: foundered on precisely this point). With these macros in place, we can write | |
//: Prolog predicates as Scheme procedures in a moderately convenient syntax, | |
//: for example: | |
//: | |
//: (define Append | |
//: (clauses (y a d u v) | |
//: ((nil y y)) | |
//: (((a . d) u (a . v)) | |
//: (Append d u v)))) | |
//: The production version [Kohlbecker 86](https://www2.ccs.neu.edu/racket/pubs/dissertation-kohlbecker.pdf) | |
//: includes a number of additional bells and whistles. First, arbitrary tests | |
//: may be specified by an optional argument called a fender. For example, the | |
//: macro for let should include a test to make sure the bound variables are | |
//: atoms: | |
//: | |
//: (declare-syntax let | |
//: [(let ((i e) ...) b ...) | |
//: (mapand atom? ’(i ...)) | |
//: ((lambda (i ...) b ...) e ...)]) | |
//: | |
//: Second, arbitrary processing of the macro call may be performed by a | |
//: with-specification, which binds its variable to the result of a Scheme | |
//: calculation. | |
//: | |
//: (declare-syntax new-lang | |
//: [(new-lang e ...) | |
//: (with ([code (translate-to-scheme ’(e ...))]) | |
//: (top-level code))]) | |
//: And third, the macro-writer may also specify keywords to serve as pattern | |
//: constants in the input patterns (e.g., else) and as terminators for | |
//: ellipses. | |
//: # Semantics | |
//: In this section, we sketch the formal semantics of the MBE mechanism. We | |
//: restrict ourselves to a single input-output pair, emphasizing the correct | |
//: treatment of ellipses. | |
//: | |
//: In our description, we employ relatively standard notation. We use three | |
//: closely-spaced dots `(...)` for the ellipsis symbol and center dots | |
//: `(· · ·)` for ellipsis in the meta-language we use `(x)1, (x)2` for | |
//: projection, `( α ⇒ β , γ)` for conditionals, and `A −o-> B` for the set of | |
//: partial functions from `A` to `B`. | |
//: We define the following domains: | |
//: `S-exp ::= Ident | (S-exp ... S-exp)` | |
enum SExpr { | |
case atom(String) | |
case many([SExpr]) | |
/// Returns true if this expression is the empty list `()`. | |
var isNull: Bool { | |
switch self { | |
case let .many(xs) where xs.isEmpty: | |
return true | |
default: | |
return false | |
} | |
} | |
/// Returns the left and right elements of this expression if it is a pair - | |
/// not an atom and not empty. Returns `nil` otherwise. | |
var asPair: (SExpr, SExpr)? { | |
switch self { | |
case let .many(xs) where !xs.isEmpty: | |
return (xs[0], .many(Array(xs.dropFirst()))) | |
default: | |
return nil | |
} | |
} | |
/// Returns the elements of this expression if it is a list. Returns `nil` | |
/// otherwise. | |
var asList: [SExpr]? { | |
switch self { | |
case let .many(xs): | |
return xs | |
default: | |
return nil | |
} | |
} | |
static func cons(_ l: SExpr, _ r: SExpr) -> SExpr { | |
guard let xs = r.asList else { | |
fatalError() | |
} | |
return .many([l] + xs) | |
} | |
var hd: SExpr? { | |
switch self { | |
case let .many(xs): | |
return xs.first.map { .many([$0]) } | |
default: | |
return nil | |
} | |
} | |
var tl: SExpr? { | |
switch self { | |
case let .many(xs): | |
return .many(Array(xs.dropFirst())) | |
default: | |
return nil | |
} | |
} | |
var classify: SType { | |
switch self { | |
case .atom: | |
return .ground(self) | |
case let .many(xs): | |
return .many(xs.map{$0.classify}) | |
} | |
} | |
} | |
//: `S# ::= S-exp | (S# ... S#)` | |
enum SType { | |
case ground(SExpr) | |
case many([SType]) | |
var length: Int { | |
switch self { | |
case .ground(_): return 0 | |
case let .many(xs): return xs.count | |
} | |
} | |
var isNull: Bool { | |
switch self { | |
case .ground(_): return true | |
default: return false | |
} | |
} | |
var hd: SType? { | |
switch self { | |
case .ground(_): return nil | |
case .many(let tys): return tys.first.map { .many([$0]) } | |
} | |
} | |
var tl: SType? { | |
switch self { | |
case .ground(_): return nil | |
case .many(let tys): return .many(Array(tys.dropFirst())) | |
} | |
} | |
} | |
//: `Pat ::= () | Ident | (Pat . Pat) | (Pat ...)` | |
enum Pattern { | |
case empty | |
case identifier(String) | |
indirect case pair(Pattern, Pattern) | |
indirect case ellipses(Pattern) | |
/// Returns a list of identifiers contained in a pattern. | |
var freeVariables: Set<String> { | |
switch self { | |
case .empty: return [] | |
case let .identifier(a): return [a] | |
case let .pair(p1, p2): return p1.freeVariables.union(p2.freeVariables) | |
case let .ellipses(ps): | |
return ps.freeVariables | |
} | |
} | |
} | |
//: `Env = Ident -0-> (Int x S#)` | |
typealias Environment = [String: (Int, SType)] | |
///: Takes an input pattern and an S-expression and returns a boolean indicating | |
///: whether the S-expression matches the pattern. | |
///: | |
///: The equations for `D` are guaranteed to be sensible only on the condition | |
///: that `B` returns true. | |
func B(_ p: Pattern, _ s: SExpr) -> Bool { | |
switch p { | |
case .empty: | |
return s.isNull | |
case .identifier(_): | |
return true | |
case let .pair(p1, p2): | |
guard let (hd, tl) = s.asPair else { | |
return false | |
} | |
return B(p1, hd) && B(p2, tl) | |
case let .ellipses(ps): | |
guard let ss = s.asList else { | |
return false | |
} | |
return ss.allSatisfy({ t in B(ps, t) }) | |
} | |
} | |
//: Takes an input pattern and an S-expression and (if it matches the pattern) | |
//: returns an environment associating pattern variables with their result | |
//: bindings. | |
//: | |
//: Each binding is a pair consisting of a non-negative integer and an element | |
//: of S#. The integer indicates the variable's level, the number of ellipses | |
//: enclosing it. | |
func D(_ p: Pattern, _ s: SExpr) -> Environment? { | |
func D(at level: Int, _ p: Pattern, _ s: SExpr) -> Environment? { | |
switch p { | |
case .empty: | |
return [:] | |
case let .identifier(a): | |
return [a:(level, .ground(s))] | |
case let .pair(p1, p2): | |
guard let (hd, tl) = s.asPair else { | |
return nil | |
} | |
guard let hdEnv = D(at: level, p1, hd) else { | |
return nil | |
} | |
guard let tlEnv = D(at: level, p2, tl) else { | |
return nil | |
} | |
return hdEnv.merging(tlEnv, uniquingKeysWith: {$1}) | |
case let .ellipses(ps): | |
guard let ss = s.asList else { | |
return nil | |
} | |
var dict = Environment() | |
for s in ss { | |
guard let env = D(at: level + 1, ps, s) else { | |
return nil | |
} | |
dict.merge(env, uniquingKeysWith: {$1}) | |
} | |
return dict | |
} | |
} | |
return D(at: 1, p, s) | |
} | |
//: Takes an output pattern and an environment and expands the pattern in that | |
//: environment. | |
func T(_ p: Pattern, _ env: Environment) -> SExpr? { | |
switch p { | |
case .empty: | |
return .many([]) | |
case let .identifier(a): | |
// If the output pattern is an identifier, we first check to see if it is | |
// bound in the environment. | |
guard let (lev, tm) = env[a] else { | |
// If it is not bound in the environment, then it is treated as a | |
// constant. | |
return .atom(a) | |
} | |
// If it is, and its level is zero, then the result is its value. | |
guard lev == 0 else { | |
return .atom(a) | |
} | |
// If the level is non-zero, then it was bound to a list of values, not a | |
// value, so it may not be transcribed. | |
return nil | |
case let .pair(p1, p2): | |
// If the output pattern is a pair (p1 . p2), its head and tail are each | |
// processed separately. The two results are then joined. | |
guard let x = T(p1, env), let xs = T(p2, env) else { | |
return nil | |
} | |
return SExpr.cons(x, xs) | |
case let .ellipses(ps): | |
// In processing an output pattern (p ...) with T, we | |
// have the inverse of the problem with D: we must split a | |
// single environment into a list of environments. To do | |
// this, we first restrict the environment to the free | |
// variables of p. We then check whether the variables of | |
// p include at least one variable of level greater than 0. If | |
// not, the prototype p is rejected, since there is no way | |
// to determine the length of the repetition. If the | |
// prototype passes, then we decompose the environment into a | |
// list of environments and map T[p] over this list. The | |
// decomposition is complicated by three considerations: | |
// | |
// 1) We must decrement the level of each of the non-scalar | |
// variables in the environment. | |
// 2) We copy scalar (level 0) variables across the list, being | |
// sure *not* to decrement their level, so that code like | |
// | |
// (declare-syntax copy-it | |
// [(copy-it i j ...) (bar '(i j) ...)]) | |
// | |
// correctly transcribes | |
// | |
// (copy-it a 1 2 3) | |
// | |
// into | |
// | |
// (bar '(a 1) '(a 2) '(a 3)) | |
// | |
// 3) We determine the length of the list. This is the | |
// length of the lists in the environment if they are all | |
// the same length. If they are not the same length, an | |
// error is signalled. | |
func controllable(_ env: Environment, _ rhoRestrict: Environment) -> Bool { | |
return rhoRestrict.keys.firstIndex(where: { (k) -> Bool in | |
guard let val = rhoRestrict[k] else { return false } | |
return val.0 > 0 | |
}) != nil | |
} | |
let fv = p.freeVariables | |
let rhoRestrict = env.filter { (t) -> Bool in | |
return fv.contains(t.key) | |
} | |
guard controllable(env, rhoRestrict) else { | |
return nil | |
} | |
// Decompose the environment into a list of environments | |
func decompose(_ rho: Environment) -> [Environment] { | |
guard let firstLen = rho.first?.value.1.length else { return [] } | |
// UnequalLengths? | |
guard rho.allSatisfy({ (t) -> Bool in t.value.1.length == firstLen }) else { | |
fatalError("Type error!") | |
} | |
// StopNow? | |
guard firstLen > 0 else { | |
return [] | |
} | |
let splitHeads = rho.mapValues({ (t) -> (Int, SType) in | |
guard t.0 == 0 else { | |
return (t.0 - 1, t.1.hd!) | |
} | |
return (t.0, t.1) | |
}) | |
let splitTails = rho.mapValues({ (t) -> (Int, SType) in | |
guard t.0 == 0 else { | |
return (t.0 - 1, t.1.tl!) | |
} | |
return (t.0, t.1) | |
}) | |
return [splitHeads] + decompose(splitTails) | |
} | |
let ds = decompose(rhoRestrict) | |
var exprs = [SExpr]() | |
for d in ds { | |
guard let t = T(ps, d) else { | |
return nil | |
} | |
exprs.append(t) | |
} | |
return .many(exprs) | |
} | |
} | |
//: A transform function for one input-output pattern pair. | |
func E(_ ps: (lhs: Pattern, rhs: Pattern), _ s: SExpr) -> SExpr? { | |
guard B(ps.lhs, s) else { | |
return nil | |
} | |
guard let env = D(ps.lhs, s) else { | |
return nil | |
} | |
return T(ps.rhs, env) | |
} | |
//: # Deriving The Compiler | |
//: The original versions of MBE (along with the current production version, | |
//: detailed in [Kohlbecker 86]((https://www2.ccs.neu.edu/racket/pubs/dissertation-kohlbecker.pdf))), used ad hoc code generation methods to | |
//: compile a specification into Scheme code. The code generator is quite | |
//: clever; for example, for let it produces the code shown in the first | |
//: display. The evolution of this code generator has been tortuous, and | |
//: therefore MBE seemed to be a good real-world example on which to exercise | |
//: semantic methods. | |
//: | |
//: The first step in the derivation was the production of the formal semantics | |
//: sketched above. From there, the derivation proceeded in two phases: staging | |
//: [Jorring and Sherlis 86] and representation [Wand 82]. | |
//: | |
//: The goals of staging are to identify opportunities for early binding and to | |
//: reorder and re-curry the arguments to the various functions so that the | |
//: early-bindable arguments appear first. | |
//: | |
//: Our particular goal is to avoid building intermediate structures such as | |
//: environments whenever possible. Recall that `D` has functionality | |
//: | |
//: // Pat -> S-exp -o-> (Ident -o-> (Int × S#)) | |
//: func D(_ p: Pattern, _ s: SExpr) -> Environment? | |
//: | |
//: We delay the binding of S-exp by replacing Env by | |
//: | |
//: Env′ = (Ident -o-> Int) x (Ident -o-> S-exp -> S#) | |
//: | |
//: *Editor's note: Env′ as defined here does not typecheck as used in the rest | |
//: of the paper. The authors probably meant the following type instead:* | |
//: | |
//: Env′ = (Ident -o-> Int) x (Ident -o-> S-exp -> S-exp) | |
typealias Env2 = ([String: Int], [String: (SExpr) -> SExpr]) | |
//: and replacing `D` by a function `D′` of functionality `Pat -> Env′`. | |
//: | |
//: An element of `Env′` may be thought of as a symbol table. Instead of having | |
//: the function `D` which takes an S-expression and builds an environment | |
//: containing S-expressions, we have a new function which takes a pattern and | |
//: builds a symbol table containing functions from S-expressions to | |
//: S-expressions. We call these functions selectors. A selector is later | |
//: applied to the subject S-expression to extract the values for the variables. | |
//: Thus the condition we want is that if `B[p]s` is true, then for all `i`, | |
//: | |
//: ⟨(D′[p])1i, (D′[p])2is⟩ = D[p]si | |
//: | |
//: This formulation also makes clear that the levels of the variables are | |
//: independent of the subject S-expression to which the pattern is matched. | |
//: (This is analogous to static chain analysis or type analysis). | |
//: | |
//: It is easy to build `D′` as the product of two functions, which we call | |
//: `D1′` and `D2′` . Each of these is obtained by modifying the appropriate | |
//: piece of the definition of `D`, using the congruence condition above as a | |
//: guide. Doing this, we get: | |
//: | |
func D1(_ p: Pattern) -> [String: Int] { | |
func D1(at level: Int, _ p: Pattern) -> [String: Int] { | |
switch p { | |
case .empty: | |
return [:] | |
case let .identifier(a): | |
return [a:level] | |
case let .pair(p1, p2): | |
let hdEnv = D1(at: level, p1) | |
let tlEnv = D1(at: level, p2) | |
return hdEnv.merging(tlEnv, uniquingKeysWith: {$1}) | |
case let .ellipses(ps): | |
return D1(at: level + 1, ps) | |
} | |
} | |
return D1(at: 1, p) | |
} | |
func D2(_ p: Pattern) -> [String: (SExpr) -> SExpr] { | |
switch p { | |
case .empty: | |
return [:] | |
case let .identifier(a): | |
return [a:{ $0 }] | |
case let .pair(p1, p2): | |
let hdEnv = D2(p1) | |
let tlEnv = D2(p2) | |
let head = hdEnv.mapValues { (f) -> (SExpr) -> SExpr in | |
return { s in f(s.hd!) } | |
} | |
let tail = tlEnv.mapValues { (f) -> (SExpr) -> SExpr in | |
return { s in f(s.tl!) } | |
} | |
return head.merging(tail, uniquingKeysWith: {$1}) | |
case let .ellipses(p): | |
return D2(p).mapValues { (f) -> (SExpr) -> SExpr in | |
return { s in f(s.hd!) } | |
} | |
} | |
} | |
//: Now we can proceed to modify `T` to work with new-style environments. Just | |
//: as the analysis of `D` produced selectors, the analysis of `T` produces | |
//: constructors. The new version of `T` will be | |
//: | |
//: T′ : Pat -> Env′ -> S-exp -o-> S-exp | |
//: | |
//: Using the condition on `D′` above to illustrate how to translate new | |
//: environments into old environments, we see that the condition we want | |
//: `T′` to obey is | |
//: | |
//: func T2(_ p: Pattern, _ e: Env2, _ s: SExpr) -> SExpr? { | |
//: var loweredEnv = Environment() | |
//: for k in e.0.keys { | |
//: loweredEnv[k] = (e.0[k]!, e.1[k]!(s)) | |
//: } | |
//: return T(p, loweredEnv) | |
//: } | |
//: | |
//: With this condition, we can redefine E as | |
func E2(_ ps: (lhs: Pattern, rhs: Pattern), _ s: SExpr) -> SExpr? { | |
guard B(ps.lhs, s) else { | |
return nil | |
} | |
return T2(ps.rhs, (D1(ps.lhs), D2(ps.lhs)), s) | |
} | |
//: A simple calculation shows that this definition is equivalent to the old | |
//: one. | |
//: | |
//: As usual, it is easy to write the first three clauses for T ′: | |
func T2(_ p: Pattern, _ e: Env2, _ s: SExpr) -> SExpr? { | |
switch p { | |
case .empty: | |
return .many([]) | |
case let .identifier(a): | |
guard let lev = e.0[a] else { | |
return .atom(a) | |
} | |
guard lev == 0 else { | |
fatalError() | |
} | |
guard let f = e.1[a] else { | |
return nil | |
} | |
return f(s) | |
case let .pair(p1, p2): | |
guard let x = T2(p1, e, s), let xs = T2(p2, e, s) else { | |
return nil | |
} | |
return SExpr.cons(x, xs) | |
case .ellipses(_): | |
fatalError("Deferred to next section") | |
} | |
} | |
//: The analysis of `T′[(p ...)]` is harder; we defer it to the next section. | |
//: | |
//: The staging analysis shows that at compile-time we can produce a set of | |
//: tests, selectors, and constructors that can be applied at macro-expansion | |
//: time to an S-expression. In the second phase of the derivation, we choose | |
//: appropriate representations for these functions. In keeping with slogan | |
//: "target code as a representation of semantics" [Wand 82], we represent these | |
//: functions in Scheme, our target language. | |
//: | |
//: We begin with the representation for selectors. To determine the | |
//: representation, we look at `D2′` to see what are the basic selectors and | |
//: what are the constructors which build selectors. This gives us a | |
//: mini-language of selectors. We then choose representations for this | |
//: language in Scheme. In particular, we represent these functions as Scheme | |
//: expressions with a single free variable `s`. Then the function can be | |
//: applied simply by evaluating the expression in a suitable environment. | |
//: | |
//: Looking at the definition of `D2′` , we can see that every selector is | |
//: either `λ s . s` or of one of the forms `λ s . (M (hd s))` , | |
//: `λ s . (M (tl s))` and `λ s . (map M s)`, where `M` is another selector. | |
//: Thus we can represent these by the Scheme expressions `s`, `M [(car s)/s]`, | |
//: `M[(cdr s)/s]`, and `(map (lambda (s) M) s)`, respectively, where `M[N/v]` | |
//: means the substitution of `N` for the free occurrences of `v` in `M`. We can | |
//: then do a small amount of peephole optimization on the representations | |
//: (e.g., replacing `(lambda (s) (map (lambda (s) s) s))` by `s`). This is easy | |
//: since the language of selectors is small. We can analyze tests and | |
//: constructors similarly, by looking at the definitions of B and T ′. | |
//: | |
//: Since the same representation is used for all these functions, they can be | |
//: inter-mixed when necessary. For example, in `T′` there is no separate | |
//: representation for `(ρ′)2a`, since this is a selector whose representation | |
//: is already determined. | |
//: | |
//: Having decided on a representation, we then go back and modify the functions | |
//: `B`, `D2′` and `T′` to produce these concrete representations rather than | |
//: the functions. This gives us a compiler: a function which takes a | |
//: pattern-transcription pair and produces a piece of Scheme code which | |
//: performs the transliteration. | |
//: | |
//: More precisely, we replace E by: | |
//: | |
//: *Editor's note: In Scheme, there is a difference here. In Swift, there | |
//: difference is unimportant enough that there's no sense redefining | |
//: everything because we can't actually 'compile' anything* | |
func E3(_ ps: (lhs: Pattern, rhs: Pattern), _ s: SExpr) -> SExpr? { | |
return E2(ps, s) | |
} | |
//: Note that all this code is obtained by modifying the output of the previous | |
//: functions, not by doing any *ab initio* coding or analysis. In Scheme, this | |
//: modification may be accomplished by simply introducing backquotes in the | |
//: appropriate places. | |
//: | |
//: This completes the compiler, except for the transcription of ellipses. | |
//: *Editor's note: From here on out, nothing else is transliterated. The | |
//: proper way to do this requires adding expression quoting to Swift and | |
//: that's too much work for a Saturday evening.* | |
//: # Transcribing Ellipses | |
//: Transcribing ellipses is a harder problem for this compilation strategy, | |
//: since in general there is no way to avoid building intermediate structures | |
//: akin to environments. | |
//: | |
//: Our strategy is to deal first with the most common special case, which | |
//: luckily does not require any intermediate structures. It is easy to confirm | |
//: that the following clause satisfies the congruence relation (∗), given in | |
//: the previous section, which must hold between `T` and `T′`: | |
//: | |
//: T′[(a ...)]ρ′ = ((ρ′)1a = 1) ⇒ (ρ′)2a, error | |
//: | |
//: This takes care of what seems to be the most common use of ellipses. In | |
//: general, however, we need to attack the congruence relation for | |
//: `T′[(p ...)]` directly. The congruence relation requires that | |
//: | |
//: T ′[(p ...)]ρ′s | |
//: = T [(p ...)](λi.⟨(ρ′)1i, (ρ′)2is⟩) | |
//: = map (T [p]) | |
//: (decompose (λi ∈ fv(p).⟨(ρ′)1i, (ρ′)2is⟩)) | |
//: | |
//: Let us write `ρ` for the argument to decompose above. We need to build some | |
//: representation of `ρ` and of decompose `ρ`. That representation should allow | |
//: us to build as much as possible of the environment before knowing the value | |
//: of `s`. Once that is done, we can attack the problem of replacing the `T` in | |
//: the mapping expression by `T′`. | |
//: | |
//: Let `i∗ = [i0,···,ik]` be an enumeration of the free variables of `p` in a | |
//: list. (We use the square brackets `[···]` or `[··· l ··· | l ∈ L]` to denote | |
//: lists). We represent all functions whose domain is `fv(p)` as lists, indexed | |
//: by `i∗`. Thus a function `f` can be represented by the list | |
//: `[f(i0),···,f(ik)]`. | |
//: | |
//: Since an element of `Env′` is a pair of functions, we represent it as a pair | |
//: of lists. We refer to these two lists as the level component and the value | |
//: component of the environment. Hence `decompose ρ` becomes | |
//: | |
//: decompose [(ρ′)1i | i ∈ i∗] [(ρ′)2is | i ∈ i∗] | |
//: | |
//: The output from `decompose` is a list of environments, each of which has the | |
//: same level component. Hence we can represent this list of environments as a | |
//: single level component and a list of value components. Thus we can represent | |
//: `decompose (λ i ∈ fv(p) . ⟨(ρ′)1i, (ρ′)2is⟩)` as | |
//: | |
//: ⟨decompose-levels n∗, decompose-values n∗ s∗⟩ | |
//: n∗ =[(ρ′)1i|i∈i∗] | |
//: s∗ = [(ρ′)2is | i ∈ i∗] | |
//: decompose-levels = λn∗.[((n = 0) ⇒ 0, n − 1) | n ∈ n∗] | |
//: decompose-values | |
//: = λ n∗ s∗ . UnequalLengths? s∗ ⇒ error, | |
//: StopNow?ρ ⇒ (), | |
//: cons (split∗ hd n∗ s∗) | |
//: decompose-valuesn∗ (split∗ tl n∗ s∗) | |
//: split∗ = λfn∗s∗.[(n = 0) ⇒ s, fs | (n, s) ∈ (n∗, s∗)] | |
//: | |
//: Here the definition of split∗ describes mapping over two lists of equal | |
//: length. | |
//: | |
//: By these manipulations, we have given ourselves the possibility of | |
//: precomputing the levels. We take advantage of this by currying the | |
//: functionality of `T′`, changing it from | |
//: | |
//: T′ : Pat -> ((Ident −o-> Int) × (Ident −o-> S-exp -> S♯)) -> S-exp −o-> S-exp | |
//: | |
//: to | |
//: | |
//: T′ : Pat -> (Ident -o-> Int) -> (Ident -o-> S-exp -> S#) -> S-exp −o-> S-exp | |
//: | |
//: We may also make a similar change in the functionality of `T` to reflect our | |
//: new representation of environments: | |
//: | |
//: T : Pat -> (Ident -o-> Int) -> (Ident -o-> S#) -> S-exp | |
//: | |
//: With these changes, the congruence condition between T and T ′ may be | |
//: restated as: | |
//: | |
//: T′[p]ρ1ρ2s = T[p]ρ1(λi . ρ2is) | |
//: | |
//: Let ρ∗ be the selector environment `D2′[i∗]`, that is, the environment | |
//: `{(i0 |-> λ s . (hd s)), (i1 |-> λs.(hd (tl s))), · · ·}`. Hence, if we have | |
//: an environment represented by the pair `(n∗ , s∗)`, we can use `ρ∗` with | |
//: `T′` to retrieve elements from `s∗` . More precisely, we may deduce from the | |
//: congruence condition that for any pattern `p′`, we have | |
//: | |
//: T′[p′]n∗ρ∗s∗ = T[p′]n∗s∗ | |
//: | |
//: where the first s∗ denotes a list which is the subject of the transcription | |
//: and the second denotes the same list in its role as the representation of a | |
//: value environment. | |
//: | |
//: We can use this identity to replace the `T` in the definition of | |
//: `T′[(p ...)]` by `T′`. For clarity, we assume for the moment that the | |
//: pattern `p` is controllable in level-environment `ρ1`. We may now derive the | |
//: new definition: | |
//: | |
//: T ′[(p ...)]ρ1ρ2s | |
//: = T [(p ...)]ρ1(λi.ρ2is) | |
//: = map (T [p])(decompose (λ i ∈ fv(p) . ⟨ρ1i, ρ2is⟩)) | |
//: = let i∗ =fv(p); n∗ =[ρ1 i | i ∈ i∗]; | |
//: m∗ = decompose-levels n∗ in | |
//: λ s . map (T [p]m∗)(decompose-values n∗ [ρ2is | i ∈ i∗]) | |
//: = let i∗ = fv(p); n∗ = [ρ1i | i ∈ i∗]; | |
//: m∗ = decompose-levels n∗; ρ∗ = D2′ [i∗] in | |
//: λ s . map (T ′[p]m∗ρ∗) | |
//: (decompose-values n∗ [ρ2is | i ∈ i∗]) | |
//: | |
//: This version of the definition allows us to write `T′` as a pure structural | |
//: recursion, so that we can transcribe it into a concrete representation as | |
//: well: | |
//: | |
//: T′′[(p ...)]ρ1ρ2 | |
//: = let i∗ = fv(p); n∗ = [ρ1i|i∈i∗]; | |
//: m∗ = decompose-levels n∗; ρ∗ = D2′ [i∗] in | |
//: (map (lambda (s) T ′[p]m∗ρ∗) | |
//: (decompose-values (quote n∗) | |
//: (list ρ2i0 · · · ρ2ik))) | |
//: | |
//: For the last line, recall that `ρ2` contains concrete selectors, which, | |
//: when evaluated with `s` bound to `s`, will return the value of `ρ2` is. | |
//: Hence, if `i∗ = [i0, · · · , ik]`, then the list expression will evaluate | |
//: correctly to `[ρ2is | i ∈ i∗]`. | |
//: | |
//: This version of the target code includes an explicit call to | |
//: `decompose-values`. It is possible to apply the same methods to | |
//: `decompose-values`, using staging to take advantage of the fact that its | |
//: first argument is known. The resulting system generates target code that | |
//: includes a local letrec loop in place of the `decompose-values`, and in | |
//: which the run-time level tests are eliminated. We leave this development as | |
//: an exercise for the reader. | |
//: # Results | |
//: The derivation, including false starts and debugging of the resulting code, | |
//: took well under one man-week. The production compiler, by contrast, embodies | |
//: several man-months of work. | |
//: | |
//: How good is the resulting compiler? It seems to produce code which is | |
//: comparable with the production version. For let, it produces the following | |
//: code for the transcription: | |
//: | |
//: | |
//: (lambda (s) | |
//: (cons | |
//: (cons ’lambda | |
//: (cons (map (lambda (s) (car s)) | |
//: (car (cdr s))) | |
//: (cdr (cdr s)))) | |
//: (map (lambda (s) (car (cdr s))) | |
//: (car (cdr s))))) | |
//: | |
//: which is clearly equivalent (given a reasonably optimizing compiler) to the | |
//: production version code given above. It also produces comparable code for | |
//: the tests. In view of this performance, we regard the derivation as a | |
//: success. | |
//: # Conclusions | |
//: We have presented a "macro-by-example" facility for Lisp-like languages. The | |
//: facility allows the user to specify syntactic extensions in a natural, | |
//: non-procedural manner. These specifications are expanded into | |
//: transformations which automatically perform pattern-matching, | |
//: error-checking, and mapping. We have given a formal semantics for the | |
//: specification language and have shown how the semantics can be converted | |
//: into a compiler by the use of staging and suitable choices of | |
//: representations for the semantic functions. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment