Following Scheme, we'll provide a handful of operators to programmers to help them
easily construct new syntactic forms (i.e., output TokenStreams). These forms include:

- quote!, for quickly creating new syntactic forms/tokenstreams
- unquote, which escapes quote (following Scheme's quasiquote form), allowing users to perform general computation inside of a quote! invocation
- unquote-splice, which escapes quasi-quoting and then list-splices the result into the quasi-syntax term
The basic quote! operator allows users to easily create new syntactic forms. For example, consider a macro that tests its input and produces an expression based on its value:
#[macro]
fn derive_typical_maybe_copy(input: TokenStream, env: SyntaxEnv) -> TokenStream {
    let maybe = input.expect_bool()?;
    if maybe {
        quote!(env, #[derive(PartialEq, Eq, Hash, Clone, Copy)])
    } else {
        quote!(env, #[derive(PartialEq, Eq, Hash, Clone)])
    }
}
Then we can use this to conditionally set derive definitions based on configuration:
derive_typical_maybe_copy!(true)
struct Foo { ... }
This will produce:
#[derive(PartialEq, Eq, Hash, Clone, Copy)]
struct Foo { ... }
In general, quote! has the form:

quote!(env, quotable) => tokenstream

During invocation, the quotable term is (1) converted into a tokenstream, (2) marked with the
syntactic (hygiene) information from the macro declaration site (also called recoloring, for short),
and (3) returned. Step two is important; we want to ensure that we don't accidentally change the
macro outcome because of bindings at the expansion site.
For example, consider the following Scheme snippet:
(define y (lambda (x) x))
(define-syntax (foo x) #'y)
(let ((y (lambda (x) (+ x 1)))) ((foo 'bar) 10)) ;; => evaluates to 10
Here, the y produced by expanding foo yields the identity function; that's the correct,
hygienic binding for it. We'd like to preserve this for Rust:
fn y(x: i32) -> i32 { x }

#[macro]
fn foo(input: TokenStream, env: SyntaxEnv) -> TokenStream {
    quote!(env, y)
}

fn main() {
    let y = |x| x + 1;
    (foo!())(10);
}
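Under this hygiene rule, the call in main resolves the quoted y to the top-level function rather than the local closure. Conceptually, the program behaves as if it had been written as follows (an illustrative 'as-if' program with the call-site y renamed to make the resolution visible; not literal expander output):

// Hygiene means the expansion treats the two `y`s as distinct names;
// renaming the local one makes that explicit.
fn y(x: i32) -> i32 { x }

fn main() {
    let y_local = |x: i32| x + 1;  // the call site's `y`, renamed for illustration
    (y)(10);                       // foo!() resolved to the *top-level* y; evaluates to 10
    let _ = y_local;
}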
If quotable does not contain unquotes, we can simply convert the expression into a
tokenstream and mark it with the appropriate hygiene information.
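As a rough mental model only (to_token_stream and recolor below are illustrative names invented for this sketch, not proposed API), an unquote-free quote! could be thought of as:

// quote!(env, quotable), when `quotable` contains no unquotes, behaves roughly as:
let stream = to_token_stream(quotable);  // (1) turn the quoted term into tokens
env.recolor(stream)                      // (2) mark with declaration-site hygiene; (3) return it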
Unquote allows programmers to easily construct new tokenstreams out of existing and newly-constructed ones, and to do stream analysis and computation in the middle of constructing the output syntax.
We begin by revisiting our previous example, utilizing unquote (written $) to reduce the number of quote! invocations:
#[macro]
fn derive_typical_maybe_copy(input: TokenStream, env: SyntaxEnv) -> TokenStream {
    let maybe = input.expect_bool()?;
    quote!(env,
        #[derive(PartialEq, Eq, Hash, Clone
            $(if maybe { quote!(env, ,Copy) } else { empty_stream })
        )]
    )
}
While this example isn't particularly clearer, it gets at the heart of the $ operator:
we can provide an expression that evaluates to a TokenStream inside of $() and,
during expansion, this expression will be executed and its result will be placed in the
output TokenStream. (We also introduce an additional form here, empty_stream, which provides
an empty TokenStream. This is useful for situations such as this, where we'd like to produce
empty output in some cases.)
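As another small illustration of the same idea (a sketch against the proposal's hypothetical API; the macro name twice and the use of .clone() on TokenStream are assumptions of this document), a macro can splice its own input back into the output it is constructing:

#[macro]
fn twice(input: TokenStream, env: SyntaxEnv) -> TokenStream {
    // Build a two-element tuple whose components are both the caller's expression.
    // Each $(...) runs at expansion time and drops its TokenStream into the output.
    quote!(env, ($(input.clone()), $(input)))
}

With this, twice!(x + 1) would expand to (x + 1, x + 1).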
To move on to a larger example, we begin with a partial implementation of the Scheme operator cond, a compact conditional form that works as:
(define fib
  (lambda (n)
    (cond
      ((= n 0) 1)
      ((= n 1) 1)
      (else (+ (fib (- n 1)) (fib (- n 2)))))))
To recreate this operator as a macro in Scheme, we can implement it as follows (using #` as quote! and #, as $):
;; NB There are additional forms, mentioned below, that could simplify
;; this implementation.
(define-syntax (cond x)
  (let ((rhs (cdr (syntax->datum x))))                                ;; (1)
    (if (null? rhs)                                                   ;; (2)
        #`(void)                                                      ;; (3)
        (let ((test (caar rhs))                                       ;; (4)
              (res (cadar rhs))
              (rest (cdr rhs)))
          #`(if #,(if (eq? 'else test) #`#t (datum->syntax x test))   ;; (4.i)
                #,(datum->syntax x res)                               ;; (4.ii)
                #,(if (null? rest)
                      #`(void)                                        ;; (4.iii)
                      (datum->syntax x (cons (datum->syntax x 'cond) rest))))))))
The macro algorithm proceeds as follows:
- We pull the input syntactic object x into a data object, then destruct it to get at the right-hand side of the cond (i.e., the clauses).
- We check if there are any clauses left.
  - If not, we return void.
  - Otherwise, we pull out the first test, the first res, and the rest of the clauses. Then we construct an if statement, where the test and consequent branch reflect the test and res, and the alternative branch handles any remaining clauses, as follows:
    - check if the test is else, and, if so, emit #t (true in Scheme) as the test; otherwise, emit the test.
    - emit the result of the clause as the true branch of the if
    - if there are no additional clauses, emit void
    - if there are, construct an invocation of cond on the rest to recur down them.
To summarize, at each step, we pull out the test and result of the next clause, then construct a series of nested if expressions that perform the computation (substituting #t for else).
(In the above example, syntax->datum and datum->syntax are tools for converting syntactic forms into list-like structures and back; we elide them in our Rust examples because, there, the current proposal puns between syntax and datum through TokenStreams, as discussed below.)
To recreate this sort of behavior in the Rust procedural macro system, we'll use quote!, the quote operator, and $, the unquote operator, following a similar algorithm:
- Check if cond has no arguments; if so, emit {}
- Check if there is a pair of a test and rhs as a prefix of the tokenstream
  - If there is, construct an if statement:
    1. Check if the test is else.
       - If so, emit true
       - If not, emit the test.
    2. Emit the rhs as the true branch of the if test
    3. If the remaining input is empty, emit an empty token stream (to form if test { rhs })
    4. If it is not empty, construct a new invocation of cond! as syntax and add it to the tokenstream.
  - If there is not, panic with an error
Implementing this algorithm using the current proposal proceeds as:
#[macro]
fn cond(input: TokenStream, env: SyntaxEnv) -> TokenStream {
    if input.is_empty() {                                                 // (1)
        quote!(env, {})
    } else if let Some(((test, rhs), rest)) = input.maybe_pair_prefix() { // (2)
        quote!(env,
            if $(if test.free_id_eq(env.as_ident("else")) {               // (2.i), (2.i.a)
                quote!(env, true)                                         // (2.i.a.a)
            } else {
                env.to_syntax(test, input)                                // (2.i.a.b)
            }) {
                $( env.to_syntax(rhs, input) )                            // (2.i.b)
            } $(if rest.is_empty() {                                      // (2.i.c)
                empty_stream
            } else {                                                      // (2.i.d)
                quote!(env, else { cond!($(rest)) })
            }))
    } else {                                                              // (2.ii)
        panic!("Invalid conditional form: {:?}", input);
    }
}
This macro, when used, would work as:
fn fib(n: i32) -> i32 {
    cond!(
        (n == 0, 1)
        (n == 1, 1)
        (else, fib(n-1) + fib(n-2))
    )
}
This would produce the program:
fn fib(n: i32) -> i32 {
    if n == 0 {
        1
    } else {
        if n == 1 {
            1
        } else {
            if true {
                fib(n-1) + fib(n-2)
            }
        }
    }
}
While the result is not idiomatic Rust, it (a) achieves the goal and (b) could be transformed into idiomatic Rust via a more complex cond! implementation.
(Incidentally, the implementation has a secondary advantage: if there is no else clause, the control flow analysis will report that not all paths can produce a value, guiding the programmer to detect bugs by reusing existing compiler infrastructure.)
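For instance (a hypothetical usage; the function sign is ours, not from the proposal), dropping the else clause leaves an if with no alternative in the expansion, and the compiler's existing checks flag it:

fn sign(n: i32) -> i32 {
    cond!(
        (n < 0, -1)
        (n > 0, 1)
    )
}

// Expands (roughly) to:
//
// fn sign(n: i32) -> i32 {
//     if n < 0 {
//         -1
//     } else {
//         if n > 0 {
//             1
//         }
//     }
// }
//
// which the compiler rejects: the innermost `if` has no `else`,
// so not every path produces an i32.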
At this point, we have used a number of additional forms and values in defining cond!. Most notably:

- free_id_eq, which checks if two things are identifiers that refer to the same object, a la free-identifier=? from Scheme. We use it to ask if the test-position token is actually the keyword else.
- as_ident : String -> TokenTree (?), which converts its input into an uncolored 'ident' token (maybe a tokenstream?). (We define another form, below, called as_colored_ident, which also does recoloring.)
- to_syntax : TokenStream -> &TokenStream -> TokenStream, which takes two tokenstreams (or token slices / token trees, or, better, any of the three types) and 'colors' the first input using the second input's coloring before returning the first one. We use this twice, recoloring both the test and the rhs to ensure that they will use the bindings from the macro invocation site (which has input's colors) instead of the macro declaration site.
- as_fn_call_syntax : String -> TokenStream -> &TokenStream -> TokenStream, which takes an ident, arguments, and a coloring object, and constructs a call as ident(args), colored with the coloring object, as syntax.
These would need to be provided to the programmer as 'quality of life' tools.
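Written out as Rust signatures, that surface might look roughly like the following. This is purely a speculative sketch inferred from the usages above; the trait names, the exact parameter types, and the choice of &TokenStream for the coloring argument are all assumptions of this document, not a settled API.

// Helpers hanging off the expansion environment.
trait SyntaxEnvTools {
    /// Build an uncolored identifier token from a string.
    fn as_ident(&self, name: &str) -> TokenTree;
    /// As `as_ident`, but colored with this environment's hygiene information.
    fn as_colored_ident(&self, name: &str) -> TokenTree;
    /// Color `tokens` with the coloring carried by `coloring`, then return them.
    fn to_syntax(&self, tokens: TokenStream, coloring: &TokenStream) -> TokenStream;
    /// Construct `ident(args)` as syntax, colored by `coloring`.
    fn as_fn_call_syntax(&self, ident: &str, args: TokenStream, coloring: &TokenStream) -> TokenStream;
}

// Identifier comparison, a la Scheme's free-identifier=?.
trait FreeIdEq {
    fn free_id_eq(&self, other: TokenTree) -> bool;
}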
Even with all of this, however, we have not explored the really thorny bits of this design:
At this point in the design, there's a pretty brutal pun between TokenStreams as syntax (colored) objects and TokenStreams as non-syntactic (uncolored) objects; for example, test and rhs are uncolored, but input itself is colored. These two variants of tokenstreams correspond to the structures produced by syntax->datum and datum->syntax in the Scheme implementation above.
The Scheme salve for this problem is to provide two different things: datum, in the form of quoted expressions, and syntax, in the form of syntactic information wrapped around datum. In Rust, this might manifest as TokenStreams, analogous to data, and SyntaxStreams, analogous to syntax forms. It is unclear if this would be a 'big win'; the types may prove annoying to macro writers.
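To make the idea concrete, one very rough shape such a split could take is sketched below; SyntaxStream, Color, and the method names are illustrative assumptions, not part of the proposal.

// A colored stream: a datum (TokenStream) plus its hygiene information.
struct SyntaxStream {
    tokens: TokenStream,
    color: Color,
}

impl SyntaxStream {
    // Analogue of syntax->datum: strip the coloring.
    fn to_datum(self) -> TokenStream {
        self.tokens
    }

    // Analogue of datum->syntax: wrap a datum in the coloring of some existing syntax.
    fn from_datum(tokens: TokenStream, context: &SyntaxStream) -> SyntaxStream {
        SyntaxStream { tokens, color: context.color.clone() }
    }
}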
There is another 'problem' with the code we've seen so far: quote! must manually handle its environment. This is in stark contrast to the Scheme code above, which carries its expansion environment as part of the expander, not the macros themselves. Moreover, env is already in scope, and quoting with a different environment than the current one seems... fundamentally incorrect.
It may be possible to completely elide this need by modifying the expander to handle this context itself. This would require quote!(quotable) to be an expander-handled form instead of providing it as a stand-alone entity, and, in addition, it would require that TokenStreams (SyntaxStreams?) know how to recover their colorings so that, e.g., to_syntax could work as to_syntax(test, input) without the need for an environment.
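Concretely, in the cond! implementation above, the recursive arm might shrink as follows (a speculative before/after; the env-free forms do not exist in the current proposal):

// Current proposal: the environment is threaded by hand.
quote!(env, else { cond!($(rest)) })

// With an expander-managed environment (speculative):
quote!(else { cond!($(rest)) })

// Likewise, env.to_syntax(test, input) could become to_syntax(test, input).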
While both of these modifications would pull the Rust expander closer to the expanders presented in Syntactic Abstractions in Scheme / Macros That Work Together, they would require serious expander refactoring (and possibly modification) to work.
Finally, we get to unquote splicing, written #,@ in Scheme and $@ in Rust. There are a few different design proposals for this, each attempting to replicate some of the style of unquote-splicing from Scheme.
First, to demonstrate how it works in Scheme (and why it is useful there), we present a few usages of this splicing as Scheme code (eliding the # prefix for readability):
(1a) `(a b ,(reverse '(a b c)))   ;; => (a b (c b a))
(1b) `(a b ,@(reverse '(a b c)))  ;; => (a b c b a)
(2a) `(+ ,'(1 2))                 ;; => (+ (1 2))
(2b) `(+ ,@'(1 2))                ;; => (+ 1 2)
This structural tool is of particular utility in Scheme because (e1 e2 ...) applies e1 as a procedure to the remaining expressions. That is: the parentheses are critically important, and so an operator that allows programmers to more easily control them has immense utility. For example, the recursive cond case from before may be rewritten using syntax splicing as:
(old) (datum->syntax x (cons (datum->syntax x 'cond) rest))
(new) #`(cond #,@(datum->syntax x rest))
The idea here is that, before, we manually constructed a list of the rest and the syntactic cond before converting it into syntax, and now we can directly construct the list we want. While this operator may be directly translated into the system as we have it so far, parentheses play a subdued role in Rust, and, as a result, the ability to trivially escape them seems to lend less utility than in s-expression languages.
To this end, we present two alternative design proposals for splicing that attempt to find a 'middle ground' utility for providing similar semantic behavior to Scheme's splicing operator (instead of similar syntactic behavior).
NB. If you have another idea for a useful unsplicing-like structure for Rust, please share it!
One-layer splicing is the idea that a programmer is promised that, during splicing, the current braces/parentheses will be 'extruded outward'. For example:
{a; $@({b; c}); d} ~> {a; b; c; d}
Because {} carries serious meaning in Rust (dealing with lifetimes), this scope-extrusion utility more closely mirrors, semantically, the system in Scheme: macro writers in Scheme use splicing to restructure the program to ensure a specific structure around their s-expressions, and this unsplicing will lend itself to a similar approach in Rust.
We may also provide a nicety that un-extrudable things are simply inlined:
{a; $@(b; c); d} ~> {a; b; c; d}
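As a sketch of how a macro writer might lean on this (everything here, including the macro name with_setup and the setup/teardown calls, is a hypothetical example, not part of the proposal), one-layer splicing lets a macro merge the caller's block into its own, so that the caller's let bindings share the surrounding scope rather than living in a nested block:

#[macro]
fn with_setup(input: TokenStream, env: SyntaxEnv) -> TokenStream {
    // `input` is assumed to be a braced block of statements, e.g. `{ let x = f(); g(x); }`.
    quote!(env, {
        setup();       // hypothetical helpers visible at the macro definition site
        $@(input);     // one-layer splice: the block's outer braces are extruded away
        teardown();
    })
}

With splicing, a let binding inside the caller's block would stay in scope alongside setup() and teardown(); with plain $, the caller's braces would survive and the binding would be confined to its own inner block.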
Finally, taking the previous proposal to the extreme, we might imagine unquote splicing as a large 'flattening' hammer:
{a; $@({b; {c; d}}); e} ~> {a; b; c; d; e}
The advantage here is that programmers are ensured full unwrapping, and don't have to worry about any semantic-shifting constructions inside of $@(...): they will definitely all be removed in the course of expansion. The downside, of course, is that this prevents a programmer from only extruding a specific layer.
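For contrast, one-layer splicing applied to the same nested input would stop at the first set of braces (this line is derived from the one-layer definition above, not taken from the proposal text):

{a; $@({b; {c; d}}); e} ~> {a; b; {c; d}; e}

whereas the flattening version, as shown above, removes every layer.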
It's unclear which of these three designs would be most useful. Any of them might provide serious utility to the programmer, but it is also imaginable that their particular semantics may make them generally undesirable as a feature. More work should be done to discern several good examples of how they might be used, and whether any of them provide 'big wins' to a macro writer.
TBD