Following Scheme, we'll provide a handful of operators to programmers to help them
easily construct new syntactic forms (i.e., output TokenStreams). These forms include:

- quote!, for quickly creating new syntactic forms/tokenstreams
- unquote, which escapes quote (following Scheme's quasiquote form), allowing users to perform general computation inside of a quote! invocation
- unquote-splice, which escapes quasi-quoting and then list-splices the result into the quasi-syntax term
The basic quote! operator allows users to easily create new syntactic forms. For example, consider a macro that tests its input and produces an expression based on its value:
#[macro]
fn derive_typical_maybe_copy(input: TokenStream, env: SyntaxEnv) -> TokenStream {
    let maybe = input.expect_bool()?;
    if maybe {
        quote!(env, #[derive(PartialEq, Eq, Hash, Clone, Copy)])
    } else {
        quote!(env, #[derive(PartialEq, Eq, Hash, Clone)])
    }
}
Then we can use this to conditionally set derive definitions based on configuration:
derive_typical_maybe_copy!(true)
struct Foo { ... }
This will produce:
#[derive(PartialEq, Eq, Hash, Clone, Copy)]
struct Foo { ... }
In general, quote! has the form:

quote!(env, quotable) => tokenstream

During invocation, the quotable term is (1) converted into a tokenstream, (2) marked with the
syntactic (hygiene) information from the macro declaration site (also called recoloring, for short),
and (3) returned. Step two is important; we want to ensure that we don't accidentally change the
macro outcome because of bindings at the expansion site.
For example, consider the following Scheme snippet:
(define y (lambda (x) x))
(define-syntax (foo x) #'y)
(let ((y (lambda (x) (+ x 1)))) ((foo 'bar) 10)) ;; => evaluates to 10
Here, the y produced by expanding foo yields the identity function; that's the correct,
hygienic binding for it. We'd like to preserve this for Rust:
fn y(x: i32) -> i32 { x }

#[macro]
fn foo(input: TokenStream, env: SyntaxEnv) -> TokenStream {
    quote!(env, y)
}

fn main() {
    let y = |x| x + 1;
    (foo!())(10);
}
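Under this hygiene rule, the call in main resolves the quoted y to the top-level function rather than the local closure. Conceptually, the program behaves as if it had been written as follows (an illustrative 'as-if' program with the call-site y renamed to make the resolution visible; not literal expander output):

// Hygiene means the expansion treats the two `y`s as distinct names;
// renaming the local one makes that explicit.
fn y(x: i32) -> i32 { x }

fn main() {
    let y_local = |x: i32| x + 1;  // the call site's `y`, renamed for illustration
    (y)(10);                       // foo!() resolved to the *top-level* y; evaluates to 10
    let _ = y_local;
}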
If quotable does not contain unquotes, we can simply convert the expression into a
tokenstream and mark it with the appropriate hygiene information.
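As a rough mental model only (to_token_stream and recolor below are illustrative names invented for this sketch, not proposed API), an unquote-free quote! could be thought of as:

// quote!(env, quotable), when `quotable` contains no unquotes, behaves roughly as:
let stream = to_token_stream(quotable);  // (1) turn the quoted term into tokens
env.recolor(stream)                      // (2) mark with declaration-site hygiene; (3) return it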
Unquote allows programmers to easily construct new tokenstreams out of existing and newly-constructed ones, and to do stream analysis and computation in the middle of constructing the output syntax.
We begin by revisiting our previous example, utilizing unquote (written $) to reduce the number of quote! invocations:
#[macro]
fn derive_typical_maybe_copy(input: TokenStream, env: SyntaxEnv) -> TokenStream {
    let maybe = input.expect_bool()?;
    quote!(env,
        #[derive(PartialEq, Eq, Hash, Clone
            $(if maybe { quote!(env, ,Copy) } else { empty_stream })
        )]
    )
}
While this example isn't particularly clearer, it gets at the heart of the $ operator:
we can provide an expression that evaluates to a TokenStream inside of $() and,
during expansion, this expression will be executed and its result will be placed in the
output TokenStream. (We also introduce an additional form here, empty_stream, which provides
an empty TokenStream. This is useful for situations such as this, where we'd like to produce
empty output in some cases.)
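As another small illustration of the same idea (a sketch against the proposal's hypothetical API; the macro name twice and the use of .clone() on TokenStream are assumptions of this document), a macro can splice its own input back into the output it is constructing:

#[macro]
fn twice(input: TokenStream, env: SyntaxEnv) -> TokenStream {
    // Build a two-element tuple whose components are both the caller's expression.
    // Each $(...) runs at expansion time and drops its TokenStream into the output.
    quote!(env, ($(input.clone()), $(input)))
}

With this, twice!(x + 1) would expand to (x + 1, x + 1).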
To move on to a larger example, we begin with a partial implementation of the Scheme operator cond, a compact conditional form that works as:
(define fib
  (lambda (n)
    (cond
      ((= n 0) 1)
      ((= n 1) 1)
      (else (+ (fib (- n 1)) (fib (- n 2)))))))
To recreate this operator as a macro in Scheme, we can implement it as follows (using #` as quote! and #, as $):
;; NB There are additional forms, mentioned below, that could simplify
;; this implementation.
(define-syntax (cond x)
  (let ((rhs (cdr (syntax->datum x))))                                ;; (1)
    (if (null? rhs)                                                   ;; (2)
        #`(void)                                                      ;; (3)
        (let ((test (caar rhs))                                       ;; (4)
              (res (cadar rhs))
              (rest (cdr rhs)))
          #`(if #,(if (eq? 'else test) #`#t (datum->syntax x test))   ;; (4.i)
                #,(datum->syntax x res)                               ;; (4.ii)
                #,(if (null? rest)
                      #`(void)                                        ;; (4.iii)
                      (datum->syntax x (cons (datum->syntax x 'cond) rest))))))))
The macro algorithm proceeds as follows:
- We pull the input syntactic object x into a data object, then destruct it to get at the right-hand side of the cond (i.e., the clauses).
- We check if there are any clauses left.
  - If not, we return void.
  - Otherwise, we pull out the first test, the first res, and the rest of the clauses. Then we construct an if statement, where the test and consequent branch reflect the test and res, and the alternative branch handles any remaining clauses, as follows:
    - check if the test is else, and, if so, emit #t (true in Scheme) as the test; otherwise, emit the test.
    - emit the result of the clause as the true branch of the if
    - if there are no additional clauses, emit void
    - if there are, construct an invocation of cond on the rest to recur down them.
To summarize, at each step, we pull out the test and result of the next clause, then construct a series of nested if expressions that perform the computation (substituting #t for else).
(In the above example, syntax->datum and datum->syntax are tools for converting syntactic forms into list-like structures and back; we elide them in our Rust examples because, there, the current proposal puns between syntax and datum through TokenStreams, as discussed below.)
To recreate this sort of behavior in the Rust procedural macro system, we'll use quote!, the quote operator, and $, the unquote operator, following a similar algorithm:
- Check if cond has no arguments; if so, emit {}
- Check if there is a pair of a test and rhs as a prefix of the tokenstream
  - If there is, construct an if statement:
    1. Check if the test is else.
       - If so, emit true
       - If not, emit the test.
    2. Emit the rhs as the true branch of the if test
    3. If the remaining input is empty, emit an empty token stream (to form if test { rhs })
    4. If it is not empty, construct a new invocation of cond! as syntax and add it to the tokenstream.
  - If there is not, panic with an error
Implementing this algorithm using the current proposal proceeds as:
#[macro]
fn cond(input: TokenStream, env: SyntaxEnv) -> TokenStream {
    if input.is_empty() {                                                 // (1)
        quote!(env, {})
    } else if let Some(((test, rhs), rest)) = input.maybe_pair_prefix() { // (2)
        quote!(env,
            if $(if test.free_id_eq(env.as_ident("else")) {               // (2.i), (2.i.a)
                quote!(env, true)                                         // (2.i.a.a)
            } else {
                env.to_syntax(test, input)                                // (2.i.a.b)
            }) {
                $( env.to_syntax(rhs, input) )                            // (2.i.b)
            } $(if rest.is_empty() {                                      // (2.i.c)
                empty_stream
            } else {                                                      // (2.i.d)
                quote!(env, else { cond!($(rest)) })
            }))
    } else {                                                              // (2.ii)
        panic!("Invalid conditional form: {:?}", input);
    }
}
This macro, when used, would work as:
fn fib(n: i32) -> i32 {
    cond!(
        (n == 0, 1)
        (n == 1, 1)
        (else, fib(n-1) + fib(n-2))
    )
}
This would produce the program:
fn fib(n: i32) -> i32 {
    if n == 0 {
        1
    } else {
        if n == 1 {
            1
        } else {
            if true {
                fib(n-1) + fib(n-2)
            }
        }
    }
}
While the result is not idiomatic Rust, it (a) achieves the goal and (b) could be transformed into idiomatic Rust via a more complex cond! implementation.
(Incidentally, the implementation has a secondary advantage: if there is no else clause, the control flow analysis will report that not all paths can produce a value, guiding the programmer to detect bugs by reusing existing compiler infrastructure.)
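For instance (a hypothetical usage; the function sign is ours, not from the proposal), dropping the else clause leaves an if with no alternative in the expansion, and the compiler's existing checks flag it:

fn sign(n: i32) -> i32 {
    cond!(
        (n < 0, -1)
        (n > 0, 1)
    )
}

// Expands (roughly) to:
//
// fn sign(n: i32) -> i32 {
//     if n < 0 {
//         -1
//     } else {
//         if n > 0 {
//             1
//         }
//     }
// }
//
// which the compiler rejects: the innermost `if` has no `else`,
// so not every path produces an i32.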
At this point, we have used a number of additional forms and values in defining cond!. Most notably:

- free_id_eq, which checks if two things are identifiers that refer to the same object, a la free-identifier=? from Scheme. We use it to ask if the test-position token is actually the keyword else.
- as_ident : String -> TokenTree (?), which converts its input into an uncolored 'ident' token (maybe a tokenstream?). (We define another form, below, called as_colored_ident, which also does recoloring.)
- to_syntax : TokenStream -> &TokenStream -> TokenStream, which takes two tokenstreams (or token slices / token trees, or, better, any of the three types) and 'colors' the first input using the second input's coloring before returning the first one. We use this twice, recoloring both the test and the rhs to ensure that they will use the bindings from the macro invocation site (which has input's colors) instead of the macro declaration site.
- as_fn_call_syntax : String -> TokenStream -> &TokenStream -> TokenStream, which takes an ident, arguments, and a coloring object, and constructs a call as ident(args), colored with the coloring object, as syntax.
These would need to be provided to the programmer as 'quality of life' tools.
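Written out as Rust signatures, that surface might look roughly like the following. This is purely a speculative sketch inferred from the usages above; the trait names, the exact parameter types, and the choice of &TokenStream for the coloring argument are all assumptions of this document, not a settled API.

// Helpers hanging off the expansion environment.
trait SyntaxEnvTools {
    /// Build an uncolored identifier token from a string.
    fn as_ident(&self, name: &str) -> TokenTree;
    /// As `as_ident`, but colored with this environment's hygiene information.
    fn as_colored_ident(&self, name: &str) -> TokenTree;
    /// Color `tokens` with the coloring carried by `coloring`, then return them.
    fn to_syntax(&self, tokens: TokenStream, coloring: &TokenStream) -> TokenStream;
    /// Construct `ident(args)` as syntax, colored by `coloring`.
    fn as_fn_call_syntax(&self, ident: &str, args: TokenStream, coloring: &TokenStream) -> TokenStream;
}

// Identifier comparison, a la Scheme's free-identifier=?.
trait FreeIdEq {
    fn free_id_eq(&self, other: TokenTree) -> bool;
}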
Even with all of this, however, we have not explored the really thorny bits of this design:
At this point in the design, there's a pretty brutal pun between TokenStreams as syntax (colored) objects and TokenStreams as non-syntactic (uncolored) objects; for example, test and rhs are uncolored, but input itself is colored. These two variants of tokenstreams correspond to the structures produced by syntax->datum and datum->syntax in the Scheme implementation above.
The Scheme salve for this problem is to provide two different things: datum, in the form of quoted expressions, and syntax, in the form of syntactic information wrapped around datum. In Rust, this might manifest as TokenStreams, analogous to data, and SyntaxStreams, analogous to syntax forms. It is unclear if this would be a 'big win'; the types may prove annoying to macro writers.
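To make the idea concrete, one very rough shape such a split could take is sketched below; SyntaxStream, Color, and the method names are illustrative assumptions, not part of the proposal.

// A colored stream: a datum (TokenStream) plus its hygiene information.
struct SyntaxStream {
    tokens: TokenStream,
    color: Color,
}

impl SyntaxStream {
    // Analogue of syntax->datum: strip the coloring.
    fn to_datum(self) -> TokenStream {
        self.tokens
    }

    // Analogue of datum->syntax: wrap a datum in the coloring of some existing syntax.
    fn from_datum(tokens: TokenStream, context: &SyntaxStream) -> SyntaxStream {
        SyntaxStream { tokens, color: context.color.clone() }
    }
}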
There is another 'problem' with the code we've seen so far: quote! must manually handle its environment. This is in stark contrast to the Scheme code above, which carries its expansion environment as part of the expander, not the macros themselves. Moreover, env is already in scope, and quoting with a different environment than the current one seems... fundamentally incorrect.
It may be possible to completely elide this need by modifying the expander to handle this context itself. This would require quote!(quotable) to be an expander-handled form instead of providing it as a stand-alone entity, and, in addition, it would require that TokenStreams (SyntaxStreams?) know how to recover their colorings so that, e.g., to_syntax could work as to_syntax(test, input) without the need for an environment.
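Concretely, in the cond! implementation above, the recursive arm might shrink as follows (a speculative before/after; the env-free forms do not exist in the current proposal):

// Current proposal: the environment is threaded by hand.
quote!(env, else { cond!($(rest)) })

// With an expander-managed environment (speculative):
quote!(else { cond!($(rest)) })

// Likewise, env.to_syntax(test, input) could become to_syntax(test, input).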
While both of these modifications would pull the Rust expander closer to the expanders presented in Syntactic Abstractions in Scheme / Macros That Work Together, they would require serious expander refactoring (and possibly modification) to work.
Finally, we get to unquote splicing, written #,@ in Scheme and $@ in Rust. There are a few different design proposals for this, each attempting to replicate some of the style of unquote-splicing from Scheme.
First, to demonstrate how it works in Scheme (and why it is useful there), we present a few usages of this splicing as Scheme code (eliding the # prefix for readability):
(1a) `(a b ,(reverse '(a b c)))   ;; => (a b (c b a))
(1b) `(a b ,@(reverse '(a b c)))  ;; => (a b c b a)
(2a) `(+ ,'(1 2))                 ;; => (+ (1 2))
(2b) `(+ ,@'(1 2))                ;; => (+ 1 2)
This structural tool is of particular utility in Scheme because (e1 e2 ...) applies e1 as a procedure to the remaining expressions. That is: the parentheses are critically important, and so an operator that allows programmers to more easily control them has immense utility. For example, the recursive cond case from before may be rewritten using syntax splicing as:
(old) (datum->syntax x (cons (datum->syntax x 'cond) rest))
(new) #`(cond #,@(datum->syntax x rest))
The idea here is that, before, we manually constructed a list of the rest and the syntactic cond before converting it into syntax, and now we can directly construct the list we want. While this operator may be directly translated into the system as we have it so far, parentheses play a subdued role in Rust, and, as a result, the ability to trivially escape them seems to lend less utility than in s-expression languages.
To this end, we present two alternative design proposals for splicing that attempt to find a 'middle ground' utility for providing similar semantic behavior to Scheme's splicing operator (instead of similar syntactic behavior).
NB. If you have another idea for a useful unsplicing-like structure for Rust, please share it!
One-layer splicing is the idea that a programmer is promised that, during splicing, the current braces/parentheses will be 'extruded outward'. For example:
{a; $@({b; c}); d} ~> {a; b; c; d}
Because {} carries serious meaning in Rust (dealing with lifetimes), this scope-extrusion utility more closely mirrors, semantically, the system in Scheme: macro writers in Scheme use splicing to restructure the program to ensure a specific structure around their s-expressions, and this unsplicing will lend itself to a similar approach in Rust.
We may also provide a nicety that un-extrudable things are simply inlined:
{a; $@(b; c); d} ~> {a; b; c; d}
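As a sketch of how a macro writer might lean on this (everything here, including the macro name with_setup and the setup/teardown calls, is a hypothetical example, not part of the proposal), one-layer splicing lets a macro merge the caller's block into its own, so that the caller's let bindings share the surrounding scope rather than living in a nested block:

#[macro]
fn with_setup(input: TokenStream, env: SyntaxEnv) -> TokenStream {
    // `input` is assumed to be a braced block of statements, e.g. `{ let x = f(); g(x); }`.
    quote!(env, {
        setup();       // hypothetical helpers visible at the macro definition site
        $@(input);     // one-layer splice: the block's outer braces are extruded away
        teardown();
    })
}

With splicing, a let binding inside the caller's block would stay in scope alongside setup() and teardown(); with plain $, the caller's braces would survive and the binding would be confined to its own inner block.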
Finally, taking the previous proposal to the extreme, we might imagine unquote splicing as a large 'flattening' hammer:
{a; $@({b; {c; d}}); e} ~> {a; b; c; d; e}
The advantage here is that programmers are ensured full unwrapping, and don't have to worry about any semantic-shifting constructions inside of $@(...): they will definitely all be removed in the course of expansion. The downside, of course, is that this prevents a programmer from only extruding a specific layer.
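For contrast, one-layer splicing applied to the same nested input would stop at the first set of braces (this line is derived from the one-layer definition above, not taken from the proposal text):

{a; $@({b; {c; d}}); e} ~> {a; b; {c; d}; e}

whereas the flattening version, as shown above, removes every layer.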
It's unclear which of these three designs would be most useful. Any of them might provide serious utility to the programmer, but it is also imaginable that their particular semantics may make them generally undesirable as a feature. More work should be done to discern several good examples of how they might be used, and whether any of them provide 'big wins' to a macro writer.
TBD