mlatu specification

https://mlatu-lang.github.io/mlatu/

Syntax

A rule is composed of a non-empty sequence of non-quotation terms, followed by the symbol =, followed by a possibly empty sequence of terms, followed by the symbol .. The first sequence of terms is known as the pattern or the redex, and the second sequence of terms is known as the replacement or the reduction. If two rules with the same replacement are defined, it is undefined behaviour.

A term is either a primitive symbol (+, -, >, <, ,, or ~), a word, or a quotation - the symbol (, followed by a possibly empty sequence of terms, followed by the symbol ). A word is composed of a sequence of Unicode characters which does not include whitespace or any of the syntactical or primitive symbols (+-><,~()=.)

Comments begin with # and terminate at a newline character.

word ::= ? .+ ?
comment ::= ? #.+$ ?

rule ::= <nq_term>+ '=' <term>* '.'

nq_term ::= '+' | '-' | '<' | '>' | ',' | '~' | <word>
term ::= <nq_term> | '(' <term>* ')'

Semantics

A rule serves as an instruction that whenever the pattern/redex is found, it ought to be replaced by the given replacement/reduction. Thus a b = c . says that whenever the sequence of terms a b is found in an input sequence of terms, it ought to be replaced by the term c.

Out-of-order rules are not allowed. i.e. you cannot collect definitions upfront before rewriting.

Primitive rules

There are six primitive rules which cannot be defined in mlatu itself. The distinction between primitives and user-defined rules is that primitives match on arbitrary quotes, while user-defined rules match on specific terms. The effects are as follows:

+ (also known as copy), when found after a quotation, is replaced by a copy of that quotation. (a) + is rewritten to (a) (a).

- (also known as discard), when found after a quotation, removes that quotation. (a) - is rewritten to

> (also known as wrap), when found after a quotation, "quotes" that quotation, adding one level of nesting. (a) > is rewritten to ((a)).

< (also known as unwrap), when found after a quotation, "unquotes" that quotation, removing one level of nesting. ((a)) < is rewritten to (a).

, (also known as combine), when found after two quotations, concatenates those two quotations into one. (a) (b) , is rewritten to (a b).

~ (also known as swap), when found after two quotations, swaps the order of those two quotations. (a) (b) ~ is rewritten to (b) (a).

Evaluation order

The Python pseudocode for rewriting a sequence of terms (neglecting the primitive rules) is as follows:

def rewrite(terms, rules):
    for start in range(len(terms))
        for length in range(len(terms) - start, 1, -1):
            before = terms[:start]
            pattern = terms[start:(start + length)]
            after = terms[(start + length):]
            for rule in rules:
                if rule['redex'] == pattern:
                    new_terms = before + rule['reduction'] + after
                    return rewrite(new_terms, rules)
    return terms

or pseudo-code for the same process:

1. find the longest rewrite rule starting at the first term  
  a. if it was found, rewrite it, and goto 1.  
  b. if it was not found, look for the longest rewrite rule starting at the second term  
    i.   if it was found, rewrite it, and goto 1.  
    ii.  if it was not found, continue looking with the third term etc.  
    iii. if no rule is found that matches anywhere, the rewriting is finished

Note that if there are two rules that would both match the exact same pattern, the rule whose first term is leftmost "wins." Similarly, if two (leftmost) rules both match starting at the same term, the longer one is chosen.

To give an example the process, consider the following set of rules and input expression a b c:

a b = d .
c = e .

We start from the beginning of the sequence of terms and check first if a b c matches any pattern. If it does, we replace it; if not, we try with the second longest sequence of terms: a b. In this case it does, the rule a b = d . so we replace a b with d, getting d c. Since there was a replacement, we start from the beginning of the loop again. Starting with d c now, there is no match so we then try d, for which there is no match either. We then start at the next index, repeating the same process. c does match the rule c = e . so we can replace c with e to get d e. Starting over again, we see that no pattern matches d e, d, or e. Therefore we return d e as the result of rewriting.

SimonMeskens/mlatu.md

mlatu specification

Syntax

Semantics

Primitive rules

Evaluation order