New pattern matcing library introduces support for two very useful concepts: pattern matching and object destructuring.
Pattern matching is a mechanism that allows you to check particular object against the pattern - you could think of it mainly as a way to reduce boilerplate code when comparing objects for equality, checking if value is within range, particular key is present in table and so on.
import std/json, fusion/matching
{.experimental: "caseStmtMacros".}
# Json is very simple data format, but illustrates a lot of useful features
# of pattern matching
case parseJson("""{ "key" : "value" }"""):
# No longer necessary to check if key is present - it is done
# automatically
of { "key" : JInt() }:
discard
# Extracting values from nested data structures also becomes much easier.
of { "key" : (getStr: @val) }:
echo val
assert val is string
value
Second - object destructuring - allows you to extract values from particular fields in object. It is very common in dynamic programming languages such as python. Simplest form of destructuring is already supported by nim - tuple unpacking:
let (val1, val2) = ("some", 12)
And with pattern matching you can now unpack sequences, tables and custom objects.
[(@first, @second), all @trail] := [(12, 3), (33, 4)]
echo first, ", ", second, ", ", trail
12, 3, @[(33, 4)]
Main purpose of pattern matching is simplification of conditions, and
consecutive checks. It is especially useful when paired with
object
variants, but can also be used to do a lot of other things, such as <a href=”ref
object matching”><link to documentation>, extensive support for
<link to documentation>. Special syntactic sugar is
provided for very common use cases, such as Option[T]
checking (similar
to if let
in rust):
import std/options
if Some(@val) ?= some("hello"):
echo val, ". Is string? ", val is string
And matching tree structures of case objects (such as AST). For enum conforming to <a href=”nep1 style guide”>https://nim-lang.org/docs/nep1.html#introduction-naming-conventions naming conventions, you can omit prefix entirely leading to code that looks roughtly like this:
case <some AST node>:
# Node kind is `nnkIdent`, but it is possible to omit `nnk`
of Ident(strVal: @name):
echo "Found ident: ", name
# Extracting subnodes from infix expression.
of InfixExpr[Ident(strVal: "+"), @lhs, @rhs]:
...
# Matching if statements with *exactly* two branches. No need to worry
# about indexing exceptions - `len` is checked accordingly during pattern
# matching.
of IfStmt[ElifBranch[@cond1, _], ElifBranch[@cond2, _]]:
...
Nim macros are one of the most powerful and parts of the language, but they might seem a little intimidating for newcomers, especially when it comes to implementing macro for solving particular problem at hand.
This article gives an example on how one can easily create relatively complex macro using new pattern matching library.
We will be creating macro for dataflow programming, with support for
some operatings in the std/sequtils
module (map/filter/each). The
macro won’t be covering all possible combinations and use cases as it
would make implementation significantly more complicated.
When writing macro, it is very useful to just write DSL code (as if you already had macro) and what you expect it to generate. Decide out what you want to do and how it should look. In our case input could looks roughly like this:
flow lines("/etc/passwd"):
map[_, seq[string]]:
it.split(":")
keepIf:
it.len > 1 and
it.matches [_.startsWith("systemd"), .._]
each:
echo it
And should generate loop that looks like this:
var res = seq[ResType]
for it0 in lines("/etc/passwd"):
let it1 = it0.split(":")
if it1.len > 1 and it1.matches [_.startsWith("systemd"), .._]:
echo it1
Now the question is - how to transform first into second? We will
start by first looking at the output for dumpTree
on flow
macro:
dumpTree:
flow lines("/etc/passwd"):
map[_, seq[string]]:
it.split(":")
StmtList
Command
Ident "flow" (ref:1)
Call (ref:2)
Ident "lines"
StrLit "/etc/passwd"
StmtList
Call
BracketExpr
Ident "map"
Ident "_"
BracketExpr
Ident "seq"
Ident "string"
StmtList
Call
DotExpr
Ident "it"
Ident "split"
StrLit ":"
This load of text might seem a little indimidating at first, but in
the end it can be taken apart quite easily (and that is exactly what
we will be doing). First, on line (1) we see the flow
identifier -
this is a start of our macro. Then (2) is a lines("/etc/passwd")
argument.
I usually try to introduce some kind of intermediate representation for DSL in order to make things more organized and decouple parsing stage from code generation. This might make implementation a little longer, but more extensible and robust. You can, without a doubt, just go directly to code generation, but for more complex DSL I would still recommend using some kind of IR.
In this particular case DSL structure for flow
macro can be
described as:
type
FlowStageKind = enum
fskMap # Stage for element conversion
fskFilter # Filter elements
fskEach # Execute action without returning value
FlowStage = object
outputType: Option[NimNode] # Assert result type
kind: FlowStageKind # Type of the stage
body: NimNode # Stage body
it directly maps on the input DSL. map:
should create fskMap
stage, filter
creates fskFilter
and so on. Optionally you can
specify output type like this map [ExpectedOutput]
.
Now, after we have good understanding of what exactly we want to do -
the question is ‘how?’. That’s where fusion/matching
comes
particularly handy - we already identifier all patterns, and now it is
only matter of writing this down in code.
Without pattern matching you’d be left with long series of repeating
[0][0][0]
and if kind == nnkBracketExpr
in order to retrieve parts
from DSL and validate input.
Before we proceed to writing paterns for whole DSL it is important to consider three possible cases of writing a stage. First once is very simple - no type specified, only stage identifier.
dumpTree:
map:
body
StmtList Call Ident "map" StmtList Ident "body"
But single stage with type parameter can be written using two different ways - both are syntactically correct, but have comparatively different parse trees:
dumpTree:
map [a]:
body
StmtList Command Ident "map" Bracket Ident "a" StmtList Ident "body"
dumpTree:
map[a]:
body
StmtList Call BracketExpr Ident "map" Ident "a" StmtList Ident "body"
Let’s make a small digression in order to better understand how new pattern matching library can help us here.
We will be focusing on parts that are relevant to our task - for more details you can read <link to documentation>.
When writing nim macros you are mostly dealing with NimNode objects -
first to process input AST, and then generate new code back. AST is
comprised of case objects. Usually, first part of the macro involves
lots of checks for correct node kind, followed by iteration over
subnodes to extract input data. Pattern matching simplifies this,
allowing to directly write expected patterns for ast, with syntax
closely matching that of dumpTree
.
For example - if we have code like map[string]
it has following tree
representation:
dumpTree:
map[string]
StmtList BracketExpr Ident "map" Ident "string"
And can be matched using following pattern:
body.assertMatch:
BracketExpr:
@head
@typeParam
Notice similarity between AST and pattern for matching - each node
has kind
field, which describes what kind of node this is. In this
case we are interested in first and second subnodes of the BracketExpr
node - flow stage kind and type parameter respectively.
As we have alreay seed earlier, map [string]
and map[string]
are
parsed differently - first one is handled as one-element array passed
to map
function argument, and second is bracket expression (like
accessing element of the array). This difference is caused by method
call syntax and usually makes programming DSL a little harder - you
need to check for both alternatives, remember which index each capture
should be in etc.
With pattern matching though it becomes quite easy to do - adding second alternative to match will be enough.
body.matches:
# More compact way of writing `BracketExpr`
BracketExpr[@head, @typeParam] |
Command[@head, Bracket[@typeParam]]
It should also be possible to omit type parameters from DSL entirely -
they are quite nice and would allow for better type checking, but
could become quite annoying to write. So, we should also expect
someone to just write map
- without any type qualifications. To
handle this case we add third alternative for pattern -
body.matches:
BracketExpr[@head, @typeParam] |
Command[@head, Bracket[@typeParam]] |
@head is Ident()
This brings one important change typeParam
capture is no longer
NimNode
- type has changed to Option[NimNode]
, because not all
alternatives have this variable. head
is still a NimNode
just as
before - all possible alternatives contain this variable, so it would
be set if input matches.
Out first stage would be processing of all input into FlowStage
. We
already have a way to extract data from input ast - using pattern
matching.
macro flow(arg, body: untyped): untyped =
var stages: seq[FlowStage]
for elem in body:
if elem.matches(
Call[BracketExpr[@ident, opt @outType], @body] |
# `map[string]:`
Command[@ident is Ident(), Bracket [@outType], @body] |
# `map [string]:`
Call[@ident is Ident(), @body]
# just `map:`, without type argument
):
stages.add FlowStage(
kind: identToKind(ident),
outputType: outType,
body: body
)
After that, we have all necessary information for generating result
code. If last stage is not each
- e.g. there is a return value after
each iteration, we need to determine type of the result sequence and
then append to it on each iteration.
if stages[^1].kind notin {fskEach}:
# If last stage has return type (not `each`) then we need to
# accumulate results in temporary variable.
result = quote do:
var `resId`: seq[#[ Type of the expression ]#]
for it0 {.inject.} in `arg`:
`resId`.add #[ Expression to evaluate]#
`resId`
else:
# Otherwise just iterate each element
result = quote do:
for it0 {.inject.} in `arg`:
#[ Expression to evaluate ]#
Each stage of the dataflow has a type, and potentially defines
variables. In addition to that - each stage uses special variable
it
- that has to be injected separately for each stage but at the
same time it is used for commulicating values between stages.
flow [1,2,3]:
map:
it * 2
map:
$it
is equivalent to
var res: seq[#[ Type of the expression ]#]
for it in [1, 2, 3]:
let it = it * 2
let it = $it
res.add it
res
As you can clearly see, such code would not even compile due to redefinition errors. There are two possible ways to solve this problem - kind of obvious, and not-all-that-obvious. Let’s start with the first one - since each variable can be redefined in the new scope we can just do
for it in [1, 2, 3]:
block:
let it = it * 2
block:
let it = $it
echo "Add result - ", it
And it would compile and work perfectly fine. But now we have a
problem of getting type of the expression itself - everything is fine
as long as you only use map
- after all block:
is an expression,
and we can have something like
echo typeof((block:
let it = 1
block:
let it = it * 2
block: $it))
Not the prettiest code in the world, by all means - but it will become
even worse when we have to deal with filter
, each
, injected
variables and iterators.
Second alternative is to use declare proc with auto
return variable
and assign result of the expression to it. In that case compiler will
figure out return type for us.
proc hello[T](a: T): auto =
for c in "ee":
result = (12, "som", "ee", a)
echo typeof hello[int]
Now we only need to write code generation for #[ Expression to
evaluate ]#
and substitute result =
when necessary.
Each stage in flow
injects it
variable - result of the evaluation
from previous stage. To avoid getting redefinition errors from
mulitiple let it = <expression>
on each stage, we will replace each
ocurrence of it
with it<stage-index>
. For the first stage it would
be it -> it1
, second one is it -> it2
and so on.
rewrite
takes input NimNode
and either returns it as-is (if no
rewriting is necessary) or, in case of identifier it
(Ident(strVal:
"it")
) convert it into new one with corresponding index.
proc rewrite(node: NimNode, idx: int): NimNode =
case node:
of Ident(strVal: "it"):
result = ident("it" & $idx)
of (kind: in nnkTokenKinds): # `nnkTokenKinds` is a set of node
# kinds that don't have subnodes.
# These ones are returned without any
# modifications.
result = node
else:
# For node kinds with subnodes, rewriting must be done
# recursively
result = newTree(node.kind)
for subn in node:
result.add subn.rewrite(idx)
For each stage we rewrite body, and then append new chunk of generated code to result.
func evalExprFromStages(stages: seq[FlowStage]): NimNode =
result = newStmtList()
for idx, stage in stages:
# Rewrite body
let body = stage.body.rewrite(idx)
case stage.kind:
# If stage is a filter it is converted into `if` expression
# and new new variables are injected.
of fskFilter:
result.add quote do:
let stageOk = ((`body`))
if not stageOk:
continue
of fskEach:
# `each` has no variables or special formatting - just
# rewrite body and paste it back to resulting code
result.add body
of fskMap:
# Create new identifier for injected node and assign
# result of `body` to it.
let itId = ident("it" & $(idx + 1))
result.add quote do:
let `itId` = `body`
# If output type for stage needs to be explicitly checked
# create type assertion.
if Some(@expType) ?= stage.outputType:
result.add makeTypeAssert(expType, stage.body, itId)
func typeExprFromStages(stages: seq[FlowStage], arg: NimNode): NimNode =
let evalExpr = evalExprFromStages(stages)
var
resTuple = nnkPar.newTree(ident "it0")
for idx, stage in stages:
if st.kind notin {fskFilter}:
resTuple.add ident("it" & $(idx + 1))
let lastId = newLit(stages.len - 1)
result = quote do:
block:
(
proc(): auto =
for it0 {.inject.} in `arg`:
`evalExpr`
result = `resTuple`
)()[`lastId`]
macro flow(arg, body: untyped): untyped =
var stages: seq[FlowStage]
for elem in body:
if elem.matches(
Call[BracketExpr[@ident, opt @outType], @body] |
# `map[string]:`
Command[@ident is Ident(), Bracket [@outType], @body] |
# `map [string]:`
Call[@ident is Ident(), @body]
# just `map:`, without type argument
):
stages.add FlowStage(
kind: identToKind(ident),
outputType: outType,
body: body
)
let evalExpr = evalExprFromStages(stages)
if stages[^1].kind notin {fskEach}:
# If last stage has return type (not `each`) then we need to
# accumulate results in temporary variable.
let resExpr = typeExprFromStages(stages, arg)
let lastId = ident("it" & $stages.len)
let resId = ident("res")
result = quote do:
var `resId`: seq[typeof(`resExpr`)]
for it0 {.inject.} in `arg`:
`evalExpr`
`resId`.add `lastid`
`resId`
else:
result = quote do:
for it0 {.inject.} in `arg`:
`evalExpr`
result = newBlockStmt(result)
Implementation of this macro is inspired by zero_functional
library.
Full code can be seen in <link to test suite>.