This is an attempt to provide "just enough" detail to be able to interactively examine how peg/match works. The idea is to build janet with debugging information and run it under gdb to examine a simple invocation of peg/match.
-
Clone janet's source and set working directory:
    git clone https://github.com/janet-lang/janet
    cd janet
-
Edit Makefile so that:
    # For cross compilation
    HOSTCC?=$(CC)
    HOSTAR?=$(AR)
    CFLAGS?=-O2
becomes:
    # For cross compilation
    HOSTCC?=$(CC)
    HOSTAR?=$(AR)
    CFLAGS?=-O0 -g3
Note that only the line starting with CFLAGS?= changes. The other lines are provided for context.
-
Compile via:
    make
-
In janet's top-level directory, create a file named test.janet.
-
Put the following content in it:
    (peg/match "a" "a")
To spell things out, this call to peg/match uses the peg pattern "a" (first argument) to attempt a match against the string "a" (second argument).
-
Start gdb from janet's top-level directory:
    $ gdb -quiet -cd . --args build/janet test.janet
-
Set some breakpoints in peg.c. First, one at the function emit_bytes:
    (gdb) b src/core/peg.c:emit_bytes
which is around:
    /* For RULE_LITERAL */
    static void emit_bytes(Builder *b, uint32_t op, int32_t len, const uint8_t *bytes) {
        uint32_t next_rule = janet_v_count(b->bytecode);
Next, one at the label tail within the function peg_rule:
    (gdb) b src/core/peg.c:peg_rule:tail
which is around:
    static const uint8_t *peg_rule(
            PegState *s,
            const uint32_t *rule,
            const uint8_t *text) {
    tail:
        switch (*rule & 0x1F) {
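As an aside on the switch (*rule & 0x1F) line shown above: masking with 0x1F keeps only the low five bits of the first word of a rule, and that value is what selects the case. The following is a minimal stand-alone sketch of that masking, not code from janet; the name opcode_of is made up for illustration, and what (if anything) the remaining bits of the word hold is not covered here.

```c
#include <stdint.h>

/* Hypothetical helper, not part of janet: keep only the low five bits
 * of an instruction word, the same masking the switch in peg_rule uses. */
static uint32_t opcode_of(uint32_t word) {
    return word & 0x1F; /* 0x1F is binary 11111 */
}
```

For instance, opcode_of(0x25) evaluates to 5, and any word whose low five bits are all zero evaluates to 0.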
-
Run janet under gdb:
    (gdb) r
-
First stop is emit_bytes. Take a look at op, len, and bytes (some of the parameters of emit_bytes):
    (gdb) p op
    $1 = 0
    (gdb) p len
    $2 = 1
    (gdb) p *bytes
    $3 = 97 'a'
op in this case is RULE_LITERAL, which can be seen to be 0 by looking at janet.h:
    /* opcodes for peg vm */
    typedef enum {
        RULE_LITERAL,      /* [len, bytes...] */
The len and bytes parameters of emit_bytes are the operands for the RULE_LITERAL opcode.
-
Next stop is peg_rule:
    (gdb) c
A couple of important parameters for peg_rule are rule and text:
    static const uint8_t *peg_rule(
            PegState *s,
            const uint32_t *rule,
            const uint8_t *text) {
which represent the start of the current bytecode and the current position within the string being matched, respectively.
Stepping (e.g. via n) should proceed under case RULE_LITERAL:
    case RULE_LITERAL: {
        uint32_t len = rule[1];
        if (text + len > s->text_end) return NULL;
        return memcmp(text, rule + 2, len) ? NULL : text + len;
    }
where one can observe the value of the current rule's first operand (i.e. rule[1], named len here). Assuming the bounds check against s->text_end passes, memcmp is then used to compare a portion of text (the string that was originally passed to peg/match) to the string from the bytecode (the second operand, the beginning of which is referred to here as rule + 2).
According to comments in the source:
If there is a match, returns a pointer to the next text.
and:
If there is no match, returns NULL.
Of course, that can be verified in this case via gdb too if desired :)
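To experiment with the RULE_LITERAL logic outside of gdb, the following is a rough stand-alone model of that case. It is a sketch, not janet's code: match_literal is a made-up name, text_end is passed directly instead of going through a PegState, and the bounds check is phrased as a length comparison rather than pointer arithmetic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the RULE_LITERAL case of peg_rule.
 * rule[1] holds the literal's length and its bytes start at rule + 2.
 * Returns a pointer just past the matched text, or NULL on failure. */
static const uint8_t *match_literal(const uint32_t *rule,
                                    const uint8_t *text,
                                    const uint8_t *text_end) {
    uint32_t len = rule[1];
    /* Not enough input left to hold the literal. */
    if (len > (size_t)(text_end - text)) return NULL;
    /* memcmp returns 0 when the bytes are equal. */
    return memcmp(text, rule + 2, len) ? NULL : text + len;
}
```

To mirror (peg/match "a" "a"), one could fill a small uint32_t array with 0 and 1 in the first two slots, copy the byte 'a' to the start of the third slot, and call match_literal with text pointing at "a" and text_end one byte further along. A non-NULL result then corresponds to the "pointer to the next text" mentioned in the source comments.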
This exposition kept the call to peg/match in test.janet very simple, but there's not much to stop one from changing that call to something else and following a similar series of steps with breakpoints modified appropriately.
Alternatively, one could examine peg_compile1 to observe the compilation of a peg into bytecode.
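To give a feel for what such compilation might produce in the simplest case, here is a rough stand-alone model of laying out a literal as [opcode, len, bytes...] in 32-bit words, following the janet.h comment quoted earlier. This is a sketch under assumptions, not janet's implementation: the names MY_RULE_LITERAL, literal_words, and encode_literal are made up, and packing the bytes into zero-padded whole words is an assumption chosen to be consistent with the memcmp on rule + 2 seen above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MY_RULE_LITERAL = 0 }; /* hypothetical stand-in for RULE_LITERAL */

/* Number of 32-bit words needed for [opcode, len, bytes...]. */
static size_t literal_words(size_t len) {
    return 2 + (len + 3) / 4; /* round the byte count up to whole words */
}

/* Writes the encoding into out, which has room for cap words.
 * Returns the number of words used, or 0 if out is too small. */
static size_t encode_literal(uint32_t *out, size_t cap,
                             const uint8_t *bytes, uint32_t len) {
    size_t need = literal_words(len);
    if (need > cap) return 0;
    memset(out, 0, need * sizeof *out); /* zero any padding bytes */
    out[0] = MY_RULE_LITERAL;
    out[1] = len;
    memcpy(out + 2, bytes, len);        /* raw bytes start at word 2 */
    return need;
}
```

Under these assumptions the pattern "a" takes three words: the opcode, the length 1, and one word whose first byte is 'a'. Comparing this against what gdb shows in b->bytecode after emit_bytes returns would be one way to check whether the assumptions hold.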
Indeed, other aspects of janet (e.g. things that happen outside of peg.c) may also be examined by following a similar procedure (i.e. modify test.janet, set some breakpoints in appropriate locations, and explore via execution under gdb). More detail along those lines may be found at Examining Janet Internals with GDB.