@sogaiu
Last active August 8, 2021 03:27
examining janet's peg.c via gdb

Examining janet's peg.c via GDB

This is an attempt to provide "just enough" detail to interactively examine how peg/match works. The idea is to build janet with debugging information included and run it under gdb to examine a simple invocation of peg/match.

Compile a debug version of janet

  1. Clone janet's source and set working directory:

    git clone https://github.com/janet-lang/janet
    cd janet
    
  2. Edit Makefile so that:

    # For cross compilation
    HOSTCC?=$(CC)
    HOSTAR?=$(AR)
    CFLAGS?=-O2
    

    becomes:

    # For cross compilation
    HOSTCC?=$(CC)
    HOSTAR?=$(AR)
    CFLAGS?=-O0 -g3
    

    Only the line that starts with CFLAGS?= changes. The other lines are shown for context.

  3. Compile via: make

Create a sample janet file

  1. In janet's top-level directory create a file named test.janet

  2. Put the following content in it:

    (peg/match "a" "a")
    

    To spell things out, this call to peg/match uses the peg pattern "a" (first argument) to attempt a match against the string "a" (second argument).

Start janet via gdb

  1. Start gdb from janet's top-level directory:

    $ gdb -quiet -cd . --args build/janet test.janet
    
  2. Set some breakpoints in peg.c.

    First, one at the function emit_bytes:

    (gdb) b src/core/peg.c:emit_bytes
    

    which is around:

    /* For RULE_LITERAL */
    static void emit_bytes(Builder *b, uint32_t op, int32_t len, const uint8_t *bytes) {
        uint32_t next_rule = janet_v_count(b->bytecode);
    

    Next, one at the label tail within the function peg_rule:

    (gdb) b src/core/peg.c:peg_rule:tail
    

    which is around:

    static const uint8_t *peg_rule(
        PegState *s,
        const uint32_t *rule,
        const uint8_t *text) {
    tail:
        switch (*rule & 0x1F) {
    
  3. Run janet under gdb:

    (gdb) r
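
The switch on *rule & 0x1F under the tail label suggests that the low five bits of a rule's first word select the opcode, and that the label gives the function somewhere to jump back to instead of calling itself. The following is a minimal standalone sketch of that dispatch shape, not janet's code: the opcodes OP_LITERAL_BYTE and OP_THEN, their word layouts, and the function run_rule are all hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only; these are not janet's. */
enum { OP_LITERAL_BYTE = 0, OP_THEN = 1 };

/*
 * Hypothetical rule layouts (one uint32_t per slot):
 *   OP_LITERAL_BYTE: [op, byte]
 *   OP_THEN:         [op, byte, index_of_next_rule]
 * Returns a pointer just past the matched text, or NULL on failure.
 */
static const uint8_t *run_rule(const uint32_t *bytecode, const uint32_t *rule,
                               const uint8_t *text, const uint8_t *text_end) {
    assert(bytecode != NULL && rule != NULL);
tail:
    /* The low 5 bits of the rule's first word pick the opcode. */
    switch (*rule & 0x1F) {
        case OP_LITERAL_BYTE:
            if (text >= text_end || *text != (uint8_t) rule[1]) return NULL;
            return text + 1;
        case OP_THEN:
            if (text >= text_end || *text != (uint8_t) rule[1]) return NULL;
            text++;
            rule = bytecode + rule[2];
            goto tail; /* continue with the next rule without recursing */
        default:
            return NULL;
    }
}
```

With the bytecode { OP_THEN, 'a', 3, OP_LITERAL_BYTE, 'b' } and the text "ab", run_rule consumes 'a', jumps to the rule at index 3, consumes 'b', and returns a pointer two bytes past the start of the text. A breakpoint on a label such as tail is hit on every pass through this loop, which is what makes it a convenient place to stop.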
    

Peek under the hood

  1. First stop is emit_bytes. Take a look at op, len, and bytes (three of emit_bytes's parameters):

    (gdb) p op
    $1 = 0
    
    (gdb) p len
    $2 = 1
    
    (gdb) p *bytes
    $3 = 97 'a'
    

    op in this case is RULE_LITERAL, which is 0 because it is the first enumerator of the opcode enum in janet.h:

    /* opcodes for peg vm */
    typedef enum {
        RULE_LITERAL,      /* [len, bytes...] */
    

    The len and bytes parameters of emit_bytes are the operands for the RULE_LITERAL opcode.

  2. Next stop is peg_rule:

    (gdb) c
    

    A couple of important parameters for peg_rule are rule and text:

    static const uint8_t *peg_rule(
        PegState *s,
        const uint32_t *rule,
        const uint8_t *text) {
    

    which point to the start of the current rule within the bytecode and to the current position within the string being matched, respectively.

    Stepping (e.g. via n) should lead into the case RULE_LITERAL: branch:

    case RULE_LITERAL: {
        uint32_t len = rule[1];
        if (text + len > s->text_end) return NULL;
        return memcmp(text, rule + 2, len) ? NULL : text + len;
    }
    

    where one can observe the value of the current rule's first operand (rule[1], stored in len). If the bounds check passes (that is, at least len bytes of text remain), memcmp compares the text at the current position (within the string originally passed to peg/match) with the literal's bytes stored in the bytecode (the second operand, which begins at rule + 2).

    According to comments in the source:

    If there is a match, returns a pointer to the next text.
    

    and:

    If there is no match, returns NULL.
    

    Of course, that can be verified in this case via gdb too if desired :)
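
To make the [len, bytes...] operand layout and the RULE_LITERAL check more concrete, here is a standalone sketch that builds a literal rule in a plain array and then matches against it. It is not janet's implementation: pack_literal, match_literal, and SKETCH_RULE_LITERAL are hypothetical names, and the way the literal's bytes are copied into the 32-bit words after len (padded to a whole word) is an assumption inferred from rule + 2 being handed to memcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to equal the opcode value that gdb printed for op (0). */
#define SKETCH_RULE_LITERAL 0u

/*
 * Write [op, len, bytes...] into out and return the number of 32-bit words used.
 * Assumption: the literal's bytes are copied into the words after len,
 * zero-padded to a whole number of words.
 */
static size_t pack_literal(uint32_t *out, size_t out_words,
                           const uint8_t *bytes, uint32_t len) {
    size_t payload_words = ((size_t) len + 3) / 4;
    assert(out_words >= 2 + payload_words);
    out[0] = SKETCH_RULE_LITERAL;
    out[1] = len;
    memset(out + 2, 0, payload_words * sizeof(uint32_t));
    memcpy(out + 2, bytes, len);
    return 2 + payload_words;
}

/* Same shape as the RULE_LITERAL case: NULL on failure, else a pointer past the match. */
static const uint8_t *match_literal(const uint32_t *rule, const uint8_t *text,
                                    const uint8_t *text_end) {
    uint32_t len = rule[1];
    /* Not enough text left to hold the literal. */
    if ((size_t) (text_end - text) < len) return NULL;
    return memcmp(text, rule + 2, len) ? NULL : text + len;
}
```

For the pattern "a" this packs three words: the opcode, the length 1, and one word holding the byte 97. The bounds check is written as a comparison of lengths rather than text + len > text_end; the two agree whenever text + len is a valid pointer, and the length form avoids computing a pointer past the end of the buffer.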

Explore

This exposition kept the call to peg/match in test.janet very simple, but there's not much to stop one from changing that call to something else and following a similar series of steps with breakpoints modified appropriately.

Alternatively, one could examine peg_compile1 to observe the compilation of a peg into bytecode.

Indeed, other aspects of janet (e.g. things that happen outside of peg.c) may also be examined by following a similar procedure (i.e. modify test.janet, set some breakpoints in appropriate locations, and explore via execution under gdb). More detail along those lines may be found at Examining Janet Internals with GDB.

