@valarpirai
Last active May 9, 2026 15:31
How to build a programming language. Step by step guide

How to Build a Simple Programming Language

A step-by-step guide grounded in the Aether language — a complete tree-walking interpreter written in Rust.


For Beginners: Where to Start

Prerequisites

Before building a language, be comfortable with:

  1. One programming language well — Python, JavaScript, or Java are fine. Rust is great but not required to start.
  2. Recursion — parsers are deeply recursive. Practice this first if needed.
  3. Basic data structures — hashmaps, enums, trees. You'll use all three constantly.

Best Learning Resource

Start with Crafting Interpreters by Robert Nystrom — free online at craftinginterpreters.com.

Exact Starting Path

Week 1–2: Read Crafting Interpreters (Part 1)

  • Walks you through building "Lox", a small language, first in Java and later in C
  • Part 1 (chapters 1–13) builds a tree-walking interpreter exactly like Aether
  • Do the challenges at the end of each chapter

Week 3: Build something tiny yourself

  • A calculator that handles 1 + 2 * 3 with correct precedence
  • Add variables: let x = 5
  • That's enough to understand the full pipeline

Week 4+: Extend it

  • Add if/else, then while, then functions
  • Each addition teaches you something new

Language to Use

| Language   | Recommendation                                           |
|------------|----------------------------------------------------------|
| Java       | Follow the book exactly — easiest start                  |
| Python     | Great for experimenting, very little boilerplate         |
| Rust       | Harder, but what Aether uses — do Java first, then port  |
| JavaScript | Works fine, widely familiar                              |

Start in Java or Python. Don't start in Rust — the borrow checker will fight you while you're still learning the concepts.

The One Thing to Avoid

Don't try to design your language first. Beginners spend weeks on syntax decisions before writing a single line of code. Start building immediately — your design will naturally emerge from what you implement.

The Aether codebase is a good reference once you've finished Part 1 of Crafting Interpreters. By then you'll recognize every file and know exactly what each piece does.


The Big Picture

Building a language has 3 stages. Each transforms source code one step closer to execution.

"let x = 1 + 2"
       ↓  Lexer
[Let, Ident("x"), Equal, Int(1), Plus, Int(2), Eof]
       ↓  Parser
Stmt::Let("x", Expr::Binary(Int(1), Add, Int(2)))
       ↓  Interpreter
Environment: { x = 3 }

Architecture

Source code (text)
        │
        ▼
   ┌─────────┐
   │  Lexer  │  reads chars → emits tokens
   └────┬────┘
        │ Vec<Token>
        ▼
   ┌─────────┐
   │  Parser │  recursive descent → builds AST
   └────┬────┘
        │ Program (Vec<Stmt>)
        ▼
   ┌─────────────┐
   │ Interpreter │  walks AST → mutates Environment
   └──────┬──────┘
          │
    ┌─────┴───────┐
    │             │
 Environment   Value types
 (scoping)     (Int/String/Array/Function/...)

Build Order

| Step | What                         | Why                                      |
|------|------------------------------|------------------------------------------|
| 1    | Token types + Lexer          | Foundation. Test with print statements   |
| 2    | AST nodes                    | Just data structures, no logic yet       |
| 3    | Parser (expressions)         | 1 + 2 * 3 working is a milestone         |
| 4    | Values + Environment         | Runtime scaffolding                      |
| 5    | Evaluator (expressions)      | Arithmetic, comparisons, variables       |
| 6    | Statements: let, blocks, if  | Now it's a real language                 |
| 7    | Functions + Closures         | The hardest part                         |
| 8    | Loops, return, break         | Control flow                             |
| 9    | Built-ins: print, len, type  | Usability                                |
| 10   | REPL                         | Interactive testing loop                 |

Step 1 — Token Types

Tokens are the atoms of your language — the smallest meaningful units.

pub enum TokenKind {
    // Literals carry their value
    Integer(i64),
    Float(f64),
    String(String),
    True, False, Null,

    // Variables and keywords
    Identifier(String),
    Let, Fn, Return, If, Else, While, For, Break, Continue,

    // Operators
    Plus, Minus, Star, Slash, Percent,
    PlusEqual,  // '+=', recognized via look-ahead in the lexer (Step 2)
    Equal, EqualEqual, NotEqual,
    Less, Greater, LessEqual, GreaterEqual,
    And, Or, Not,

    // Delimiters
    LeftParen, RightParen, LeftBrace, RightBrace,
    Comma, Dot, Colon,

    Eof,  // always needed — signals end of input
}

// Wrap kind with position info for error messages
pub struct Token {
    pub kind: TokenKind,
    pub lexeme: String,   // raw text, e.g. "42"
    pub line: usize,
    pub column: usize,
}

Rule of thumb: Start minimal. Add tokens only as you add features.


Step 2 — The Lexer (Scanner)

The lexer reads source characters one by one and groups them into tokens.

Scanner State

pub struct Scanner {
    source: Vec<char>,  // source as char array for easy indexing
    tokens: Vec<Token>,
    start: usize,       // start position of current token
    current: usize,     // current character position
    line: usize,
    column: usize,
}

Main Loop

pub fn scan_tokens(&mut self) -> Result<Vec<Token>, LexerError> {
    while !self.is_at_end() {
        self.start = self.current;  // mark start of next token
        self.scan_token()?;
    }
    self.tokens.push(Token::new(TokenKind::Eof, ...));
    Ok(self.tokens.clone())
}

Dispatching on Characters

fn scan_token(&mut self) -> Result<(), LexerError> {
    let c = self.advance();
    match c {
        ' ' | '\r' | '\t' => {}       // skip whitespace
        '\n' => { self.line += 1; }
        '(' => self.add_token(TokenKind::LeftParen),
        '+' => {
            // look-ahead: '+' or '+='?
            let kind = if self.match_char('=') {
                TokenKind::PlusEqual
            } else {
                TokenKind::Plus
            };
            self.add_token(kind);
        }
        '/' => {
            if self.match_char('/') {
                // comment — skip to end of line
                while self.peek() != '\n' && !self.is_at_end() {
                    self.advance();
                }
            } else {
                self.add_token(TokenKind::Slash);
            }
        }
        '"' => self.scan_string()?,
        _ => {
            if c.is_ascii_digit()          { self.scan_number()?; }
            else if c.is_alphabetic()      { self.scan_identifier(); }
            else { return Err(LexerError::UnexpectedCharacter(c, ...)); }
        }
    }
    Ok(())
}

Three Essential Helper Methods

fn advance(&mut self) -> char {
    let c = self.source[self.current];
    self.current += 1;
    c
}

fn peek(&self) -> char {
    if self.is_at_end() { '\0' } else { self.source[self.current] }
}

// Consume only if next char matches — used for two-char tokens like ==
fn match_char(&mut self, expected: char) -> bool {
    if self.is_at_end() || self.source[self.current] != expected {
        return false;
    }
    self.current += 1;
    true
}

Identifiers and Keywords

fn scan_identifier(&mut self) {
    while self.peek().is_alphanumeric() || self.peek() == '_' {
        self.advance();
    }
    let text: String = self.source[self.start..self.current].iter().collect();
    let kind = match text.as_str() {
        "let"   => TokenKind::Let,
        "fn"    => TokenKind::Fn,
        "if"    => TokenKind::If,
        "else"  => TokenKind::Else,
        "while" => TokenKind::While,
        "true"  => TokenKind::True,
        "false" => TokenKind::False,
        "null"  => TokenKind::Null,
        _       => TokenKind::Identifier(text.clone()),
    };
    self.add_token(kind);
}
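The lookup above is the whole trick: scan a full word first, then check it against a keyword table. A standalone sketch (the `Tok` enum and its variants here are illustrative stand-ins, not Aether's actual `TokenKind`):

```rust
// Classify a fully-scanned word as a keyword or a plain identifier.
#[derive(Debug, PartialEq)]
enum Tok {
    Let,
    Fn,
    If,
    True,
    Identifier(String),
}

fn classify(word: &str) -> Tok {
    match word {
        "let"  => Tok::Let,
        "fn"   => Tok::Fn,
        "if"   => Tok::If,
        "true" => Tok::True,
        _      => Tok::Identifier(word.to_string()),
    }
}
```

Because the whole word is consumed before the lookup, a word like "letter" stays an identifier instead of being split into the keyword "let" plus "ter".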

Step 3 — AST Design

The AST represents the grammatical structure of code as a tree. Two node kinds:

  • Expressions — produce a value when evaluated
  • Statements — perform an action
pub enum Expr {
    Integer(i64),
    Float(f64),
    String(String),
    Bool(bool),
    Null,
    Identifier(String),                          // variable lookup
    Binary(Box<Expr>, BinaryOp, Box<Expr>),      // 1 + 2, x == y
    Unary(UnaryOp, Box<Expr>),                   // -x, !flag
    Call(Box<Expr>, Vec<Expr>),                  // fn_name(arg1, arg2)
    Array(Vec<Expr>),                            // [1, 2, 3]
    Index(Box<Expr>, Box<Expr>),                 // arr[0]
    Member(Box<Expr>, String),                   // obj.field
    FunctionExpr(Vec<String>, Box<Stmt>),        // fn(x) { x + 1 }
}

pub enum Stmt {
    Expr(Expr),
    Let(String, Expr),                           // let x = 5
    Assign(Expr, Expr),                          // x = 10
    Block(Vec<Stmt>),                            // { ... }
    If(Expr, Box<Stmt>, Option<Box<Stmt>>),      // if (cond) { } else { }
    While(Expr, Box<Stmt>),                      // while (cond) { }
    For(String, Expr, Box<Stmt>),                // for x in arr { }
    Return(Option<Expr>),
    Break,
    Continue,
    Function(String, Vec<String>, Box<Stmt>),    // fn name(params) { body }
}

Why Box<T>? For recursive types (an Expr that contains another Expr). Rust requires boxing so the size is known at compile time.
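To see the boxing in action, here is a minimal recursive expression type and the hand-built tree for 1 + 2 * 3. This is a simplified stand-in for the Expr enum above; the `pretty` helper is hypothetical, added only to make the nesting visible:

```rust
// Each Add/Mul node owns its children through Box, so the enum has a
// known size at compile time even though the type refers to itself.
enum Expr {
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Render the tree with explicit parentheses so the structure is visible.
fn pretty(e: &Expr) -> String {
    match e {
        Expr::Int(n) => n.to_string(),
        Expr::Add(l, r) => format!("({} + {})", pretty(l), pretty(r)),
        Expr::Mul(l, r) => format!("({} * {})", pretty(l), pretty(r)),
    }
}

// 1 + 2 * 3 becomes Add(1, Mul(2, 3)) because * binds tighter than +.
fn example_tree() -> Expr {
    Expr::Add(
        Box::new(Expr::Int(1)),
        Box::new(Expr::Mul(Box::new(Expr::Int(2)), Box::new(Expr::Int(3)))),
    )
}
```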


Step 4 — The Parser

The parser consumes tokens and produces an AST using recursive descent — one function per grammar rule.

Parser State

pub struct Parser {
    tokens: Vec<Token>,
    current: usize,
}

Helper Methods (mirrors the lexer)

fn peek(&self) -> &Token { &self.tokens[self.current] }
fn advance(&mut self) -> &Token { self.current += 1; self.previous() }
fn check(&self, kind: &TokenKind) -> bool { /* discriminant match */ }

fn match_token(&mut self, kinds: &[TokenKind]) -> bool {
    // consume token if it matches any of the given kinds
}

fn consume(&mut self, kind: TokenKind, msg: &str) -> Result<&Token, ParseError> {
    if self.check(&kind) { Ok(self.advance()) }
    else { Err(ParseError::UnexpectedToken { expected: msg.to_string(), ... }) }
}

Top-Level Loop

pub fn parse(&mut self) -> Result<Program, ParseError> {
    let mut statements = Vec::new();
    while !self.is_at_end() {
        statements.push(self.declaration()?);
    }
    Ok(Program::new(statements))
}

Parsing a Statement

fn let_declaration(&mut self) -> Result<Stmt, ParseError> {
    // 'let' already consumed — expect: name = expr
    let name = match &self.peek().kind {
        TokenKind::Identifier(n) => n.clone(),
        _ => return Err(ParseError::UnexpectedToken {
            expected: "identifier".to_string(),
            found: self.peek().clone(),
        }),
    };
    self.advance();
    self.consume(TokenKind::Equal, "=")?;
    let initializer = self.expression()?;
    Ok(Stmt::Let(name, initializer))
}

Operator Precedence — The Key Challenge

Parse lower-precedence operators first; each level calls the one below for operands:

expression()      lowest: assignment
  or_expr()       ||
    and_expr()    &&
      equality()  == !=
        compare() < > <= >=
          add()   + -
            mul() * / %
              unary()   - !
                call()  f(), a[], a.b
                  primary()  literals, identifiers, (expr)
fn add_expr(&mut self) -> Result<Expr, ParseError> {
    let mut left = self.mul_expr()?;  // higher precedence first
    while self.match_token(&[TokenKind::Plus, TokenKind::Minus]) {
        let op = match self.previous().kind {
            TokenKind::Plus  => BinaryOp::Add,
            TokenKind::Minus => BinaryOp::Subtract,
            _ => unreachable!(),
        };
        let right = self.mul_expr()?;
        left = Expr::Binary(Box::new(left), op, Box::new(right));
    }
    Ok(left)
}
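The ladder can be exercised end to end with a stripped-down version that folds values as it parses, which keeps the sketch short. Names here are illustrative; Aether's real parser builds an AST instead of evaluating inline:

```rust
// A two-rung precedence ladder: add() calls mul() for its operands,
// so '*' groups tighter than '+'.
#[derive(Clone, Copy, PartialEq)]
enum Tok { Int(i64), Plus, Star }

struct P { toks: Vec<Tok>, pos: usize }

impl P {
    // Consume the next token only if it matches (mirrors match_token).
    fn eat(&mut self, t: Tok) -> bool {
        if self.toks.get(self.pos) == Some(&t) { self.pos += 1; true } else { false }
    }
    fn primary(&mut self) -> i64 {
        let t = self.toks[self.pos];
        self.pos += 1;
        match t {
            Tok::Int(n) => n,
            _ => panic!("expected an integer"),
        }
    }
    fn mul(&mut self) -> i64 {
        let mut left = self.primary();
        while self.eat(Tok::Star) { left *= self.primary(); }
        left
    }
    fn add(&mut self) -> i64 {
        let mut left = self.mul();                  // higher precedence first
        while self.eat(Tok::Plus) { left += self.mul(); }
        left
    }
}

fn parse_eval(toks: Vec<Tok>) -> i64 {
    P { toks, pos: 0 }.add()
}
```

With this grouping, 1 + 2 * 3 yields 7; a naive left-to-right fold would give 9.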

Step 5 — Runtime Values

Values are what expressions evaluate to.

pub enum Value {
    Int(i64),
    Float(f64),
    String(Rc<String>),    // Rc = cheap clones, automatic cleanup
    Bool(bool),
    Null,
    Array(Rc<Vec<Value>>),
    Function {
        params: Vec<String>,
        body: Box<Stmt>,
        closure: Box<Environment>,  // captured scope — how closures work
    },
    BuiltinFn {
        name: String,
        arity: usize,
        func: fn(&[Value]) -> Result<Value, RuntimeError>,
    },
}

Essential Methods

// Used in if/while conditions
pub fn is_truthy(&self) -> bool {
    match self {
        Value::Bool(b)   => *b,
        Value::Null      => false,
        Value::Int(0)    => false,
        Value::String(s) if s.is_empty() => false,
        _                => true,
    }
}

// Used in error messages
pub fn type_name(&self) -> &str {
    match self {
        Value::Int(_)    => "int",
        Value::String(_) => "string",
        Value::Array(_)  => "array",
        // ...
    }
}

Why Rc<T>? Without it, cloning a Value copies the entire array/string. With Rc, clone only increments a reference count — data is shared. Memory is freed automatically when the last reference drops.
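A quick, self-contained way to convince yourself of the sharing (the `share_example` helper is hypothetical, written just for this demonstration):

```rust
use std::rc::Rc;

// Cloning an Rc copies a pointer and bumps a counter; the Vec is shared.
fn share_example() -> (usize, bool) {
    let data = Rc::new(vec![1, 2, 3]);
    let alias = Rc::clone(&data);          // pointer copy + refcount bump
    let same = Rc::ptr_eq(&data, &alias);  // both handles point at the same Vec
    (Rc::strong_count(&data), same)
}
```

When the last handle goes out of scope the count drops to zero and the Vec is freed, with no garbage collector involved.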


Step 6 — Environment (Variable Scoping)

The environment maps variable names to values. Lexical scoping is implemented by chaining environments: each scope points to its parent.

pub struct Environment {
    values: HashMap<String, Value>,
    parent: Option<Box<Environment>>,  // None = global scope
}

impl Environment {
    pub fn new() -> Self { ... }  // global scope

    pub fn with_parent(parent: Environment) -> Self {
        Self { values: HashMap::new(), parent: Some(Box::new(parent)) }
    }

    pub fn define(&mut self, name: String, value: Value) {
        self.values.insert(name, value);  // always in current scope
    }

    pub fn get(&self, name: &str) -> Result<Value, RuntimeError> {
        if let Some(v) = self.values.get(name) {
            Ok(v.clone())
        } else if let Some(parent) = &self.parent {
            parent.get(name)  // walk up the chain
        } else {
            Err(RuntimeError::UndefinedVariable(name.to_string()))
        }
    }

    pub fn set(&mut self, name: &str, value: Value) -> Result<(), RuntimeError> {
        if self.values.contains_key(name) {
            self.values.insert(name.to_string(), value);
            Ok(())
        } else if let Some(parent) = &mut self.parent {
            parent.set(name, value)  // find in outer scope, update there
        } else {
            Err(RuntimeError::UndefinedVariable(name.to_string()))
        }
    }
}

Scoping Visualization

Global env:   { x: 1, greet: <fn> }
      ↑
Function env: { name: "Alice" }   ← params live here
      ↑
Block env:    { tmp: 5 }          ← block-scoped variables

get("x") walks up until found. get("missing") fails at the top with UndefinedVariable.
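A minimal version of this chain, with i64 values instead of full Values to keep it short (the names are illustrative, not Aether's):

```rust
use std::collections::HashMap;

// Chained scopes: lookup tries the current map, then walks to the parent.
struct Env {
    values: HashMap<String, i64>,
    parent: Option<Box<Env>>,
}

impl Env {
    fn global() -> Env { Env { values: HashMap::new(), parent: None } }
    fn child(parent: Env) -> Env {
        Env { values: HashMap::new(), parent: Some(Box::new(parent)) }
    }
    fn define(&mut self, name: &str, v: i64) {
        self.values.insert(name.to_string(), v);
    }
    fn get(&self, name: &str) -> Option<i64> {
        self.values.get(name).copied()
            .or_else(|| self.parent.as_ref().and_then(|p| p.get(name)))
    }
}

fn lookup_demo() -> (Option<i64>, Option<i64>, Option<i64>) {
    let mut global = Env::global();
    global.define("x", 1);
    let mut inner = Env::child(global);
    inner.define("x", 99);   // shadows the outer x
    inner.define("tmp", 5);
    (inner.get("x"), inner.get("tmp"), inner.get("missing"))
}
```

Note how the inner definition of x shadows the global one: lookup stops at the first scope that has the name.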


Step 7 — The Interpreter (Evaluator)

The evaluator walks the AST and executes it. Two main methods: eval_expr and exec_stmt.

Evaluator State

pub struct Evaluator {
    pub environment: Environment,
    call_depth: usize,       // current recursion depth
    max_call_depth: usize,   // limit (e.g. 100) to prevent stack overflow
}

Evaluating Expressions

pub fn eval_expr(&mut self, expr: &Expr) -> Result<Value, RuntimeError> {
    match expr {
        Expr::Integer(n)       => Ok(Value::Int(*n)),
        Expr::String(s)        => Ok(Value::String(Rc::new(s.clone()))),
        Expr::Bool(b)          => Ok(Value::Bool(*b)),
        Expr::Null             => Ok(Value::Null),

        Expr::Identifier(name) => self.environment.get(name),

        Expr::Array(elements)  => {
            let values: Vec<_> = elements
                .iter()
                .map(|e| self.eval_expr(e))
                .collect::<Result<_, _>>()?;
            Ok(Value::Array(Rc::new(values)))
        }

        Expr::Binary(left, op, right)  => self.eval_binary(left, *op, right),
        Expr::Call(callee, args)       => self.eval_call(callee, args),

        // Snapshot current env as closure
        Expr::FunctionExpr(params, body) => Ok(Value::Function {
            params: params.clone(),
            body: body.clone(),
            closure: Box::new(self.environment.clone()),
        }),
    }
}

Executing Statements — ControlFlow Enum

Non-local control flow (return, break, continue) is handled by propagating an enum up the call stack:

enum ControlFlow {
    None,            // keep going
    Return(Value),   // propagate return value up
    Break,           // exit loop
    Continue,        // skip to next iteration
}

fn exec_stmt(&mut self, stmt: &Stmt) -> Result<ControlFlow, RuntimeError> {
    match stmt {
        Stmt::Let(name, init) => {
            let value = self.eval_expr(init)?;
            self.environment.define(name.clone(), value);
            Ok(ControlFlow::None)
        }

        Stmt::Block(stmts) => {
            // Push a new scope
            let outer = self.environment.clone();
            self.environment = Environment::with_parent(outer);

            let mut result = ControlFlow::None;
            for s in stmts {
                result = self.exec_stmt(s)?;
                if result != ControlFlow::None { break; }  // propagate
            }

            // Pop scope
            self.environment = *self.environment.parent.take().unwrap();
            Ok(result)
        }

        Stmt::If(cond, then_b, else_b) => {
            if self.eval_expr(cond)?.is_truthy() {
                self.exec_stmt(then_b)
            } else if let Some(e) = else_b {
                self.exec_stmt(e)
            } else {
                Ok(ControlFlow::None)
            }
        }

        Stmt::While(cond, body) => {
            loop {
                if !self.eval_expr(cond)?.is_truthy() { break; }
                match self.exec_stmt(body)? {
                    ControlFlow::Break     => break,
                    ControlFlow::Return(v) => return Ok(ControlFlow::Return(v)),
                    _                      => {}
                }
            }
            Ok(ControlFlow::None)
        }

        Stmt::Return(expr) => {
            let val = match expr {
                Some(e) => self.eval_expr(e)?,
                None    => Value::Null,
            };
            Ok(ControlFlow::Return(val))
        }

        Stmt::Break    => Ok(ControlFlow::Break),
        Stmt::Continue => Ok(ControlFlow::Continue),
    }
}

Calling a Function

fn eval_call(&mut self, callee: &Expr, arg_exprs: &[Expr]) -> Result<Value, RuntimeError> {
    let func = self.eval_expr(callee)?;

    // Evaluate args in caller's scope
    let args: Vec<Value> = arg_exprs
        .iter()
        .map(|a| self.eval_expr(a))
        .collect::<Result<_, _>>()?;

    match func {
        Value::BuiltinFn { func, .. } => func(&args),

        Value::Function { params, body, closure } => {
            self.call_depth += 1;
            if self.call_depth > self.max_call_depth {
                return Err(RuntimeError::StackOverflow { ... });
            }

            // New scope rooted at CLOSURE (not caller!) — this is what makes closures work
            let mut call_env = Environment::with_parent(*closure);
            for (param, arg) in params.iter().zip(args.iter()) {
                call_env.define(param.clone(), arg.clone());
            }

            let outer = std::mem::replace(&mut self.environment, call_env);
            let result = self.exec_stmt(&body);  // defer `?` so the state below is restored even on error
            self.environment = outer;
            self.call_depth -= 1;

            match result? {
                ControlFlow::Return(v) => Ok(v),
                _                      => Ok(Value::Null),
            }
        }

        _ => Err(RuntimeError::InvalidOperation("not callable".to_string())),
    }
}

Closure key insight: When calling a function, its scope is rooted at closure (where it was defined), not the caller's scope. The function "remembers" its definition environment.
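The insight can be isolated in a few lines: a function value that carries a snapshot of the scope it was defined in. Here the environment is shrunk to a HashMap of i64s, and `make_adder` with its fixed body is a hypothetical example, not Aether code:

```rust
use std::collections::HashMap;

// A "function value" that owns a snapshot of its definition scope.
struct Closure {
    captured: HashMap<String, i64>,  // stand-in for the boxed Environment
}

impl Closure {
    fn call(&self, arg: i64) -> i64 {
        // The body is fixed for brevity: captured "base" + the argument.
        self.captured["base"] + arg
    }
}

fn make_adder(base: i64) -> Closure {
    let mut env = HashMap::new();
    env.insert("base".to_string(), base);  // snapshot taken at definition time
    Closure { captured: env }
}

fn closure_demo() -> i64 {
    let add10 = make_adder(10);
    // make_adder's scope is long gone, but the closure still sees base = 10.
    add10.call(5)
}
```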


Step 8 — Built-in Functions

Built-ins are Rust functions registered in the global environment at startup.

Registration

fn register_builtins(&mut self) {
    self.environment.define("println".to_string(), Value::BuiltinFn {
        name: "println".to_string(),
        arity: usize::MAX,  // variadic
        func: builtin_println,
    });
    self.environment.define("len".to_string(), Value::BuiltinFn {
        name: "len".to_string(),
        arity: 1,
        func: builtin_len,
    });
}

Implementing a Built-in

pub fn builtin_len(args: &[Value]) -> Result<Value, RuntimeError> {
    match &args[0] {
        Value::Array(a)  => Ok(Value::Int(a.len() as i64)),
        Value::String(s) => Ok(Value::Int(s.len() as i64)),
        v => Err(RuntimeError::TypeError {
            expected: "array or string".to_string(),
            got: v.type_name().to_string(),
        }),
    }
}

pub fn builtin_println(args: &[Value]) -> Result<Value, RuntimeError> {
    let parts: Vec<String> = args.iter().map(|v| format!("{}", v)).collect();
    println!("{}", parts.join(" "));
    Ok(Value::Null)
}
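The dispatch mechanism is just a table of fn pointers, which can be sketched independently of the full Value type. The i64-only signature and the `builtin_sum`/`builtin_first` names are illustrative:

```rust
use std::collections::HashMap;

// All built-ins share one signature, so they fit in a single table.
type Builtin = fn(&[i64]) -> Result<i64, String>;

fn builtin_sum(args: &[i64]) -> Result<i64, String> {
    Ok(args.iter().sum())
}

fn builtin_first(args: &[i64]) -> Result<i64, String> {
    args.first().copied().ok_or_else(|| "first: empty args".to_string())
}

// Register by name at startup; dispatch by name at call time.
fn registry() -> HashMap<&'static str, Builtin> {
    let mut m: HashMap<&'static str, Builtin> = HashMap::new();
    m.insert("sum", builtin_sum);
    m.insert("first", builtin_first);
    m
}

fn call(name: &str, args: &[i64]) -> Result<i64, String> {
    let fns = registry();
    let f = fns.get(name).ok_or_else(|| format!("undefined builtin: {}", name))?;
    f(args)
}
```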

Step 9 — Wire It All Together

fn run(source: &str) {
    // Stage 1: Lex
    let tokens = Scanner::new(source).scan_tokens().unwrap();

    // Stage 2: Parse
    let program = Parser::new(tokens).parse().unwrap();

    // Stage 3: Interpret
    let mut evaluator = Evaluator::new();
    for stmt in &program.statements {
        evaluator.exec_stmt(stmt).unwrap();
    }
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    if args.len() > 1 {
        let source = std::fs::read_to_string(&args[1]).unwrap();
        run(&source);
    } else {
        run_repl();  // interactive mode
    }
}

Error Types

Define errors at each stage with helpful messages:

// Lexer errors
pub enum LexerError {
    UnexpectedCharacter(char, usize, usize),
    UnterminatedString(usize, usize),
    InvalidNumber(String, usize, usize),
}

// Parser errors
pub enum ParseError {
    UnexpectedToken { expected: String, found: Token },
    UnexpectedEof,
    InvalidAssignmentTarget,
}

// Runtime errors
pub enum RuntimeError {
    UndefinedVariable(String),
    TypeError { expected: String, got: String },
    DivisionByZero,
    IndexOutOfBounds { index: i64, length: usize },
    InvalidOperation(String),
    ArityMismatch { expected: usize, got: usize },
    StackOverflow { depth: usize, limit: usize },
}

Key Concepts Summary

| Concept          | What it means                                               |
|------------------|-------------------------------------------------------------|
| Lexer            | Turns text into a flat list of typed tokens                 |
| Parser           | Turns flat tokens into a nested AST                         |
| AST              | Data structure representing code's grammatical structure    |
| Environment      | HashMap of name → value, chained for nested scopes          |
| Closure          | Function that captures its definition environment           |
| ControlFlow enum | Signals return/break/continue up the call stack             |
| Rc<T>            | Reference-counted pointer for cheap cloning of heap values  |
| Tree-walking     | Directly eval the AST — simple, no compilation step         |

Deep Dive: How a Tree-Walking Interpreter Works

The name says it all — it walks the AST tree, visiting each node and executing it directly. No compilation, no bytecode, no translation step.

The Mental Model

Think of the AST as a blueprint of your code. The interpreter is a person with a calculator who walks the blueprint top to bottom, evaluating each piece on the spot.

For let x = 1 + 2 * 3:

         Stmt::Let("x")
               │
          Expr::Binary(+)
          ┌────┴────┐
       Int(1)   Expr::Binary(*)
                ┌────┴────┐
             Int(2)     Int(3)

The interpreter visits depth-first, left to right:

  1. Reach Binary(+): evaluate the left side first
  2. Evaluate Int(1) → 1
  3. Recurse into Binary(*)
  4. Evaluate Int(2) → 2
  5. Evaluate Int(3) → 3
  6. Apply * → 6
  7. Back up: apply + → 7
  8. Store 7 in the environment as x

Two Recursive Functions Drive Everything

eval_expr()  ←──calls──→  exec_stmt()
     │                          │
     └── recurses into          └── recurses into
         child expressions          child statements

eval_expr takes an AST node, returns a Value:

pub fn eval_expr(&mut self, expr: &Expr) -> Result<Value, RuntimeError> {
    match expr {
        Expr::Integer(n)       => Ok(Value::Int(*n)),       // leaf: return immediately
        Expr::Identifier(name) => self.environment.get(name), // variable lookup

        Expr::Binary(left, op, right) => {
            let left_val  = self.eval_expr(left)?;    // recurse left
            let right_val = self.eval_expr(right)?;   // recurse right
            self.apply_op(op, left_val, right_val)    // combine
        }

        Expr::Call(callee, args) => self.eval_call(callee, args),
    }
}

exec_stmt takes a statement, returns a ControlFlow signal:

fn exec_stmt(&mut self, stmt: &Stmt) -> Result<ControlFlow, RuntimeError> {
    match stmt {
        Stmt::If(cond, then_branch, else_branch) => {
            if self.eval_expr(cond)?.is_truthy() {  // eval condition
                self.exec_stmt(then_branch)          // recurse into branch
            } else if let Some(e) = else_branch {
                self.exec_stmt(e)
            } else {
                Ok(ControlFlow::None)
            }
        }

        Stmt::Block(stmts) => {
            for s in stmts {
                let flow = self.exec_stmt(s)?;       // recurse each statement
                if flow != ControlFlow::None { return Ok(flow); }
            }
            Ok(ControlFlow::None)
        }
    }
}

Every node causes the interpreter to recurse deeper until it hits a leaf (a literal like Int(42)), then unwind back up combining results.
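The whole technique fits in a few lines once the tree exists. A minimal sketch, where `Node` and `eval` are simplified stand-ins for Aether's Expr and eval_expr:

```rust
// Tree-walking in miniature: one recursive function over the AST.
enum Node {
    Int(i64),
    Add(Box<Node>, Box<Node>),
    Mul(Box<Node>, Box<Node>),
}

fn eval(n: &Node) -> i64 {
    match n {
        Node::Int(v) => *v,                    // leaf: return immediately
        Node::Add(l, r) => eval(l) + eval(r),  // recurse, then combine
        Node::Mul(l, r) => eval(l) * eval(r),
    }
}

fn demo() -> i64 {
    // The tree for 1 + 2 * 3.
    let tree = Node::Add(
        Box::new(Node::Int(1)),
        Box::new(Node::Mul(Box::new(Node::Int(2)), Box::new(Node::Int(3)))),
    );
    eval(&tree)
}
```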

What Happens at Each Node

| AST Node               | What the interpreter does                             |
|------------------------|-------------------------------------------------------|
| Int(42)                | Returns Value::Int(42) immediately (leaf)             |
| Identifier("x")        | Looks up x in the environment chain                   |
| Binary(left, +, right) | Recurse left, recurse right, add the results          |
| Unary(-, expr)         | Recurse into expr, negate the result                  |
| Let("x", expr)         | Recurse into expr, store result in environment        |
| If(cond, then, else)   | Recurse cond, pick a branch to recurse into           |
| While(cond, body)      | Loop: recurse cond, if true recurse body, repeat      |
| Call(fn, args)         | Recurse each arg, create new scope, recurse body      |
| Block([s1, s2, ...])   | Recurse each statement in order                       |

ControlFlow: How return/break/continue Work

These can't just return normally — they need to escape multiple levels of recursion. The ControlFlow enum bubbles up through exec_stmt calls:

enum ControlFlow {
    None,            // keep executing normally
    Return(Value),   // carry value up the call stack
    Break,           // exit the loop
    Continue,        // skip rest of loop body
}
exec_stmt(While)
  └─ exec_stmt(Block)
       └─ exec_stmt(If)
            └─ exec_stmt(Return(42))
                 └─ returns ControlFlow::Return(42)
            ← propagates up
       ← propagates up
  ← While sees Return, stops looping
← Function unwraps it: returns Value::Int(42)
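The bubbling can be reproduced with a toy executor. `Flow`, the tiny `Stmt` enum, and the unconditional Loop are simplified stand-ins for the real interpreter's types:

```rust
#[derive(Debug, PartialEq)]
enum Flow {
    None,
    Return(i64),
    Break,
}

enum Stmt {
    Expr(i64),
    Return(i64),
    Break,
    Block(Vec<Stmt>),
    Loop(Vec<Stmt>),  // simplified: loops its body until break/return
}

fn exec(s: &Stmt) -> Flow {
    match s {
        Stmt::Expr(_) => Flow::None,
        Stmt::Return(v) => Flow::Return(*v),
        Stmt::Break => Flow::Break,
        Stmt::Block(stmts) => {
            for s in stmts {
                let f = exec(s);
                if f != Flow::None { return f; }  // bubble the signal upward
            }
            Flow::None
        }
        Stmt::Loop(body) => loop {
            let mut f = Flow::None;
            for s in body {
                f = exec(s);
                if f != Flow::None { break; }
            }
            match f {
                Flow::Break => return Flow::None,           // break is absorbed by the loop
                Flow::Return(v) => return Flow::Return(v),  // return keeps bubbling
                Flow::None => {}                            // next iteration
            }
        },
    }
}
```

A Return deep inside a Block inside a Loop escapes all the way out, while a Break is swallowed by the nearest enclosing Loop, exactly as in the diagram above.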

Function Calls: Step by Step

Calling add(1, 2) where fn add(a, b) { return a + b }:

1. eval_expr(Call("add", [Int(1), Int(2)]))
   ├─ eval "add"  →  Value::Function{params, body, closure}
   ├─ eval Int(1) →  Value::Int(1)
   ├─ eval Int(2) →  Value::Int(2)
   │
   └─ Create new scope rooted at the closure:
         { a: 1, b: 2 }  ↑ parent: closure env
      │
      └─ exec_stmt(Return(Binary("a", +, "b")))
            └─ eval_expr(Binary(...))
                  ├─ get("a") → 1
                  ├─ get("b") → 2
                  └─ 1 + 2   → 3
            └─ ControlFlow::Return(3)
   │
   └─ Returns Value::Int(3)

Tree-Walking vs Bytecode VM

|              | Tree-Walking             | Bytecode VM (Python, JVM, V8)            |
|--------------|--------------------------|------------------------------------------|
| How it works | Recurse the AST directly | Compile AST → bytecode, run a tight loop |
| Speed        | Slower                   | Faster                                   |
| Simplicity   | Very simple to build     | Much more complex                        |
| Good for     | Learning, scripting, DSLs| Production languages                     |

Deep Dive: The Environment (Where Values Live)

It's Just a HashMap

An environment is a HashMap that maps variable names to their values.

pub struct Environment {
    values: HashMap<String, Value>,   // "x" → Value::Int(42)
    parent: Option<Box<Environment>>, // pointer to outer scope
}

When you write let x = 42, it stores "x" → Value::Int(42) in this map. When you write x + 1, it looks up "x" in the map to get 42.

Where Values Are Actually Kept

Values live inside the HashMap, inside the Evaluator:

Evaluator
  └── environment: Environment
        └── values: HashMap
              ├── "x"    → Value::Int(42)
              ├── "name" → Value::String("Alice")
              └── "add"  → Value::Function { params, body, closure }
  • let x = 42 → evaluates 42 to Value::Int(42), stores it in the map
  • x + 1 → looks up "x" in the map, gets Value::Int(42), adds 1

Why the parent Pointer?

Because scopes nest. When you enter a block { } or call a function, a new environment is created pointing to the outer one:

Global env
  values: { "x": 10, "y": 20 }
       ↑ parent
Function env
  values: { "a": 1, "b": 2 }
       ↑ parent
If-block env
  values: { "tmp": 5 }

Looking up a variable walks the chain from inner to outer:

pub fn get(&self, name: &str) -> Result<Value, RuntimeError> {
    if let Some(value) = self.values.get(name) {
        Ok(value.clone())           // found in current scope
    } else if let Some(parent) = &self.parent {
        parent.get(name)            // not here, try parent
    } else {
        Err(RuntimeError::UndefinedVariable(...))  // not found anywhere
    }
}

So get("x") in the if-block above: not in if-block → not in function → found in global → returns Value::Int(10).

Concrete Example

let x = 10          // global env:  { x: 10 }
fn add(a, b) {      // global env:  { x: 10, add: <fn> }
    let tmp = a + b // call env:    { a: 3, b: 4, tmp: 7 }
    return tmp
}
add(3, 4)

Environment state during the call:

Global env:   { x: 10, add: Value::Function{...} }
     ↑ parent
Call env:     { a: 3, b: 4, tmp: 7 }

When add returns, the call env is discarded. Only global env survives.

One-Liner

The environment is where values live — a HashMap of name → value, chained so inner scopes can see outer ones.


Going Further

Once the basics work, you can add:

  • Standard library in the language itself — write stdlib functions in your own language, embed with include_str!()
  • String interpolation — "Hello ${name}" — lex as StringInterp(Vec<Part>), eval each part
  • Try/catch/throw — a Thrown(String) error variant + TryCatch statement
  • Module system — import foo resolves to a file, parses and executes it, returns a Module value
  • Dict/Map literals — {"key": value} → Dict(Vec<(Value, Value)>) in the value type
  • Method calls — array.push(x) → eval_member_access dispatches on the type

Reference

  • Aether source: src/lexer/, src/parser/, src/interpreter/
  • Recommended book: Crafting Interpreters by Robert Nystrom (free online)
  • Aether total: ~5,000 lines of Rust, 333 tests, built in ~15 hours

Appendix: Comparing Parsing Approaches

Here's a clear, side-by-side comparison of three common parsing approaches:

| Aspect                  | ANTLR4                                   | Tree-sitter                                  | Recursive Descent (Hand-written)        |
|-------------------------|------------------------------------------|----------------------------------------------|-----------------------------------------|
| Parsing Strategy        | Adaptive LL(*) (ALL(*))                  | GLR-style (with incremental support)         | LL(1) / LL(k) / Pratt (top-down)        |
| Type                    | Generated parser                         | Generated parser                             | Usually hand-written                    |
| Direction               | Top-down                                 | Bottom-up with ambiguity handling            | Top-down                                |
| Lookahead               | Dynamic / arbitrary (adaptive)           | Arbitrary (GLR)                              | Fixed or manual (LL(k) or Pratt)        |
| Left Recursion          | Supported (auto-handled)                 | Naturally supported                          | Must be eliminated manually             |
| Ambiguity Handling      | Good (via adaptive prediction)           | Excellent (explores multiple paths)          | Poor (you must avoid ambiguity)         |
| Incremental Parsing     | No                                       | Excellent (designed for it)                  | No (unless you implement it)            |
| Error Recovery          | Very good (customizable)                 | Best-in-class (robust even on broken code)   | Good (but fully manual)                 |
| Output                  | Full parse tree (or AST via visitor)     | Concrete Syntax Tree (CST)                   | Usually directly builds AST             |
| Performance             | Fast for batch processing                | Extremely fast on edits                      | Excellent for full parses               |
| Control & Flexibility   | High (listeners/visitors)                | Medium (queries + tree walking)              | Highest (full control)                  |
| Ease of Writing Grammar | Easy to medium                           | Medium (grammar written in a JavaScript DSL) | Hardest (write everything by hand)      |
| Debugging               | Good (clear generated code)              | Medium (GLR can be opaque)                   | Best (plain readable code)              |
| Error Messages          | Excellent & customizable                 | Good                                         | Best (you control everything)           |
| Learning Curve          | Medium                                   | Medium                                       | High (for complex languages)            |
| Best For                | DSLs, compilers, translators, code gen   | Editors, syntax highlighting, tools          | Production compilers & interpreters     |

Detailed Breakdown

1. ANTLR4 (Adaptive LL(*))

  • Generates a recursive-descent-style parser, but with superpowers (dynamic lookahead).
  • You write a grammar → it generates clean parser code in many languages.
  • Very good balance between convenience and power.
  • You rarely write parsing logic by hand.
  • Ideal when: You want a maintainable grammar + good ecosystem + multi-language targets.

2. Tree-sitter

  • Uses a GLR-like algorithm optimized for incremental and error-tolerant parsing.
  • Produces a very detailed Concrete Syntax Tree.
  • Designed from the ground up for real-time editing (editors like Neovim, VS Code, Zed, etc.).
  • Ideal when: You need syntax-aware editor features, incremental analysis, or tools that work on incomplete/broken code.

3. Hand-written Recursive Descent

  • You write one function per grammar rule (parseStatement(), parseExpression(), etc.).
  • Most professional language implementations use this (Clang, Rustc, Go compiler, V8, etc.).
  • Usually combined with Pratt parser for expressions.
  • Ideal when: You need maximum performance, best diagnostics, or deep integration with your compiler/interpreter.

Quick Decision Guide

| Your Goal                                  | Best Choice                                              | Why                              |
|--------------------------------------------|----------------------------------------------------------|----------------------------------|
| Building a new programming language        | Recursive Descent or ANTLR4                              | Control + performance            |
| Editor / IDE features (highlighting, etc.) | Tree-sitter                                              | Incrementality + error tolerance |
| DSL or data format parser                  | ANTLR4                                                   | Fast development, great visitors |
| Maximum performance compiler               | Hand-written Recursive Descent                           | No overhead                      |
| Need both editor + compiler                | Tree-sitter (editor) + Recursive Descent / ANTLR4 (compiler) | Best of both worlds          |
| Rapid prototyping                          | ANTLR4                                                   | Grammar → working parser quickly |
| Working with incomplete code               | Tree-sitter                                              | Graceful recovery                |

Summary

  • ANTLR4 = Powerful generated parser with great ergonomics.
  • Tree-sitter = Specialized for interactive, incremental use cases.
  • Recursive Descent = The "professional" manual approach used in real language toolchains.