@valarpirai
Last active May 9, 2026 15:31
How to build a programming language. Step by step guide

How to Build a Simple Programming Language

A step-by-step guide grounded in the Aether language — a complete tree-walking interpreter written in Rust.


For Beginners: Where to Start

Prerequisites

Before building a language, be comfortable with:

  1. One programming language well — Python, JavaScript, or Java are fine. Rust is great but not required to start.
  2. Recursion — parsers are deeply recursive. Practice this first if needed.
  3. Basic data structures — hashmaps, enums, trees. You'll use all three constantly.

Best Learning Resource

Start with Crafting Interpreters by Robert Nystrom — free online at craftinginterpreters.com.

Exact Starting Path

Week 1–2: Read Crafting Interpreters (Part 1)

  • Walks you through building "Lox", a small language, first in Java and later in C
  • Part 1 (chapters 1–13) builds a tree-walking interpreter exactly like Aether
  • Do the challenges at the end of each chapter

Week 3: Build something tiny yourself

  • A calculator that handles 1 + 2 * 3 with correct precedence
  • Add variables: let x = 5
  • That's enough to understand the full pipeline

Week 4+: Extend it

  • Add if/else, then while, then functions
  • Each addition teaches you something new

Language to Use

| Language   | Recommendation                                           |
|------------|----------------------------------------------------------|
| Java       | Follow the book exactly — easiest start                  |
| Python     | Great for experimenting, very little boilerplate         |
| Rust       | Harder, but what Aether uses — do Java first, then port  |
| JavaScript | Works fine, widely familiar                              |

Start in Java or Python. Don't start in Rust — the borrow checker will fight you while you're still learning the concepts.

The One Thing to Avoid

Don't try to design your language first. Beginners spend weeks on syntax decisions before writing a single line of code. Start building immediately — your design will naturally emerge from what you implement.

The Aether codebase is a good reference once you've finished Part 1 of Crafting Interpreters. By then you'll recognize every file and know exactly what each piece does.


The Big Picture

Building a language has 3 stages. Each transforms source code one step closer to execution.

"let x = 1 + 2"
       ↓  Lexer
[Let, Ident("x"), Equal, Int(1), Plus, Int(2), Eof]
       ↓  Parser
Stmt::Let("x", Expr::Binary(Int(1), Add, Int(2)))
       ↓  Interpreter
Environment: { x = 3 }

Architecture

Source code (text)
        │
        ▼
   ┌─────────┐
   │  Lexer  │  reads chars → emits tokens
   └────┬────┘
        │ Vec<Token>
        ▼
   ┌─────────┐
   │  Parser │  recursive descent → builds AST
   └────┬────┘
        │ Program (Vec<Stmt>)
        ▼
   ┌─────────────┐
   │ Interpreter │  walks AST → mutates Environment
   └──────┬──────┘
          │
    ┌─────┴───────┐
    │             │
 Environment   Value types
 (scoping)     (Int/String/Array/Function/...)

Build Order

| Step | What                         | Why                                      |
|------|------------------------------|------------------------------------------|
| 1    | Token types + Lexer          | Foundation. Test with print statements   |
| 2    | AST nodes                    | Just data structures, no logic yet       |
| 3    | Parser (expressions)         | 1 + 2 * 3 working is a milestone         |
| 4    | Values + Environment         | Runtime scaffolding                      |
| 5    | Evaluator (expressions)      | Arithmetic, comparisons, variables       |
| 6    | Statements: let, blocks, if  | Now it's a real language                 |
| 7    | Functions + Closures         | The hardest part                         |
| 8    | Loops, return, break         | Control flow                             |
| 9    | Built-ins: print, len, type  | Usability                                |
| 10   | REPL                         | Interactive testing loop                 |

Step 1 — Token Types

Tokens are the atoms of your language — the smallest meaningful units.

pub enum TokenKind {
    // Literals carry their value
    Integer(i64),
    Float(f64),
    String(String),
    True, False, Null,

    // Variables and keywords
    Identifier(String),
    Let, Fn, Return, If, Else, While, For, Break, Continue,

    // Operators
    Plus, Minus, Star, Slash, Percent,
    PlusEqual,  // '+=', recognized via look-ahead in the lexer (Step 2)
    Equal, EqualEqual, NotEqual,
    Less, Greater, LessEqual, GreaterEqual,
    And, Or, Not,

    // Delimiters
    LeftParen, RightParen, LeftBrace, RightBrace,
    Comma, Dot, Colon,

    Eof,  // always needed — signals end of input
}

// Wrap kind with position info for error messages
pub struct Token {
    pub kind: TokenKind,
    pub lexeme: String,   // raw text, e.g. "42"
    pub line: usize,
    pub column: usize,
}

Rule of thumb: Start minimal. Add tokens only as you add features.


Step 2 — The Lexer (Scanner)

The lexer reads source characters one by one and groups them into tokens.

Scanner State

pub struct Scanner {
    source: Vec<char>,  // source as char array for easy indexing
    tokens: Vec<Token>,
    start: usize,       // start position of current token
    current: usize,     // current character position
    line: usize,
    column: usize,
}

Main Loop

pub fn scan_tokens(&mut self) -> Result<Vec<Token>, LexerError> {
    while !self.is_at_end() {
        self.start = self.current;  // mark start of next token
        self.scan_token()?;
    }
    self.tokens.push(Token::new(TokenKind::Eof, ...));
    Ok(self.tokens.clone())
}

Dispatching on Characters

fn scan_token(&mut self) -> Result<(), LexerError> {
    let c = self.advance();
    match c {
        ' ' | '\r' | '\t' => {}       // skip whitespace
        '\n' => { self.line += 1; }
        '(' => self.add_token(TokenKind::LeftParen),
        '+' => {
            // look-ahead: '+' or '+='?
            let kind = if self.match_char('=') {
                TokenKind::PlusEqual
            } else {
                TokenKind::Plus
            };
            self.add_token(kind);
        }
        '/' => {
            if self.match_char('/') {
                // comment — skip to end of line
                while self.peek() != '\n' && !self.is_at_end() {
                    self.advance();
                }
            } else {
                self.add_token(TokenKind::Slash);
            }
        }
        '"' => self.scan_string()?,
        _ => {
            if c.is_ascii_digit()          { self.scan_number()?; }
            else if c.is_alphabetic()      { self.scan_identifier(); }
            else { return Err(LexerError::UnexpectedCharacter(c, ...)); }
        }
    }
    Ok(())
}

Three Essential Helper Methods

fn advance(&mut self) -> char {
    let c = self.source[self.current];
    self.current += 1;
    c
}

fn peek(&self) -> char {
    if self.is_at_end() { '\0' } else { self.source[self.current] }
}

// Consume only if next char matches — used for two-char tokens like ==
fn match_char(&mut self, expected: char) -> bool {
    if self.is_at_end() || self.source[self.current] != expected {
        return false;
    }
    self.current += 1;
    true
}

Identifiers and Keywords

fn scan_identifier(&mut self) {
    while self.peek().is_alphanumeric() || self.peek() == '_' {
        self.advance();
    }
    let text: String = self.source[self.start..self.current].iter().collect();
    let kind = match text.as_str() {
        "let"   => TokenKind::Let,
        "fn"    => TokenKind::Fn,
        "if"    => TokenKind::If,
        "else"  => TokenKind::Else,
        "while" => TokenKind::While,
        "true"  => TokenKind::True,
        "false" => TokenKind::False,
        "null"  => TokenKind::Null,
        _       => TokenKind::Identifier(text.clone()),
    };
    self.add_token(kind);
}
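The lookup above is the whole trick: scan a full word first, then check it against a keyword table. A standalone sketch (the `Tok` enum and its variants here are illustrative stand-ins, not Aether's actual `TokenKind`):

```rust
// Classify a fully-scanned word as a keyword or a plain identifier.
#[derive(Debug, PartialEq)]
enum Tok {
    Let,
    Fn,
    If,
    True,
    Identifier(String),
}

fn classify(word: &str) -> Tok {
    match word {
        "let"  => Tok::Let,
        "fn"   => Tok::Fn,
        "if"   => Tok::If,
        "true" => Tok::True,
        _      => Tok::Identifier(word.to_string()),
    }
}
```

Because the whole word is consumed before the lookup, a word like "letter" stays an identifier instead of being split into the keyword "let" plus "ter".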

Step 3 — AST Design

The AST represents the grammatical structure of code as a tree. Two node kinds:

  • Expressions — produce a value when evaluated
  • Statements — perform an action
pub enum Expr {
    Integer(i64),
    Float(f64),
    String(String),
    Bool(bool),
    Null,
    Identifier(String),                          // variable lookup
    Binary(Box<Expr>, BinaryOp, Box<Expr>),      // 1 + 2, x == y
    Unary(UnaryOp, Box<Expr>),                   // -x, !flag
    Call(Box<Expr>, Vec<Expr>),                  // fn_name(arg1, arg2)
    Array(Vec<Expr>),                            // [1, 2, 3]
    Index(Box<Expr>, Box<Expr>),                 // arr[0]
    Member(Box<Expr>, String),                   // obj.field
    FunctionExpr(Vec<String>, Box<Stmt>),        // fn(x) { x + 1 }
}

pub enum Stmt {
    Expr(Expr),
    Let(String, Expr),                           // let x = 5
    Assign(Expr, Expr),                          // x = 10
    Block(Vec<Stmt>),                            // { ... }
    If(Expr, Box<Stmt>, Option<Box<Stmt>>),      // if (cond) { } else { }
    While(Expr, Box<Stmt>),                      // while (cond) { }
    For(String, Expr, Box<Stmt>),                // for x in arr { }
    Return(Option<Expr>),
    Break,
    Continue,
    Function(String, Vec<String>, Box<Stmt>),    // fn name(params) { body }
}

Why Box<T>? For recursive types (an Expr that contains another Expr). Rust requires boxing so the size is known at compile time.
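To see the boxing in action, here is a minimal recursive expression type and the hand-built tree for 1 + 2 * 3. This is a simplified stand-in for the Expr enum above; the `pretty` helper is hypothetical, added only to make the nesting visible:

```rust
// Each Add/Mul node owns its children through Box, so the enum has a
// known size at compile time even though the type refers to itself.
enum Expr {
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Render the tree with explicit parentheses so the structure is visible.
fn pretty(e: &Expr) -> String {
    match e {
        Expr::Int(n) => n.to_string(),
        Expr::Add(l, r) => format!("({} + {})", pretty(l), pretty(r)),
        Expr::Mul(l, r) => format!("({} * {})", pretty(l), pretty(r)),
    }
}

// 1 + 2 * 3 becomes Add(1, Mul(2, 3)) because * binds tighter than +.
fn example_tree() -> Expr {
    Expr::Add(
        Box::new(Expr::Int(1)),
        Box::new(Expr::Mul(Box::new(Expr::Int(2)), Box::new(Expr::Int(3)))),
    )
}
```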


Step 4 — The Parser

The parser consumes tokens and produces an AST using recursive descent — one function per grammar rule.

Parser State

pub struct Parser {
    tokens: Vec<Token>,
    current: usize,
}

Helper Methods (mirrors the lexer)

fn peek(&self) -> &Token { &self.tokens[self.current] }
fn advance(&mut self) -> &Token { self.current += 1; self.previous() }
fn check(&self, kind: &TokenKind) -> bool { /* discriminant match */ }

fn match_token(&mut self, kinds: &[TokenKind]) -> bool {
    // consume token if it matches any of the given kinds
}

fn consume(&mut self, kind: TokenKind, msg: &str) -> Result<&Token, ParseError> {
    if self.check(&kind) { Ok(self.advance()) }
    else { Err(ParseError::UnexpectedToken { expected: msg.to_string(), ... }) }
}

Top-Level Loop

pub fn parse(&mut self) -> Result<Program, ParseError> {
    let mut statements = Vec::new();
    while !self.is_at_end() {
        statements.push(self.declaration()?);
    }
    Ok(Program::new(statements))
}

Parsing a Statement

fn let_declaration(&mut self) -> Result<Stmt, ParseError> {
    // 'let' already consumed — expect: name = expr
    let name = match &self.peek().kind {
        TokenKind::Identifier(n) => n.clone(),
        _ => return Err(ParseError::UnexpectedToken {
            expected: "identifier".to_string(),
            found: self.peek().clone(),
        }),
    };
    self.advance();
    self.consume(TokenKind::Equal, "=")?;
    let initializer = self.expression()?;
    Ok(Stmt::Let(name, initializer))
}

Operator Precedence — The Key Challenge

Parse lower-precedence operators first; each level calls the one below for operands:

expression()      lowest: assignment
  or_expr()       ||
    and_expr()    &&
      equality()  == !=
        compare() < > <= >=
          add()   + -
            mul() * / %
              unary()   - !
                call()  f(), a[], a.b
                  primary()  literals, identifiers, (expr)
fn add_expr(&mut self) -> Result<Expr, ParseError> {
    let mut left = self.mul_expr()?;  // higher precedence first
    while self.match_token(&[TokenKind::Plus, TokenKind::Minus]) {
        let op = match self.previous().kind {
            TokenKind::Plus  => BinaryOp::Add,
            TokenKind::Minus => BinaryOp::Subtract,
            _ => unreachable!(),
        };
        let right = self.mul_expr()?;
        left = Expr::Binary(Box::new(left), op, Box::new(right));
    }
    Ok(left)
}
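The ladder can be exercised end to end with a stripped-down version that folds values as it parses, which keeps the sketch short. Names here are illustrative; Aether's real parser builds an AST instead of evaluating inline:

```rust
// A two-rung precedence ladder: add() calls mul() for its operands,
// so '*' groups tighter than '+'.
#[derive(Clone, Copy, PartialEq)]
enum Tok { Int(i64), Plus, Star }

struct P { toks: Vec<Tok>, pos: usize }

impl P {
    // Consume the next token only if it matches (mirrors match_token).
    fn eat(&mut self, t: Tok) -> bool {
        if self.toks.get(self.pos) == Some(&t) { self.pos += 1; true } else { false }
    }
    fn primary(&mut self) -> i64 {
        let t = self.toks[self.pos];
        self.pos += 1;
        match t {
            Tok::Int(n) => n,
            _ => panic!("expected an integer"),
        }
    }
    fn mul(&mut self) -> i64 {
        let mut left = self.primary();
        while self.eat(Tok::Star) { left *= self.primary(); }
        left
    }
    fn add(&mut self) -> i64 {
        let mut left = self.mul();                  // higher precedence first
        while self.eat(Tok::Plus) { left += self.mul(); }
        left
    }
}

fn parse_eval(toks: Vec<Tok>) -> i64 {
    P { toks, pos: 0 }.add()
}
```

With this grouping, 1 + 2 * 3 yields 7; a naive left-to-right fold would give 9.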

Step 5 — Runtime Values

Values are what expressions evaluate to.

pub enum Value {
    Int(i64),
    Float(f64),
    String(Rc<String>),    // Rc = cheap clones, automatic cleanup
    Bool(bool),
    Null,
    Array(Rc<Vec<Value>>),
    Function {
        params: Vec<String>,
        body: Box<Stmt>,
        closure: Box<Environment>,  // captured scope — how closures work
    },
    BuiltinFn {
        name: String,
        arity: usize,
        func: fn(&[Value]) -> Result<Value, RuntimeError>,
    },
}

Essential Methods

// Used in if/while conditions
pub fn is_truthy(&self) -> bool {
    match self {
        Value::Bool(b)   => *b,
        Value::Null      => false,
        Value::Int(0)    => false,
        Value::String(s) if s.is_empty() => false,
        _                => true,
    }
}

// Used in error messages
pub fn type_name(&self) -> &str {
    match self {
        Value::Int(_)    => "int",
        Value::String(_) => "string",
        Value::Array(_)  => "array",
        // ...
    }
}

Why Rc<T>? Without it, cloning a Value copies the entire array/string. With Rc, clone only increments a reference count — data is shared. Memory is freed automatically when the last reference drops.
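A quick, self-contained way to convince yourself of the sharing (the `share_example` helper is hypothetical, written just for this demonstration):

```rust
use std::rc::Rc;

// Cloning an Rc copies a pointer and bumps a counter; the Vec is shared.
fn share_example() -> (usize, bool) {
    let data = Rc::new(vec![1, 2, 3]);
    let alias = Rc::clone(&data);          // pointer copy + refcount bump
    let same = Rc::ptr_eq(&data, &alias);  // both handles point at the same Vec
    (Rc::strong_count(&data), same)
}
```

When the last handle goes out of scope the count drops to zero and the Vec is freed, with no garbage collector involved.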


Step 6 — Environment (Variable Scoping)

The environment maps variable names to values. Lexical scoping is implemented by chaining environments: each scope points to its parent.

pub struct Environment {
    values: HashMap<String, Value>,
    parent: Option<Box<Environment>>,  // None = global scope
}

impl Environment {
    pub fn new() -> Self { ... }  // global scope

    pub fn with_parent(parent: Environment) -> Self {
        Self { values: HashMap::new(), parent: Some(Box::new(parent)) }
    }

    pub fn define(&mut self, name: String, value: Value) {
        self.values.insert(name, value);  // always in current scope
    }

    pub fn get(&self, name: &str) -> Result<Value, RuntimeError> {
        if let Some(v) = self.values.get(name) {
            Ok(v.clone())
        } else if let Some(parent) = &self.parent {
            parent.get(name)  // walk up the chain
        } else {
            Err(RuntimeError::UndefinedVariable(name.to_string()))
        }
    }

    pub fn set(&mut self, name: &str, value: Value) -> Result<(), RuntimeError> {
        if self.values.contains_key(name) {
            self.values.insert(name.to_string(), value);
            Ok(())
        } else if let Some(parent) = &mut self.parent {
            parent.set(name, value)  // find in outer scope, update there
        } else {
            Err(RuntimeError::UndefinedVariable(name.to_string()))
        }
    }
}

Scoping Visualization

Global env:   { x: 1, greet: <fn> }
      ↑
Function env: { name: "Alice" }   ← params live here
      ↑
Block env:    { tmp: 5 }          ← block-scoped variables

get("x") walks up until found. get("missing") fails at the top with UndefinedVariable.
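A minimal version of this chain, with i64 values instead of full Values to keep it short (the names are illustrative, not Aether's):

```rust
use std::collections::HashMap;

// Chained scopes: lookup tries the current map, then walks to the parent.
struct Env {
    values: HashMap<String, i64>,
    parent: Option<Box<Env>>,
}

impl Env {
    fn global() -> Env { Env { values: HashMap::new(), parent: None } }
    fn child(parent: Env) -> Env {
        Env { values: HashMap::new(), parent: Some(Box::new(parent)) }
    }
    fn define(&mut self, name: &str, v: i64) {
        self.values.insert(name.to_string(), v);
    }
    fn get(&self, name: &str) -> Option<i64> {
        self.values.get(name).copied()
            .or_else(|| self.parent.as_ref().and_then(|p| p.get(name)))
    }
}

fn lookup_demo() -> (Option<i64>, Option<i64>, Option<i64>) {
    let mut global = Env::global();
    global.define("x", 1);
    let mut inner = Env::child(global);
    inner.define("x", 99);   // shadows the outer x
    inner.define("tmp", 5);
    (inner.get("x"), inner.get("tmp"), inner.get("missing"))
}
```

Note how the inner definition of x shadows the global one: lookup stops at the first scope that has the name.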


Step 7 — The Interpreter (Evaluator)

The evaluator walks the AST and executes it. Two main methods: eval_expr and exec_stmt.

Evaluator State

pub struct Evaluator {
    pub environment: Environment,
    call_depth: usize,       // current recursion depth
    max_call_depth: usize,   // limit (e.g. 100) to prevent stack overflow
}

Evaluating Expressions

pub fn eval_expr(&mut self, expr: &Expr) -> Result<Value, RuntimeError> {
    match expr {
        Expr::Integer(n)       => Ok(Value::Int(*n)),
        Expr::String(s)        => Ok(Value::String(Rc::new(s.clone()))),
        Expr::Bool(b)          => Ok(Value::Bool(*b)),
        Expr::Null             => Ok(Value::Null),

        Expr::Identifier(name) => self.environment.get(name),

        Expr::Array(elements)  => {
            let values: Vec<_> = elements
                .iter()
                .map(|e| self.eval_expr(e))
                .collect::<Result<_, _>>()?;
            Ok(Value::Array(Rc::new(values)))
        }

        Expr::Binary(left, op, right)  => self.eval_binary(left, *op, right),
        Expr::Call(callee, args)       => self.eval_call(callee, args),

        // Snapshot current env as closure
        Expr::FunctionExpr(params, body) => Ok(Value::Function {
            params: params.clone(),
            body: body.clone(),
            closure: Box::new(self.environment.clone()),
        }),
    }
}

Executing Statements — ControlFlow Enum

Non-local control flow (return, break, continue) is handled by propagating an enum up the call stack:

enum ControlFlow {
    None,            // keep going
    Return(Value),   // propagate return value up
    Break,           // exit loop
    Continue,        // skip to next iteration
}

fn exec_stmt(&mut self, stmt: &Stmt) -> Result<ControlFlow, RuntimeError> {
    match stmt {
        Stmt::Let(name, init) => {
            let value = self.eval_expr(init)?;
            self.environment.define(name.clone(), value);
            Ok(ControlFlow::None)
        }

        Stmt::Block(stmts) => {
            // Push a new scope
            let outer = self.environment.clone();
            self.environment = Environment::with_parent(outer);

            let mut result = ControlFlow::None;
            for s in stmts {
                result = self.exec_stmt(s)?;
                if result != ControlFlow::None { break; }  // propagate
            }

            // Pop scope
            self.environment = *self.environment.parent.take().unwrap();
            Ok(result)
        }

        Stmt::If(cond, then_b, else_b) => {
            if self.eval_expr(cond)?.is_truthy() {
                self.exec_stmt(then_b)
            } else if let Some(e) = else_b {
                self.exec_stmt(e)
            } else {
                Ok(ControlFlow::None)
            }
        }

        Stmt::While(cond, body) => {
            loop {
                if !self.eval_expr(cond)?.is_truthy() { break; }
                match self.exec_stmt(body)? {
                    ControlFlow::Break     => break,
                    ControlFlow::Return(v) => return Ok(ControlFlow::Return(v)),
                    _                      => {}
                }
            }
            Ok(ControlFlow::None)
        }

        Stmt::Return(expr) => {
            let val = match expr {
                Some(e) => self.eval_expr(e)?,
                None    => Value::Null,
            };
            Ok(ControlFlow::Return(val))
        }

        Stmt::Break    => Ok(ControlFlow::Break),
        Stmt::Continue => Ok(ControlFlow::Continue),
    }
}

Calling a Function

fn eval_call(&mut self, callee: &Expr, arg_exprs: &[Expr]) -> Result<Value, RuntimeError> {
    let func = self.eval_expr(callee)?;

    // Evaluate args in caller's scope
    let args: Vec<Value> = arg_exprs
        .iter()
        .map(|a| self.eval_expr(a))
        .collect::<Result<_, _>>()?;

    match func {
        Value::BuiltinFn { func, .. } => func(&args),

        Value::Function { params, body, closure } => {
            self.call_depth += 1;
            if self.call_depth > self.max_call_depth {
                return Err(RuntimeError::StackOverflow { ... });
            }

            // New scope rooted at CLOSURE (not caller!) — this is what makes closures work
            let mut call_env = Environment::with_parent(*closure);
            for (param, arg) in params.iter().zip(args.iter()) {
                call_env.define(param.clone(), arg.clone());
            }

            let outer = std::mem::replace(&mut self.environment, call_env);
            let result = self.exec_stmt(&body);  // defer `?` so the state below is restored even on error
            self.environment = outer;
            self.call_depth -= 1;

            match result? {
                ControlFlow::Return(v) => Ok(v),
                _                      => Ok(Value::Null),
            }
        }

        _ => Err(RuntimeError::InvalidOperation("not callable".to_string())),
    }
}

Closure key insight: When calling a function, its scope is rooted at closure (where it was defined), not the caller's scope. The function "remembers" its definition environment.
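The insight can be isolated in a few lines: a function value that carries a snapshot of the scope it was defined in. Here the environment is shrunk to a HashMap of i64s, and `make_adder` with its fixed body is a hypothetical example, not Aether code:

```rust
use std::collections::HashMap;

// A "function value" that owns a snapshot of its definition scope.
struct Closure {
    captured: HashMap<String, i64>,  // stand-in for the boxed Environment
}

impl Closure {
    fn call(&self, arg: i64) -> i64 {
        // The body is fixed for brevity: captured "base" + the argument.
        self.captured["base"] + arg
    }
}

fn make_adder(base: i64) -> Closure {
    let mut env = HashMap::new();
    env.insert("base".to_string(), base);  // snapshot taken at definition time
    Closure { captured: env }
}

fn closure_demo() -> i64 {
    let add10 = make_adder(10);
    // make_adder's scope is long gone, but the closure still sees base = 10.
    add10.call(5)
}
```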


Step 8 — Built-in Functions

Built-ins are Rust functions registered in the global environment at startup.

Registration

fn register_builtins(&mut self) {
    self.environment.define("println".to_string(), Value::BuiltinFn {
        name: "println".to_string(),
        arity: usize::MAX,  // variadic
        func: builtin_println,
    });
    self.environment.define("len".to_string(), Value::BuiltinFn {
        name: "len".to_string(),
        arity: 1,
        func: builtin_len,
    });
}

Implementing a Built-in

pub fn builtin_len(args: &[Value]) -> Result<Value, RuntimeError> {
    match &args[0] {
        Value::Array(a)  => Ok(Value::Int(a.len() as i64)),
        Value::String(s) => Ok(Value::Int(s.len() as i64)),
        v => Err(RuntimeError::TypeError {
            expected: "array or string".to_string(),
            got: v.type_name().to_string(),
        }),
    }
}

pub fn builtin_println(args: &[Value]) -> Result<Value, RuntimeError> {
    let parts: Vec<String> = args.iter().map(|v| format!("{}", v)).collect();
    println!("{}", parts.join(" "));
    Ok(Value::Null)
}
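The dispatch mechanism is just a table of fn pointers, which can be sketched independently of the full Value type. The i64-only signature and the `builtin_sum`/`builtin_first` names are illustrative:

```rust
use std::collections::HashMap;

// All built-ins share one signature, so they fit in a single table.
type Builtin = fn(&[i64]) -> Result<i64, String>;

fn builtin_sum(args: &[i64]) -> Result<i64, String> {
    Ok(args.iter().sum())
}

fn builtin_first(args: &[i64]) -> Result<i64, String> {
    args.first().copied().ok_or_else(|| "first: empty args".to_string())
}

// Register by name at startup; dispatch by name at call time.
fn registry() -> HashMap<&'static str, Builtin> {
    let mut m: HashMap<&'static str, Builtin> = HashMap::new();
    m.insert("sum", builtin_sum);
    m.insert("first", builtin_first);
    m
}

fn call(name: &str, args: &[i64]) -> Result<i64, String> {
    let fns = registry();
    let f = fns.get(name).ok_or_else(|| format!("undefined builtin: {}", name))?;
    f(args)
}
```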

Step 9 — Wire It All Together

fn run(source: &str) {
    // Stage 1: Lex
    let tokens = Scanner::new(source).scan_tokens().unwrap();

    // Stage 2: Parse
    let program = Parser::new(tokens).parse().unwrap();

    // Stage 3: Interpret
    let mut evaluator = Evaluator::new();
    for stmt in &program.statements {
        evaluator.exec_stmt(stmt).unwrap();
    }
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    if args.len() > 1 {
        let source = std::fs::read_to_string(&args[1]).unwrap();
        run(&source);
    } else {
        run_repl();  // interactive mode
    }
}

Error Types

Define errors at each stage with helpful messages:

// Lexer errors
pub enum LexerError {
    UnexpectedCharacter(char, usize, usize),
    UnterminatedString(usize, usize),
    InvalidNumber(String, usize, usize),
}

// Parser errors
pub enum ParseError {
    UnexpectedToken { expected: String, found: Token },
    UnexpectedEof,
    InvalidAssignmentTarget,
}

// Runtime errors
pub enum RuntimeError {
    UndefinedVariable(String),
    TypeError { expected: String, got: String },
    DivisionByZero,
    IndexOutOfBounds { index: i64, length: usize },
    InvalidOperation(String),
    ArityMismatch { expected: usize, got: usize },
    StackOverflow { depth: usize, limit: usize },
}

Key Concepts Summary

| Concept          | What it means                                               |
|------------------|-------------------------------------------------------------|
| Lexer            | Turns text into a flat list of typed tokens                 |
| Parser           | Turns flat tokens into a nested AST                         |
| AST              | Data structure representing code's grammatical structure    |
| Environment      | HashMap of name → value, chained for nested scopes          |
| Closure          | Function that captures its definition environment           |
| ControlFlow enum | Signals return/break/continue up the call stack             |
| Rc<T>            | Reference-counted pointer for cheap cloning of heap values  |
| Tree-walking     | Directly eval the AST — simple, no compilation step         |

Deep Dive: How a Tree-Walking Interpreter Works

The name says it all — it walks the AST tree, visiting each node and executing it directly. No compilation, no bytecode, no translation step.

The Mental Model

Think of the AST as a blueprint of your code. The interpreter is a person with a calculator who walks the blueprint top to bottom, evaluating each piece on the spot.

For let x = 1 + 2 * 3:

         Stmt::Let("x")
               │
          Expr::Binary(+)
          ┌────┴────┐
       Int(1)   Expr::Binary(*)
                ┌────┴────┐
             Int(2)     Int(3)

The interpreter visits depth-first, left to right:

  1. Reach Binary(+): evaluate the left side first
  2. Evaluate Int(1) → 1
  3. Recurse into Binary(*)
  4. Evaluate Int(2) → 2
  5. Evaluate Int(3) → 3
  6. Apply * → 6
  7. Back up: apply + → 7
  8. Store 7 in the environment as x

Two Recursive Functions Drive Everything

eval_expr()  ←──calls──→  exec_stmt()
     │                          │
     └── recurses into          └── recurses into
         child expressions          child statements

eval_expr takes an AST node, returns a Value:

pub fn eval_expr(&mut self, expr: &Expr) -> Result<Value, RuntimeError> {
    match expr {
        Expr::Integer(n)       => Ok(Value::Int(*n)),       // leaf: return immediately
        Expr::Identifier(name) => self.environment.get(name), // variable lookup

        Expr::Binary(left, op, right) => {
            let left_val  = self.eval_expr(left)?;    // recurse left
            let right_val = self.eval_expr(right)?;   // recurse right
            self.apply_op(op, left_val, right_val)    // combine
        }

        Expr::Call(callee, args) => self.eval_call(callee, args),
    }
}

exec_stmt takes a statement, returns a ControlFlow signal:

fn exec_stmt(&mut self, stmt: &Stmt) -> Result<ControlFlow, RuntimeError> {
    match stmt {
        Stmt::If(cond, then_branch, else_branch) => {
            if self.eval_expr(cond)?.is_truthy() {  // eval condition
                self.exec_stmt(then_branch)          // recurse into branch
            } else if let Some(e) = else_branch {
                self.exec_stmt(e)
            } else {
                Ok(ControlFlow::None)
            }
        }

        Stmt::Block(stmts) => {
            for s in stmts {
                let flow = self.exec_stmt(s)?;       // recurse each statement
                if flow != ControlFlow::None { return Ok(flow); }
            }
            Ok(ControlFlow::None)
        }
    }
}

Every node causes the interpreter to recurse deeper until it hits a leaf (a literal like Int(42)), then unwind back up combining results.
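The whole technique fits in a few lines once the tree exists. A minimal sketch, where `Node` and `eval` are simplified stand-ins for Aether's Expr and eval_expr:

```rust
// Tree-walking in miniature: one recursive function over the AST.
enum Node {
    Int(i64),
    Add(Box<Node>, Box<Node>),
    Mul(Box<Node>, Box<Node>),
}

fn eval(n: &Node) -> i64 {
    match n {
        Node::Int(v) => *v,                    // leaf: return immediately
        Node::Add(l, r) => eval(l) + eval(r),  // recurse, then combine
        Node::Mul(l, r) => eval(l) * eval(r),
    }
}

fn demo() -> i64 {
    // The tree for 1 + 2 * 3.
    let tree = Node::Add(
        Box::new(Node::Int(1)),
        Box::new(Node::Mul(Box::new(Node::Int(2)), Box::new(Node::Int(3)))),
    );
    eval(&tree)
}
```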

What Happens at Each Node

| AST Node               | What the interpreter does                             |
|------------------------|-------------------------------------------------------|
| Int(42)                | Returns Value::Int(42) immediately (leaf)             |
| Identifier("x")        | Looks up x in the environment chain                   |
| Binary(left, +, right) | Recurse left, recurse right, add the results          |
| Unary(-, expr)         | Recurse into expr, negate the result                  |
| Let("x", expr)         | Recurse into expr, store result in environment        |
| If(cond, then, else)   | Recurse cond, pick a branch to recurse into           |
| While(cond, body)      | Loop: recurse cond, if true recurse body, repeat      |
| Call(fn, args)         | Recurse each arg, create new scope, recurse body      |
| Block([s1, s2, ...])   | Recurse each statement in order                       |

ControlFlow: How return/break/continue Work

These can't just return normally — they need to escape multiple levels of recursion. The ControlFlow enum bubbles up through exec_stmt calls:

enum ControlFlow {
    None,            // keep executing normally
    Return(Value),   // carry value up the call stack
    Break,           // exit the loop
    Continue,        // skip rest of loop body
}
exec_stmt(While)
  └─ exec_stmt(Block)
       └─ exec_stmt(If)
            └─ exec_stmt(Return(42))
                 └─ returns ControlFlow::Return(42)
            ← propagates up
       ← propagates up
  ← While sees Return, stops looping
← Function unwraps it: returns Value::Int(42)
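The bubbling can be reproduced with a toy executor. `Flow`, the tiny `Stmt` enum, and the unconditional Loop are simplified stand-ins for the real interpreter's types:

```rust
#[derive(Debug, PartialEq)]
enum Flow {
    None,
    Return(i64),
    Break,
}

enum Stmt {
    Expr(i64),
    Return(i64),
    Break,
    Block(Vec<Stmt>),
    Loop(Vec<Stmt>),  // simplified: loops its body until break/return
}

fn exec(s: &Stmt) -> Flow {
    match s {
        Stmt::Expr(_) => Flow::None,
        Stmt::Return(v) => Flow::Return(*v),
        Stmt::Break => Flow::Break,
        Stmt::Block(stmts) => {
            for s in stmts {
                let f = exec(s);
                if f != Flow::None { return f; }  // bubble the signal upward
            }
            Flow::None
        }
        Stmt::Loop(body) => loop {
            let mut f = Flow::None;
            for s in body {
                f = exec(s);
                if f != Flow::None { break; }
            }
            match f {
                Flow::Break => return Flow::None,           // break is absorbed by the loop
                Flow::Return(v) => return Flow::Return(v),  // return keeps bubbling
                Flow::None => {}                            // next iteration
            }
        },
    }
}
```

A Return deep inside a Block inside a Loop escapes all the way out, while a Break is swallowed by the nearest enclosing Loop, exactly as in the diagram above.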

Function Calls: Step by Step

Calling add(1, 2) where fn add(a, b) { return a + b }:

1. eval_expr(Call("add", [Int(1), Int(2)]))
   ├─ eval "add"  →  Value::Function{params, body, closure}
   ├─ eval Int(1) →  Value::Int(1)
   ├─ eval Int(2) →  Value::Int(2)
   │
   └─ Create new scope rooted at the closure:
         { a: 1, b: 2 }  ↑ parent: closure env
      │
      └─ exec_stmt(Return(Binary("a", +, "b")))
            └─ eval_expr(Binary(...))
                  ├─ get("a") → 1
                  ├─ get("b") → 2
                  └─ 1 + 2   → 3
            └─ ControlFlow::Return(3)
   │
   └─ Returns Value::Int(3)

Tree-Walking vs Bytecode VM

|              | Tree-Walking             | Bytecode VM (Python, JVM, V8)            |
|--------------|--------------------------|------------------------------------------|
| How it works | Recurse the AST directly | Compile AST → bytecode, run a tight loop |
| Speed        | Slower                   | Faster                                   |
| Simplicity   | Very simple to build     | Much more complex                        |
| Good for     | Learning, scripting, DSLs| Production languages                     |

Deep Dive: The Environment (Where Values Live)

It's Just a HashMap

An environment is a HashMap that maps variable names to their values.

pub struct Environment {
    values: HashMap<String, Value>,   // "x" → Value::Int(42)
    parent: Option<Box<Environment>>, // pointer to outer scope
}

When you write let x = 42, it stores "x" → Value::Int(42) in this map. When you write x + 1, it looks up "x" in the map to get 42.

Where Values Are Actually Kept

Values live inside the HashMap, inside the Evaluator:

Evaluator
  └── environment: Environment
        └── values: HashMap
              ├── "x"    → Value::Int(42)
              ├── "name" → Value::String("Alice")
              └── "add"  → Value::Function { params, body, closure }
  • let x = 42 → evaluates 42 to Value::Int(42), stores it in the map
  • x + 1 → looks up "x" in the map, gets Value::Int(42), adds 1

Why the parent Pointer?

Because scopes nest. When you enter a block { } or call a function, a new environment is created pointing to the outer one:

Global env
  values: { "x": 10, "y": 20 }
       ↑ parent
Function env
  values: { "a": 1, "b": 2 }
       ↑ parent
If-block env
  values: { "tmp": 5 }

Looking up a variable walks the chain from inner to outer:

pub fn get(&self, name: &str) -> Result<Value, RuntimeError> {
    if let Some(value) = self.values.get(name) {
        Ok(value.clone())           // found in current scope
    } else if let Some(parent) = &self.parent {
        parent.get(name)            // not here, try parent
    } else {
        Err(RuntimeError::UndefinedVariable(...))  // not found anywhere
    }
}

So get("x") in the if-block above: not in if-block → not in function → found in global → returns Value::Int(10).

Concrete Example

let x = 10          // global env:  { x: 10 }
fn add(a, b) {      // global env:  { x: 10, add: <fn> }
    let tmp = a + b // call env:    { a: 3, b: 4, tmp: 7 }
    return tmp
}
add(3, 4)

Environment state during the call:

Global env:   { x: 10, add: Value::Function{...} }
     ↑ parent
Call env:     { a: 3, b: 4, tmp: 7 }

When add returns, the call env is discarded. Only global env survives.

One-Liner

The environment is where values live — a HashMap of name → value, chained so inner scopes can see outer ones.


Going Further

Once the basics work, you can add:

  • Standard library in the language itself — write stdlib functions in your own language, embed with include_str!()
  • String interpolation — "Hello ${name}" — lex as StringInterp(Vec<Part>), eval each part
  • Try/catch/throw — a Thrown(String) error variant + TryCatch statement
  • Module system — import foo resolves to a file, parses and executes it, returns a Module value
  • Dict/Map literals — {"key": value} → Dict(Vec<(Value, Value)>) in the value type
  • Method calls — array.push(x) → eval_member_access dispatches on the type

Reference

  • Aether source: src/lexer/, src/parser/, src/interpreter/
  • Recommended book: Crafting Interpreters by Robert Nystrom (free online)
  • Aether total: ~5,000 lines of Rust, 333 tests, built in ~15 hours

Appendix: Comparing Parsing Approaches

Here's a clear, side-by-side comparison of three common parsing approaches:

| Aspect                  | ANTLR4                                   | Tree-sitter                                  | Recursive Descent (Hand-written)        |
|-------------------------|------------------------------------------|----------------------------------------------|-----------------------------------------|
| Parsing Strategy        | Adaptive LL(*) (ALL(*))                  | GLR-style (with incremental support)         | LL(1) / LL(k) / Pratt (top-down)        |
| Type                    | Generated parser                         | Generated parser                             | Usually hand-written                    |
| Direction               | Top-down                                 | Bottom-up with ambiguity handling            | Top-down                                |
| Lookahead               | Dynamic / arbitrary (adaptive)           | Arbitrary (GLR)                              | Fixed or manual (LL(k) or Pratt)        |
| Left Recursion          | Supported (auto-handled)                 | Naturally supported                          | Must be eliminated manually             |
| Ambiguity Handling      | Good (via adaptive prediction)           | Excellent (explores multiple paths)          | Poor (you must avoid ambiguity)         |
| Incremental Parsing     | No                                       | Excellent (designed for it)                  | No (unless you implement it)            |
| Error Recovery          | Very good (customizable)                 | Best-in-class (robust even on broken code)   | Good (but fully manual)                 |
| Output                  | Full parse tree (or AST via visitor)     | Concrete Syntax Tree (CST)                   | Usually directly builds AST             |
| Performance             | Fast for batch processing                | Extremely fast on edits                      | Excellent for full parses               |
| Control & Flexibility   | High (listeners/visitors)                | Medium (queries + tree walking)              | Highest (full control)                  |
| Ease of Writing Grammar | Easy to medium                           | Medium (grammar written in a JavaScript DSL) | Hardest (write everything by hand)      |
| Debugging               | Good (clear generated code)              | Medium (GLR can be opaque)                   | Best (plain readable code)              |
| Error Messages          | Excellent & customizable                 | Good                                         | Best (you control everything)           |
| Learning Curve          | Medium                                   | Medium                                       | High (for complex languages)            |
| Best For                | DSLs, compilers, translators, code gen   | Editors, syntax highlighting, tools          | Production compilers & interpreters     |

Detailed Breakdown

1. ANTLR4 (Adaptive LL(*))

  • Generates a recursive-descent-style parser, but with superpowers (dynamic lookahead).
  • You write a grammar → it generates clean parser code in many languages.
  • Very good balance between convenience and power.
  • You rarely write parsing logic by hand.
  • Ideal when: You want a maintainable grammar + good ecosystem + multi-language targets.

2. Tree-sitter

  • Uses a GLR-like algorithm optimized for incremental and error-tolerant parsing.
  • Produces a very detailed Concrete Syntax Tree.
  • Designed from the ground up for real-time editing (editors like Neovim, VS Code, Zed, etc.).
  • Ideal when: You need syntax-aware editor features, incremental analysis, or tools that work on incomplete/broken code.

3. Hand-written Recursive Descent

  • You write one function per grammar rule (parseStatement(), parseExpression(), etc.).
  • Most professional language implementations use this (Clang, Rustc, Go compiler, V8, etc.).
  • Usually combined with Pratt parser for expressions.
  • Ideal when: You need maximum performance, best diagnostics, or deep integration with your compiler/interpreter.

Quick Decision Guide

| Your Goal                                  | Best Choice                                              | Why                              |
|--------------------------------------------|----------------------------------------------------------|----------------------------------|
| Building a new programming language        | Recursive Descent or ANTLR4                              | Control + performance            |
| Editor / IDE features (highlighting, etc.) | Tree-sitter                                              | Incrementality + error tolerance |
| DSL or data format parser                  | ANTLR4                                                   | Fast development, great visitors |
| Maximum performance compiler               | Hand-written Recursive Descent                           | No overhead                      |
| Need both editor + compiler                | Tree-sitter (editor) + Recursive Descent / ANTLR4 (compiler) | Best of both worlds          |
| Rapid prototyping                          | ANTLR4                                                   | Grammar → working parser quickly |
| Working with incomplete code               | Tree-sitter                                              | Graceful recovery                |

Summary

  • ANTLR4 = Powerful generated parser with great ergonomics.
  • Tree-sitter = Specialized for interactive, incremental use cases.
  • Recursive Descent = The "professional" manual approach used in real language toolchains.