Skip to content

Instantly share code, notes, and snippets.

@MangaD
Created November 19, 2025 06:29
Show Gist options
  • Select an option

  • Save MangaD/388d0629102f57e35f24231ee71b74c5 to your computer and use it in GitHub Desktop.

Select an option

Save MangaD/388d0629102f57e35f24231ee71b74c5 to your computer and use it in GitHub Desktop.
Grammars in Computer Science

Grammars in Computer Science

CC0

Disclaimer: ChatGPT generated document.

In computer science, a grammar is a formal way of describing the structure of valid strings in a languageโ€”most commonly programming languages, data formats, and sometimes natural language processing systems. Grammars allow us to define what constitutes correct syntax, enabling everything from compilers and interpreters to parsers and protocol analyzers.


๐Ÿ”น Core Concepts

1. Formal Language

A grammar defines a formal language, which is a set of strings over some alphabet. The grammar lays out how these strings can be constructed.

2. Grammar Components

Formally, a grammar G is a 4-tuple:

G = (V, ฮฃ, R, S) where:

  • V โ†’ set of variables (nonterminals)
  • ฮฃ โ†’ alphabet (terminals)
  • R โ†’ production rules
  • S โ†’ start symbol

Example:

S โ†’ aSb
S โ†’ ฮต

This grammar generates strings like "ab", "aabb", "aaabbb", etc.

3. Terminals & Nonterminals

  • Terminals: literal characters โ†’ cannot be expanded further (e.g., "while", +, {, }).
  • Nonterminals: placeholders โ†’ must be replaced via production rules (e.g., Expression, Statement).

4. Production Rules

Rules define how symbols may be substituted. For example:

Expression โ†’ Expression + Term | Term
Term       โ†’ Identifier | Number

๐Ÿ”น Types of Grammars (Chomsky Hierarchy)

Type Formal Name Example Use Restriction
0 Unrestricted AI models, NLP No limits
1 Context-sensitive Some natural languages Productions depend on context
2 Context-free (CFG) Programming languages Left side is just one nonterminal
3 Regular Regex, lexical analysis Very simple, often directly compiled

๐Ÿง  In practice:

  • Lexical analysis โ†’ regular grammars (regex-based lexer)
  • Syntax parsing โ†’ context-free grammars (parser)

๐Ÿ”น Why Grammars Matter

Use Case Explanation
Compiler design Grammar defines syntax rules; parser uses it to build AST.
Interpreter Uses grammar-driven parsing to execute code.
Language design Defines how the language looks and behaves.
Data serialization formats e.g., JSON, XML, SQL.
Security Prevents injection attacks via well-defined grammar.

๐Ÿ”น Parsing

Grammars directly control the parsing strategy.

Parser Type Grammar Type Examples
LL(k) Left-to-right, Leftmost derivation ANTLR
LR(k) Left-to-right, Rightmost derivation yacc, bison
PEG Parsing Expression Grammar Some modern parsing systems

๐Ÿ”น Ambiguity

A grammar is ambiguous if more than one parse tree exists for the same string. Ambiguity is undesirable in programming languages.

Example of ambiguous grammar:

E โ†’ E + E | E * E | id

This doesnโ€™t enforce operator precedenceโ€”both id + id * id and id * id + id can be parsed multiple ways. Instead, grammars enforce rules like:

E  โ†’ E + T | T
T  โ†’ T * F | F
F  โ†’ (E) | id

๐Ÿ”น Example โ€“ Tiny Expression Grammar

<expr>   ::= <term> (("+" | "-") <term>)*
<term>   ::= <factor> (("*" | "/") <factor>)*
<factor> ::= <number> | <identifier> | "(" <expr> ")"

This defines a valid arithmetic expression grammar.


๐Ÿ”น Grammar vs. Regex

Feature Grammar Regex
Nested structures โœ”๏ธ โŒ
Hierarchical parsing โœ”๏ธ โŒ
Finite state (no recursion) โŒ โœ”๏ธ
Used in compilers โœ”๏ธ โœ”๏ธ (lexing only)
Can describe full programming languages โœ”๏ธ โŒ

๐Ÿ”น Summary

A grammar describes valid strings in a language and allows us to formally define syntax structure:

  • It's a rule-based system (production rules).
  • Used to parse, compile, and analyze code.
  • Most programming languages use context-free grammars.
  • Good grammars are unambiguous, well-defined, and reflect language semantics.

๐Ÿ“š Further Topics (if you're interested)

  • Abstract vs. concrete syntax trees
  • Grammar transformations (left recursion removal, factoring)
  • PEG vs CFG
  • Semantic analysis following parsing
  • Using BNF/EBNF notation
  • Automated parser generators (ANTLR, yacc, bison)

Let me know if you'd like:

  • A deep dive on context-free grammars
  • A tutorial on designing a grammar for a mini-language
  • How grammars work in compiler frontends
  • Examples in BNF, EBNF, or using ANTLR

๐Ÿ’ก Would you like to build a grammar for a simple language together?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment