Disclaimer: ChatGPT generated document.
In computer science, a grammar is a formal way of describing the structure of valid strings in a languageโmost commonly programming languages, data formats, and sometimes natural language processing systems. Grammars allow us to define what constitutes correct syntax, enabling everything from compilers and interpreters to parsers and protocol analyzers.
A grammar defines a formal language, which is a set of strings over some alphabet. The grammar lays out how these strings can be constructed.
Formally, a grammar G is a 4-tuple:
G = (V, ฮฃ, R, S) where:
- V โ set of variables (nonterminals)
- ฮฃ โ alphabet (terminals)
- R โ production rules
- S โ start symbol
Example:
S โ aSb
S โ ฮต
This grammar generates strings like "ab", "aabb", "aaabbb", etc.
- Terminals: literal characters โ cannot be expanded further (e.g.,
"while",+,{,}). - Nonterminals: placeholders โ must be replaced via production rules (e.g.,
Expression,Statement).
Rules define how symbols may be substituted. For example:
Expression โ Expression + Term | Term
Term โ Identifier | Number
| Type | Formal Name | Example Use | Restriction |
|---|---|---|---|
| 0 | Unrestricted | AI models, NLP | No limits |
| 1 | Context-sensitive | Some natural languages | Productions depend on context |
| 2 | Context-free (CFG) | Programming languages | Left side is just one nonterminal |
| 3 | Regular | Regex, lexical analysis | Very simple, often directly compiled |
๐ง In practice:
- Lexical analysis โ regular grammars (regex-based lexer)
- Syntax parsing โ context-free grammars (parser)
| Use Case | Explanation |
|---|---|
| Compiler design | Grammar defines syntax rules; parser uses it to build AST. |
| Interpreter | Uses grammar-driven parsing to execute code. |
| Language design | Defines how the language looks and behaves. |
| Data serialization formats | e.g., JSON, XML, SQL. |
| Security | Prevents injection attacks via well-defined grammar. |
Grammars directly control the parsing strategy.
| Parser Type | Grammar Type | Examples |
|---|---|---|
| LL(k) | Left-to-right, Leftmost derivation | ANTLR |
| LR(k) | Left-to-right, Rightmost derivation | yacc, bison |
| PEG | Parsing Expression Grammar | Some modern parsing systems |
A grammar is ambiguous if more than one parse tree exists for the same string. Ambiguity is undesirable in programming languages.
Example of ambiguous grammar:
E โ E + E | E * E | id
This doesnโt enforce operator precedenceโboth id + id * id and id * id + id can be parsed multiple ways. Instead, grammars enforce rules like:
E โ E + T | T
T โ T * F | F
F โ (E) | id
<expr> ::= <term> (("+" | "-") <term>)*
<term> ::= <factor> (("*" | "/") <factor>)*
<factor> ::= <number> | <identifier> | "(" <expr> ")"
This defines a valid arithmetic expression grammar.
| Feature | Grammar | Regex |
|---|---|---|
| Nested structures | โ๏ธ | โ |
| Hierarchical parsing | โ๏ธ | โ |
| Finite state (no recursion) | โ | โ๏ธ |
| Used in compilers | โ๏ธ | โ๏ธ (lexing only) |
| Can describe full programming languages | โ๏ธ | โ |
A grammar describes valid strings in a language and allows us to formally define syntax structure:
- It's a rule-based system (production rules).
- Used to parse, compile, and analyze code.
- Most programming languages use context-free grammars.
- Good grammars are unambiguous, well-defined, and reflect language semantics.
- Abstract vs. concrete syntax trees
- Grammar transformations (left recursion removal, factoring)
- PEG vs CFG
- Semantic analysis following parsing
- Using BNF/EBNF notation
- Automated parser generators (ANTLR, yacc, bison)
Let me know if you'd like:
- A deep dive on context-free grammars
- A tutorial on designing a grammar for a mini-language
- How grammars work in compiler frontends
- Examples in BNF, EBNF, or using ANTLR
๐ก Would you like to build a grammar for a simple language together?
