My algorithm to convert a Regular Expression to a Regular Grammar is based on mostly two online resources.
Other similar online resources on the same topic may be found at:
- Constructing an Equivalent Regular Grammar from a Regular Expression
- Algorithm to convert regular expression to linear grammar
Unlike all the algorithm descriptions referenced above this algorithm is readily convertible to a computer program by choosing appropriate data structures for the different constructs in the algorithm. I am not using any stricter mathematical formalism to describe the algorithm; only the constructs that help me explain the algorithm in a easy to understand manner are used.
- TS is set of Terminal Symbols and s, s1, s', etc. represents terminal symbols.
- NS is set of Non-terminal Symbols and S, S1, S', etc. represents non-terminal symbols.
- NSt is run-time stack of non-terminal symbols.
- S1 is the start (i.e. top level) non-terminal symbol.
- CPr is collection of all Production Rules.
- CPr(S) is an ordered set of all production rules for non-terminal S.
- CPr(S)(n) is the recent production rule added for S, i.e.
CPr(S)(n)
belongs toCPr(S)
wheren = |CPr(S)|
. - A regular expression is scanned from left to right consuming one input at a time. Each input is either a member of TS or a member of the set of operators
{concat, |, *, +}
or a member of{(, )}
. - At the start current operator defaults to concat.
- Before start scanning a regular expression S1 is pushed into NSt.
- Loop through the following steps till end of regular expression.
- If an operator is scanned set Current Operator.
- Let current operator is + and S is top of NSt.
- Let
CPr(S)(n) = { S -> [String over TS and NS]Z }
, where Z belongs either to TS or NS. - Let S' be a new non-terminal, then replace Z by S', that is
CPr(S)(n) = { S -> [String over TS and NS]S' }
andCPr(S') = {1. S' -> Z, 2. S' -> ZS'}
. - Change current operator to concat. Start of loop.
- Let current operator is * and S is top of NSt.
- Let
CPr(S)(n) = { S -> [String over TS and NS]Z }
, where Z belongs either to TS or NS. - Let S' be a new non-terminal, then replace Z by S', that is
CPr(S)(n) = { S -> [String over TS and NS]S' }
andCPr(S') = {1. S' -> null, 2. S' -> ZS'}
. - Change current operator to concat. Start of loop.
- If a terminal symbol is scanned set Current Terminal.
- Let current terminal is s, current operator concat and S is top of NSt.
- If CPr(S) is null set then add
{ S -> s }
to CPr(S) and soCPr(S)(n) = { S -> s }, n = 1
. - If CPr(S) is not null then concat s at the right most end of all the RHS of the productions.
- Start of loop.
- Let current terminal is s, current operator | and S is top of NSt.
- Add
{ S -> s }
to CPr(S) and soCPr(S)(n) = { S -> s }, n = nprev + 1
. - Change current operator to concat. Start of loop.
- If ( is scanned from the regular expression.
- Let current operator is concat and S is top of NSt.
- Let S' be a new non-terminal.
- If CPr(S) is null set then add
{ S -> S' }
to CPr(S) and soCPr(S)(n) = { S -> S' }, n = 1
. - If CPr(S) is not null then concat S' at the right most end of all the RHS of the productions.
- Push S' to NSt. Start of loop.
- Let current operator is | and S is top of NSt.
- Let S' be a new non-terminal.
- Add
{ S -> S' }
to CPr(S) and soCPr(S)(n) = { S -> S' }, n = nprev + 1
. - Push S' to NSt.
- Change current operator to concat. Start of loop.
- if ) is scanned from the regular expression then pop S from the top of NSt. Start of loop.
- Reduction/optimization of production rules.
- Order the production rules according to LHS from top-most to the lowest level. Start from the lowest level production.
- Let
S -> [String over TS]S'[String over TS and NS]
is the current reduction candidate where S' is the current unchecked non-terminal. - If S' does not appear in any of the RHS of its production rules then for each
{ S' -> A }
belonging to CPr(S'), replaceS -> [String over TS]S'[String over TS and NS]
byS -> [String over TS]A[String over TS and NS]
. Apply this rule to each S' in production rules from lowest to top-most. - Let S be the start symbol which has only one RHS of the form
S -> S'[String over TS and NS]
, where S' is also a non-terminal. - If S' has at least one RHS whose right-most symbol is not S' then set S' as start symbol S, and
RHS(S) = RHS(S': S' is also the right-most symbol) union { w[String over TS and NS] : w belongs to RHS(S': S' is not right-most symbol)}
.
A formal time complexity and memory complexity analysis of the above algorithm is yet to be done.