Some References to Algorithm Resources

My algorithm to convert a Regular Expression to a Regular Grammar is based on mostly two online resources.

Other similar online resources on the same topic may be found at:

Algorithm to build Regular Grammar for a Regular Expression

Unlike all the algorithm descriptions referenced above this algorithm is readily convertible to a computer program by choosing appropriate data structures for the different constructs in the algorithm. I am not using any stricter mathematical formalism to describe the algorithm; only the constructs that help me explain the algorithm in a easy to understand manner are used.

TS is set of Terminal Symbols and s, s1, s', etc. represents terminal symbols.
NS is set of Non-terminal Symbols and S, S1, S', etc. represents non-terminal symbols.
NSt is run-time stack of non-terminal symbols.
S1 is the start (i.e. top level) non-terminal symbol.
CPr is collection of all Production Rules.
CPr(S) is an ordered set of all production rules for non-terminal S.
CPr(S)(n) is the recent production rule added for S, i.e. CPr(S)(n) belongs to CPr(S) where n = |CPr(S)|.
A regular expression is scanned from left to right consuming one input at a time. Each input is either a member of TS or a member of the set of operators {concat, |, *, +} or a member of {(, )}.
At the start current operator defaults to concat.
Before start scanning a regular expression S1 is pushed into NSt.
Loop through the following steps till end of regular expression.
If an operator is scanned set Current Operator.
1. Let current operator is + and S is top of NSt.
2. Let CPr(S)(n) = { S -> [String over TS and NS]Z }, where Z belongs either to TS or NS.
3. Let S' be a new non-terminal, then replace Z by S', that is CPr(S)(n) = { S -> [String over TS and NS]S' } and CPr(S') = {1. S' -> Z, 2. S' -> ZS'}.
4. Change current operator to concat. Start of loop.
5. Let current operator is * and S is top of NSt.
6. Let CPr(S)(n) = { S -> [String over TS and NS]Z }, where Z belongs either to TS or NS.
7. Let S' be a new non-terminal, then replace Z by S', that is CPr(S)(n) = { S -> [String over TS and NS]S' } and CPr(S') = {1. S' -> null, 2. S' -> ZS'}.
8. Change current operator to concat. Start of loop.
If a terminal symbol is scanned set Current Terminal.
1. Let current terminal is s, current operator concat and S is top of NSt.
2. If CPr(S) is null set then add { S -> s } to CPr(S) and so CPr(S)(n) = { S -> s }, n = 1.
3. If CPr(S) is not null then concat s at the right most end of all the RHS of the productions.
4. Start of loop.
5. Let current terminal is s, current operator | and S is top of NSt.
6. Add { S -> s } to CPr(S) and so CPr(S)(n) = { S -> s }, n = nprev + 1.
7. Change current operator to concat. Start of loop.
If ( is scanned from the regular expression.
1. Let current operator is concat and S is top of NSt.
2. Let S' be a new non-terminal.
3. If CPr(S) is null set then add { S -> S' } to CPr(S) and so CPr(S)(n) = { S -> S' }, n = 1.
4. If CPr(S) is not null then concat S' at the right most end of all the RHS of the productions.
5. Push S' to NSt. Start of loop.
6. Let current operator is | and S is top of NSt.
7. Let S' be a new non-terminal.
8. Add { S -> S' } to CPr(S) and so CPr(S)(n) = { S -> S' }, n = nprev + 1.
9. Push S' to NSt.
10. Change current operator to concat. Start of loop.
if ) is scanned from the regular expression then pop S from the top of NSt. Start of loop.
Reduction/optimization of production rules.
1. Order the production rules according to LHS from top-most to the lowest level. Start from the lowest level production.
2. Let S -> [String over TS]S'[String over TS and NS] is the current reduction candidate where S' is the current unchecked non-terminal.
3. If S' does not appear in any of the RHS of its production rules then for each { S' -> A } belonging to CPr(S'), replace S -> [String over TS]S'[String over TS and NS] by S -> [String over TS]A[String over TS and NS]. Apply this rule to each S' in production rules from lowest to top-most.
4. Let S be the start symbol which has only one RHS of the form S -> S'[String over TS and NS], where S' is also a non-terminal.
5. If S' has at least one RHS whose right-most symbol is not S' then set S' as start symbol S, and RHS(S) = RHS(S': S' is also the right-most symbol) union { w[String over TS and NS] : w belongs to RHS(S': S' is not right-most symbol)}.

Analysis of the algorithm

A formal time complexity and memory complexity analysis of the above algorithm is yet to be done.

santanuchakrabarti/RE_to_Grammar_Algorithm.md

Some References to Algorithm Resources

Algorithm to build Regular Grammar for a Regular Expression

Analysis of the algorithm