Skip to content

Instantly share code, notes, and snippets.

@huzaifaarain
Created October 23, 2019 10:57
Show Gist options
  • Save huzaifaarain/f313a0133c972b41fc8fa5def0ce5ef3 to your computer and use it in GitHub Desktop.
Save huzaifaarain/f313a0133c972b41fc8fa5def0ce5ef3 to your computer and use it in GitHub Desktop.
Chomsky Normal Form (CNF) Definition 2

CSE 322 - Introduction to Formal Methods in Computer Science

Chomsky Normal Form

Dave Bacon
Department of Computer Science & Engineering, University of Washington

A useful form for dealing with context free grammars is the Chomksy normal form. This is a particular form of writing a CFG which is useful for understanding CFGs and for proving things about them. It also makes the parse tree for derivations using this form of the CFG a binary tree. And as a CS major, I know you really love binary trees! So what is Chomsky normal form? A CFG is in Chomsky normal form when every rule is of the formA→BC andA→a, whereais a terminal, andA,B, andCare variables. FurtherBandCare not the start variable. Additionally we permit the ruleS→εwhereSis the start variable, for technical reasons. Note that this means that we allowS→εas one of many possible rules. Okay, so if this is the Chomsky normal form what is it good for? Well as a first fact, note that parse trees for a derivation using such a grammar will be a binary tree. Thats nice. It will help us down the road. Okay, so if it might be good for something, we can ask the natural question: is it possible to convert an arbitrary CFG into an equivalent grammar which is of the Chomsky normal form? The answer, it turns out, is yes. Lets see how such a conversion would proceed.

A. A new start variable

The first step is simple! We just add a new start variableS 0 and the ruleS 0 →SwhereSis the original start variable. By doing this we guarantee that the start variable doesn’t occur on the right hand side of a rule.

B. Eliminate the ε rules

Next we remove theεrule. We do this as follows. Suppose we are removing theεruleA→ε. We remove this rule. But now we have to “fix” the rules which have anAon their right-hand side. We do this by, for each occurrence ofA on the right hand side, adding a rule (from the same starting variable) which has theAremoved. Further ifAis the only thing occurring on the right hand side, we replace thisAwithε. Of course this latter fact will have created a newεrule. So we do thisunless we have previously removed A→ε. But onward we press: simply repeat the above process over and over again until allεrules have been removed. For example, suppose our rules contain the ruleA→εand the ruleB→uAvwhereuandvare not both the empty string. First we removeA→ε. Then we add to this rule the ruleB→uv. (Make sure that you don’t delete the original ruleB→uAv. If, on the other hand we had the ruleA→εandB→A, then we would remove the A→εand replace the ruleB→Awith the ruleB→ε. Of course we now have to eliminate this rule via the same procedure.

C. Remove the unit rules

Next we need to remove the unit rules. If we have the ruleA→B, then whenever the ruleB→uappears, we will add the ruleA→u(unless this rule was already replaced.) Again we do this repeatedly until we eliminate all unit rules.

D. Take care of rules with more than two terminals or variables

At this point we have converted our CFG to one which has noεtransitions, and where all rules are either of the form variables goes to terminal, or of the form variable goes to string of variables and terminals with two or more symbols. These later rules are of the appropriate Chomsky normal form. To convert the remaining rules to proper form, we introduce extra variables. In particular supposeA→u 1 u 2... unwheren >2. Then we convert this to a set of rules,A→u 1 A 1 ,A 1 →u 2 A 2 ,.. .,Ak− 2 →uk− 1 uk. Now we need to take care of the rules with two elements on the right hand side. If both of the elements are variables, then we are fine. But if any of them are terminals, we

add a new variable and a new rule to take care of these. For example, if we haveA→u 1 Bwhereu 1 is a terminal, then we replace this byA→U 1 BandU→u 1.

I. EXAMPLE CONVERSION TO CHOMSKY NORMAL FORM
Lets work out an example. Consider the grammar
S → ASB
A → aAS|a|ε
B → SbS|A|bb

First we add a new start state:

S 0 → S
S → ASB
A → aAS|a|ε
B → SbS|A|bb

Next we need to eliminate theεrules. EliminatingA→εyields

S 0 → S
S → ASB|SB
A → aAS|a|aS
B → SbS|A|bb|ε

Now we have a newεrule.,B→ε. Lets remove it

S 0 → S
S → ASB|SB|S|AS
A → aAS|a|aS
B → SbS|A|bb

Next we need to remove all unit rules. Lets begin by removingB→A:

S 0 → S
S → ASB|SB|S|AS
A → aAS|a|aS
B → SbS|bb|aAS|a|aS

Next lets removeS→S:

S 0 → S
S → ASB|SB|AS
A → aAS|a|aS
B → SbS|bb|aAs|a|aS

Further we can eliminateS 0 →S:

S 0 → ASB|SB|AS
S → ASB|SB|AS
A → aAS|a|aS
B → SbS|bb|aAs|a|aS

Now we need to take care of the rules with more than three symbols. First replaceS 0 →ASBbyS 0 →AU 1 and U 1 →SB:

S 0 → AU 1 |SB|AS
S → ASB|SB|AS
A → aAS|a|aS
B → SbS|bb|aAs|a|aS
U 1 → SB

Next eliminateS→ASBin a similar form (technically we could reuseU 1 , but lets not):

S 0 → AU 1 |SB|AS
S → AU 2 |SB|AS
A → aAS|a|aS
B → SbS|bb|aAs|a|aS
U 1 → SB
U 2 → SB

Onward and upward, now fixA→aASby introducingA→aU 3 andU 3 →AS.

S 0 → AU 1 |SB|AS
S → AU 2 |SB|AS
A → aU 3 |a|aS
B → SbS|bb|aAs|a|aS
U 1 → SB
U 2 → SB
U 3 → AS

Finally, fix the twoB→rules:

S 0 → AU 1 |SB|AS
S → AU 2 |SB|AS
A → aU 3 |a|aS
B → SU 4 |bb|aU 5 |a|aS
U 1 → SB
U 2 → SB
U 3 → AS
U 4 → bS
U 5 → AS

Finally we need to work with the rules which have terminals and variables or two terminals. We need to introduce new variables for these. Let these beV 1 →aandV 2 →b:

S 0 → AU 1 |SB|AS
S → AU 2 |SB|AS
A → V 1 U 3 |a|V 1 S
B → SU 4 |V 2 V 2 |V 1 U 5 |a|V 1 S
U 1 → SB
U 2 → SB
U 3 → AS
U 4 → V 2 S
U 5 → AS
V 1 → a
V 2 → b

A quick examination shows us that we have ended up with a grammar in Chomsky normal form. (This can, of course, be simplified.)

CITATIONS

CSE 322 - Introduction to Formal Methods in Computer Science (Department of Computer Science & Engineering, University of Washington)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment