Jangwa · January 4, 2016 19:49
diff --git a/KMP.cpp b/KMP.cpp
 The Naive pattern searching algorithm doesn’t work well in cases where we see many matching characters followed by 
 a mismatching character. Following are some examples.

   txt[] = "AAAAAAAAAAAAAAAAAB"
   txt[] = "ABABABCABABABCABABABC"
   lps[i] = the longest proper prefix of pat[0..i] Searching Algorithm:Preprocessing Algorithm:     
   if (pattern[0] == '\0')     /* Construct the lookup table */     /* Perform the search */     free(T);
   
   pat[] = "AAAAB"

   pat[] =  "ABABAC" (not a worst case, but a bad case for Naive)
              which is also a suffix of pat[0..i].
 Unlike the Naive algo where we slide the pattern by one, we use a value from lps[] to decide the next sliding position. 
 Let us see how we do that. When we compare pat[j] with txt[i] and see a mismatch, we know that characters pat[0..j-1] match
 with txt[i-j+1...i-1], and we also know that lps[j-1] characters of pat[0...j-1] are both proper prefix and suffix which
 means we do not need to match these lps[j-1] characters with txt[i-j...i-1] because we know that these characters will 
 anyway match. See KMPSearch() in the below code for details. In the preprocessing part, we calculate values in lps[]. To do
 that, we keep track of the length of the longest prefix suffix value (we use len variable for this purpose) for the previous

 The KMP matching algorithm uses degenerating property (pattern having same sub-patterns appearing more than once in the
 pattern) of the pattern and improves the worst case complexity to O(n). The basic idea behind KMP’s algorithm is: whenever 
 we detect a mismatch (after some matches), we already know some of the characters in the text (since they matched the pattern
 characters prior to the mismatch). We take advantage of this information to avoid matching the characters that we know will 
 anyway match. KMP algorithm does some preprocessing over the pattern pat[] and constructs an auxiliary array lps[] of size m 
 (same as size of pattern). For each sub-pattern pat[0…i] where i = 0 to m-1, it calculates the length of maximum matching 
 proper prefix which is also a suffix of the sub-pattern pat[0..i] and stores the value in lps[i].

 Examples:
 For the pattern “AABAACAABAA”, lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]
 For the pattern “ABCDE”, lps[] is [0, 0, 0, 0, 0]
 For the pattern “AAAAA”, lps[] is [0, 1, 2, 3, 4]
 For the pattern “AAABAAA”, lps[] is [0, 1, 2, 0, 1, 2, 3]
 For the pattern “AAACAAAAAC”, lps[] is [0, 1, 2, 0, 1, 2, 3, 3, 3, 4]


 ------------------------------------------------------------------------------------------------------
 #define MAX_N 100010
 char T[MAX_N], P[MAX_N]; // T = text, P = pattern
 int b[MAX_N], n, m; // b = back table, n = length of T, m = length of P
 void kmpPreprocess()
 {
  // call this before calling kmpSearch()
 int i = 0, j = -1; b[0] = -1; // starting values
 while (i < m)
 { // pre-process the pattern string P
  while (j >= 0 && P[i] != P[j])
   j = b[j]; // if different, reset j using b
  i++;j++; // if same, advance both pointers
  b[i] = j;

 }
 }
 void kmpSearch()
 { // this is similar as kmpPreprocess(), but on string T
 int i = 0, j = 0; // starting values
 while (i < n)
 { // search through string T
  while (j >= 0 && T[i] != P[j])
  j = b[j]; // if different, reset j using b
  i++; j++; // if same, advance both pointers
  if (j == m)
  { // a match found when j == m
   printf("P is found at index %d in T\n", i - j);
   j = b[j]; // prepare j for the next possible match
  }
 }
 }
	The Naive pattern searching algorithm doesn’t work well in cases where we see many matching characters followed by
	a mismatching character. Following are some examples.

	txt[] = "AAAAAAAAAAAAAAAAAB"
	txt[] = "ABABABCABABABCABABABC"
	lps[i] = the longest proper prefix of pat[0..i] Searching Algorithm:Preprocessing Algorithm:
	if (pattern[0] == '\0') /* Construct the lookup table / / Perform the search */ free(T);

	pat[] = "AAAAB"

	pat[] = "ABABAC" (not a worst case, but a bad case for Naive)
	which is also a suffix of pat[0..i].
	Unlike the Naive algo where we slide the pattern by one, we use a value from lps[] to decide the next sliding position.
	Let us see how we do that. When we compare pat[j] with txt[i] and see a mismatch, we know that characters pat[0..j-1] match
	with txt[i-j+1...i-1], and we also know that lps[j-1] characters of pat[0...j-1] are both proper prefix and suffix which
	means we do not need to match these lps[j-1] characters with txt[i-j...i-1] because we know that these characters will
	anyway match. See KMPSearch() in the below code for details. In the preprocessing part, we calculate values in lps[]. To do
	that, we keep track of the length of the longest prefix suffix value (we use len variable for this purpose) for the previous

	The KMP matching algorithm uses degenerating property (pattern having same sub-patterns appearing more than once in the
	pattern) of the pattern and improves the worst case complexity to O(n). The basic idea behind KMP’s algorithm is: whenever
	we detect a mismatch (after some matches), we already know some of the characters in the text (since they matched the pattern
	characters prior to the mismatch). We take advantage of this information to avoid matching the characters that we know will
	anyway match. KMP algorithm does some preprocessing over the pattern pat[] and constructs an auxiliary array lps[] of size m
	(same as size of pattern). For each sub-pattern pat[0…i] where i = 0 to m-1, it calculates the length of maximum matching
	proper prefix which is also a suffix of the sub-pattern pat[0..i] and stores the value in lps[i].

	Examples:
	For the pattern “AABAACAABAA”, lps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]
	For the pattern “ABCDE”, lps[] is [0, 0, 0, 0, 0]
	For the pattern “AAAAA”, lps[] is [0, 1, 2, 3, 4]
	For the pattern “AAABAAA”, lps[] is [0, 1, 2, 0, 1, 2, 3]
	For the pattern “AAACAAAAAC”, lps[] is [0, 1, 2, 0, 1, 2, 3, 3, 3, 4]


	------------------------------------------------------------------------------------------------------
	#define MAX_N 100010
	char T[MAX_N], P[MAX_N]; // T = text, P = pattern
	int b[MAX_N], n, m; // b = back table, n = length of T, m = length of P
	void kmpPreprocess()
	{
	// call this before calling kmpSearch()
	int i = 0, j = -1; b[0] = -1; // starting values
	while (i < m)
	{ // pre-process the pattern string P
	while (j >= 0 && P[i] != P[j])
	j = b[j]; // if different, reset j using b
	i++;j++; // if same, advance both pointers
	b[i] = j;

	}
	}
	void kmpSearch()
	{ // this is similar as kmpPreprocess(), but on string T
	int i = 0, j = 0; // starting values
	while (i < n)
	{ // search through string T
	while (j >= 0 && T[i] != P[j])
	j = b[j]; // if different, reset j using b
	i++; j++; // if same, advance both pointers
	if (j == m)
	{ // a match found when j == m
	printf("P is found at index %d in T\n", i - j);
	j = b[j]; // prepare j for the next possible match
	}
	}
	}