'Coding has over 700 languages', '67% of programming jobs aren’t in the technology industry', 'Coding is behind almost everything that is powered by electricity', 'Knowing how to code is a major requirement for astronomers', 'The first computer didn’t use any electricity', 'Do you know there is a coding language named “Go“', 'Computer programming is one of the fastest-growing careers', 'Fortran (FORmula TRANslation) was the name of the first programming language', 'The first programmer was the daughter of a mad poet', 'Many programming languages share the same structure', 'Coding will soon be as important as reading', 'How many programmers does it take to change a light bulb? None, that’s a hardware problem', 'Why do Java developers wear glasses? Because they can’t C', 'Software and temples are much the same — first we build them, then we pray', 'An engineer will not call it a bug — it’s an undocumented feature', 'In a room full of top software designers, if two agree on the same thing, that’s a majority', 'C programmers never die. They are just cast into void', 'Knock, knock … Who’s there? … *very long pause* … Java', 'The best thing about a boolean is even if you are wrong, you are only off by a bit', 'Linux is only free if your time has no value', 'The computer was born to solve problems that did not exist before', 'Coding has over 700 languages', '67% of programming jobs aren’t in the technology industry', 'Coding is behind almost everything that is powered by electricity', 'Knowing how to code is a major requirement for astronomers', 'The first computer didn’t use any electricity', 'Do you know there is a coding language named “Go“', 'Computer programming is one of the fastest-growing careers', 'Fortran (FORmula TRANslation) was the name of the first programming language', 'The first programmer was the daughter of a mad poet', 'Many programming languages share the same structure', 'Coding will soon be as important as reading', 'How many programmers does it take to change a light bulb? None, that’s a hardware problem', 'Why do Java developers wear glasses? Because they can’t C', 'Software and temples are much the same — first we build them, then we pray', 'An engineer will not call it a bug — it’s an undocumented feature', 'In a room full of top software designers, if two agree on the same thing, that’s a majority', 'C programmers never die. They are just cast into void', 'Knock, knock … Who’s there? … *very long pause* … Java', 'The best thing about a boolean is even if you are wrong, you are only off by a bit', 'Linux is only free if your time has no value', 'The computer was born to solve problems that did not exist before',
Update appNew update is available. Click here to update.
Last Updated: Jul 3, 2023

Prefix function - Knuth-Morris-Pratt Algorithm

Author Malay Gain
2 upvotes
gp-icon
Competitive programming
Free guided path
16 chapters
99+ problems
gp-badge
Earn badges and level up

Introduction

KMP is the most popular substring search algorithm in linear time. This algorithm is used for finding linear time solutions of various string-related problems such as searching substring in a string, the number of unique substrings in a string, pattern matching problem, etc.

Algorithm

The main component of the KMP algorithm is the prefix function. It is defined as for a string s of length n, π[i] is the length of the longest proper prefix of the substring s[0…i], which is also a suffix of this substring where the proper prefix of a string is a prefix that is not equal to the string itself. π[i] is the ith element of an n-length array π. By definition, π[0]=0.

 

π[i]= max

        k=0...i

 

k : s[0…k−1]=s[i−(k−1)…i],

where s[0…k−1] is the proper prefix and s[i−(k−1)…i] is the suffix of length k of the string s.

 

e.g  for string "ababaca", prefix function array π is [0,0,1,2,3,0,1]

 

source

 

Get the tech career you deserve, faster!
Connect with our expert counsellors to understand how to hack your way to success
User rating 4.7/5
1:1 doubt support
95% placement record
Akash Pal
Senior Software Engineer
326% Hike After Job Bootcamp
Himanshu Gusain
Programmer Analyst
32 LPA After Job Bootcamp
After Job
Bootcamp

Brute force approach

vector<int> prefix_function(string s) 
{
    int n = (int)s.length();
    vector<int> pi(n);
    for (int i = 0; i < n; i++)
        for (int k = 0; k <= i; k++)
        {
         if (s.substr(0, k) == s.substr(i-k+1, k))
            {
             pi[i] = k;
            }
        }
            
                
    return pi;
}

 

It takes O(n3) in the worst case.

Optimized approach:

  • At first, compute the prefix values π[i] in a loop by iterating from i=1 to i=n−1 (π[0] is assigned with 0).
  • To calculate the current value π[i], set the variable j denoting the length of the best suffix for i−1. Initially, j=π[i−1].
  • Test if the suffix of substring s[0....i] of length j+1 is also a prefix of the substring s[0....i] by comparing s[j] and s[i]. If they are equal, we assign π[i]=j+1 (the values of the prefix function can only increase by at most one); otherwise, reduce j to π[j−1] ( greatest k<j, for which the prefix property holds is π[j−1]) and repeat this step.
  • If length j=0 is reached and still don't have a match, then assign π[i]=0 and go to the next index i+1

 

Code

vector<int> prefix_function(string s) 
{
    int n = (int)s.length();
    vector<int> pi(n);
    for (int i = 1; i < n; i++) 
  {
        int j = pi[i-1];
        while (j > 0 && s[i] != s[j])
            j = pi[j-1];
        if (s[i] == s[j])
            j++;
        pi[i] = j;
    }
    return pi;
}

 

Complexity analysis

The KMP algorithm takes O(n) time in the worst case and space complexity is also O(n) for using an extra array of length n.

Also check out - Substr C++, and Euclid GCD Algorithm

Application

Searching a substring in a string

Searching a pattern P in string S using the ∏ function is just like making the ∏ function. If we get any difference in S and P, we can jump on a character in S with the help of ∏ function because we already equaled the suffix and prefix. Go through the below example for understanding how prefix function is used for pattern matching.

 

source

 

Code

#include<iostream>
using namespace std;


void findPrefix(string P, int m, int prefArray[]) {
  int length = 0;
  prefArray[0] = 0;     //first place is always 0 as no prefix

  for(int i = 1; i<m; i++) {
      if(P[i] == P[length]) {
        length++;
        prefArray[i] = length;    
      }else {
        if(length != 0) {
            length = prefArray[length - 1];
            i--;     //decrease i to avoid effect of increasing after iteration
        }else
            prefArray[i] = 0;
      }
  }
}

void KMPSearch(string S, string P, int *locArray, int &loc) {
  int n, m, i = 0, j = 0;
  n = S.size();
  m = P.size();
  int prefixArray[m];    //prefix(?) array as same size of P 
  findPrefix(P, m, prefixArray);
  loc = 0;

  while(i < n) {
      if(S[i] == P[j]) {
        i++; 
        j++;
      }

      if(j == m) {
        locArray[loc] = i-j;      //item found at i-j position.
        loc++;
        j = prefixArray[j-1];    //get the prefix length from array
      }else if(i < n && P[j] != S[i]) {
        if(j != 0)
            j = prefixArray[j-1];
        else
            i++;
      }
  }
}




int main() {
  string str = "aaaabaaaaabbbaaaab";
  string patt = "aaab";
  int locationArray[str.size()]; // array to store the start locations from where the pattern matched
  int index;
  KMPSearch(str, patt, locationArray, index);

  for(int i = 0; i<index; i++) {
      cout << "Pattern found at location: " <<locationArray[i] << endl;
  }
}

 

Output

Pattern found at location: 1
Pattern found at location: 7
Pattern found at location: 14

 

FAQs

  1. What is the application of the KMP algorithm?
    KMP algorithm is used for searching a substring in a string, the number of unique substrings in a string, pattern matching problem.
     
  2. What is the alternative algorithm of the Knuth-Morris-Pratt algorithm?
    Robin-Karp algorithm is an alternative to the KMP algorithm.
     
  3. What is the proper prefix of a string?
    The proper prefix of a string is a substring that is not equal to the length of the string itself.

 

Key Takeaways

This article covered the KMP algorithm. We have learned about the algorithm to apply it for solving various string-related problems.

Check out this problem - Longest Common Prefix

Now, you can practice and solve a wide range of string-related coding questions commonly asked in interviews in Coding Ninjas Studio. Along with coding questions, we can also find the interview experience of scholars working in renowned product-based companies here. 

Happy learning!

Previous article
Maximize the Confusion of an Exam
Next article
Lexicographically largest String after removal of K characters
Guided path
Free
gridgp-icon
Competitive programming
16 chapters
217+ Problems
gp-badge
Earn badges and level up
Live masterclass