Longest common substring

Riya
Last Updated: May 25, 2022

This blog will discuss the various approaches to solve the Longest Common Substring Problem. Before jumping into the solution, let’s first recall what a substring is and then understand the Longest Common Substring problem.

A substring is a contiguous part of the string. Its length will be less than or equal to the original string length.

In this problem, we will be given two strings. We have to find the length of the longest substring, which is common in the two given strings. 

Let’s take an example to understand it further:

Assume the two given strings are:

s1 = “qdef”

s2 = “defq”

“q”, “d”,  “e”, “f”, “de”, “ef”, “def” are the common substrings between s1 and s2. “def” is the longest common substring. So, the length of the longest common substring of s1 and s2 will be “3”.

Brute Force Approach -

The brute force solution to this problem is to consider all possible substrings of the first string and check whether it is also a substring of the second string and keep updating the length of the longest common substring. 

A simple way to implement this is to fix the starting points in the two strings and check the maximum length of the substring generated for each possible pair of starting indices fixed in the two strings. Meanwhile, keep updating the length of the longest common substring.

 

Let’s understand this by considering the above example:

s1 = “qdef”

s2 = “defq”

Starting index of the first stringStarting index of the second stringMaximum length of the common substring if the starting point of the substring is fixed in both the strings
000
010
020
031
103
110
120
130
200
211
220
230
300
310
320
331

 

In the above table, we can see that the length of the longest common substring is 3, which will be generated when starting indices are fixed to ‘1’ and ‘0’ in s1 and s2, respectively. The generated longest common substring is “def”. 

 

Algorithm -

Step 1. Create a function ‘longestCommonSubstringLength()’ that will accept two parameters - s1 and s2, the two given strings. Initialize a “maxLen” variable to '0', that will store the length of the longest common substring.

Step 2. Initialize two variables ‘n1’ and ‘n2’ with the length of the two strings ‘s1’ and ‘s2’, respectively.

Step 3. Run two nested ‘for loops’ for fixing the starting indices in the two strings

Step 4. Run a ‘while loop’ inside the two ‘for loops’ to check the length of the possible longest substring with the two fixed starting indices in the two strings.

Step 5.  After checking the length of the possible longest substring with the two fixed starting indices in the two strings, update the “maxLen” variable. After checking it for all possible pairs of starting indices, return the length of the longest common substring, i.e., “maxLen”.

 

C++ code:

 

#include <bits/stdc++.h>
using namespace std;
int longestCommonSubstringLength(string s1, string s2)
  {
   int n1 = s1.length();
   int n2 = s2.length();
   int maxLen=0;
   for(int i = 0; i < n1; i++)
     {
      for(int j = 0; j < n2; j++)
        {


                     /*
                         If the two corresponding characters from the 
                         two strings are same, then check the length upto 
                         which the Corresponding characters are same in the
                         two strings
                     */
         if(s2[j]==s1[i])
           {
            int len=0;
            int curr1 = i, curr2 = j;


                           /*
                               Checking the maximum length upto which 
                               corresponding characters are same
                           */
            while(curr1 < n1 && curr2 < n2 && s1[curr1] == s2[curr2])
              {
                 len++;
                 curr1++;
                 curr2++;
              }


                          // Updating the length of the longest common substring
            maxLen = max(maxLen, len);
           }
        }
     }
   return maxLen;
  }


int main()
  {
      string s1 = "qdef";
      string s2 = "defq";
      cout<<"The length of the longest common substring of s1 and s2 is "<<longestCommonSubstringLength(s1,s2)<<endl;
  }

 

Output:
The length of the longest common substring of s1 and s2 is 3

 

Algorithm Complexity: 

Time Complexity: O(N * M * max(N, M))

We have created two nested “for loops”. The first and second “for loop” are run to consider all possible indices in the first and second, respectively. Then a for loop is created for checking the length of the longest common substring. So, the time complexity is O(N * M * max(N, M)) where ‘N’ and ‘M’ are the lengths of the first and second strings, respectively.

 

Space Complexity: O(1) 

As we are using constant space, therefore the overall space complexity will be O(1).

 

Recursive Solution

This problem can be solved using recursion. We can consider one by one each character in the two strings. The following two cases will arise:

  1. The two characters from the two strings are the same: 

In this case, increase the length of the longest common substring and move to the next characters in both the string.

 2. The two characters from the two strings are different:

In this case, take the maximum of the current length of the longest common string and the two lengths of the longest common substring from two cases - one by moving to the next character in only the first string and another by moving to the next character in only the second string.

Let’s understand this algorithm in detail and its C++ code.

 

Algorithm

Step 1. Create a recursive function “longestCommonSubstringLength()” which takes input

  1. s1 - First given string
  2. s2 - Second given string
  3. i - It denotes the current length of s1
  4. j - It denotes the current length of s2
  5. maxLen - For storing the length of the longest common substring


Step 2. Write the base case of the recursive function. If i <= 0 or j <= 0 i.e. if the current length of any of the string becomes 0, then simply return ‘maxLen’.

Step 3.  Now check if the characters at index ‘i-1’ in the first string and ‘j-1’ in the second string are the same, then increase the value of ‘maxLen’ by one and call the function for the smaller sub-problem.

Step 4.  Else If the two characters are not the same, then update the value as the maximum of ‘maxLen’ and the value returned by calling the recursive function for two smaller sub-problems.

Step 5. Finally, return ‘maxLen,’ i.e., the length of the longest common substring.

 

C++ Code

#include <bits/stdc++.h>
using namespace std;


int longestCommonSubstringLength(string s1, string s2, int i, int j, int maxLen)
  {


       // Base Case
   if (i <= 0 || j <= 0) {
   return maxLen;
   }


int maxLen1 = maxLen;


       // If the corresponding characters are same in both the string then increase maxLen by 1
if (s1[i - 1] == s2[j - 1]) {
maxLen1 = longestCommonSubstringLength(s1, s2, i - 1, j - 1, maxLen + 1);
}


int maxLen2 = longestCommonSubstringLength(s1, s2, i, j - 1, 0);
int maxLen3 = longestCommonSubstringLength(s1, s2, i - 1, j, 0);


return max(maxLen1, max(maxLen2, maxLen3));
 }


int main()
{
    string s1 = "qdef";
    string s2 = "defq";
    
    int n1 = s1.length();
    int n2 = s2.length();
    
    int maxLen=0;
    
    cout<<"The length of the longest common substring of s1 and s2 is "<<
    longestCommonSubstringLength(s1, s2, n1, n2, maxLen)<<endl;


return 0;
}

 

Output:
The length of the longest common substring of s1 and s2 is 3

 

Algorithm Complexity

Time Complexity: O(2 ^ max(N, M))

In the worst case, i.e., none of the elements in the two strings are the same, two recursive function calls will be made for each character in the strings. So, the time complexity is O( 2 ^ max(N, M)), where ‘N’ and ‘M’ are the lengths of the two given strings.

Space Complexity: O(1) 

As we are using constant space, therefore the overall space complexity will be O(1).

Efficient Approach Using Dynamic Programming

In the recursive approach, there are a lot of duplicate calls for the same state i.e. for same values of i, j and maxLen, which leads to exponential time complexity.

 

The above solution can be modified, and the results can be stored in a table to avoid the repetitive calling of recursive functions for the same state. Thus dynamic programming approach will significantly reduce the time complexity as compared to the recursive solution.

The algorithm will be similar to the above recursive approach. The only difference is that instead of repetitively calling the recursive function, we will store the length of the longest common substring for each state in a table.

Top-Down Approach

This approach will be similar to the recursive approach but the difference is that we will store the calculated result for a particular value of i, j and maxLen in a matrix. And, when we need result for that values of i, j and maxLen, we will use the stored value instead of calling the function again.

Algorithm

Step 1. Create a function “longestCommonSubstringLength()” which will take parameters - the two given strings and their lengths.

Step 2. Initialize a variable “maxLen” inside main function to store the length of the longest common substring and a 3-dimansional array “dp” globally to store the results for all possible combinations of i, j and maxLen.

Step 3. Write the base case similar to the above recursive function. Now, check the dp table, if we have already found the result for the current value of i, j and maxLen then return it.

Step 4. Write the other conditions similar to the above recursive function, jut make one change that after getting the result for the current state, first save it into the “dp” table then return the obtained result.

 

C++ Code

#include <bits/stdc++.h>
using namespace std;


int dp[100][100][100];


int longestCommonSubstringLength(string s1, string s2, int i, int j, int maxLen)
  {
   if (i <= 0 || j <= 0) {
   return maxLen;
   }

if (dp[i][j][maxLen] != -1) {
return dp[i][j][maxLen];
}

int maxLen1 = maxLen;
if (s1[i - 1] == s2[j - 1]) {
maxLen1 = longestCommonSubstringLength(s1, s2, i - 1, j - 1, maxLen + 1);
}


int maxLen2 = longestCommonSubstringLength(s1, s2, i, j - 1, 0);
int maxLen3 = longestCommonSubstringLength(s1, s2, i - 1, j, 0);


return dp[i][j][maxLen] = max(maxLen1, max(maxLen2, maxLen3));
 }


int main()
{
    string s1 = "qdef";
    string s2 = "defq";
    
    int n1 = s1.length();
    int n2 = s2.length();
    
    int maxLen=0;
    
    memset(dp, -1, sizeof(dp));
    cout<<"The length of the longest common substring of s1 and s2 is "<<
    longestCommonSubstringLength(s1, s2, n1, n2, maxLen)<<endl;


return 0;
}

 

Output:
The length of the longest common substring of s1 and s2 is 3

 

Algorithm Complexity

Time Complexity: O(N * M* max(M,N))

As we are storing the values in a 3-dimensional functional, we will be making at most N*M*max(N,M) calls to the function and each call will take O(1) time. So, the time complexity is O(N*M*(max(N,M))


Space Complexity: O(N*M*max(N,M)) 

For storing the answer of each state, we have created a table of size ‘N*M*max(N,M)’,  so the space complexity is O(N*M*max(N,M)).

Bottom-Up Approach

In this approach, we will totally eliminate the recursive function calls. We will simply create a table and start building solution from the smallest possible state and keep storing the results.

Algorithm

Step 1. Create a function “longestCommonSubstringLength()” which will take parameters - the two given strings and their lengths.

Step 2. Initialize a variable “maxLen” to store the length of the longest common substring. Initialize a ‘dp’ table to store the answer of each state created by fixing indices in two strings.

By fixing the indices in the two strings, we are storing the length of the longest common substring, which will end at these two indices in the corresponding strings.

dp[i][j] = length of the longest common substring, which will end at ith and jth index in the first and second string, respectively.


Step 3. Create two ‘for loops’ for iterating through the indices in the two strings and considering the corresponding characters. If the two characters are the same, add one to the answer of the previous state and update the value of “maxLen”. Else, store ‘0’ as the answer to the current state.

Step 4. After iterating through all the indices in the two strings, return the length of the longest common substring i.e. “maxLen”.

 

C++ Code

#include <bits/stdc++.h>
using namespace std;


int longestCommonSubstringLength(string s1, string s2, int n1, int n2)
  {
   int maxLen = 0;
   int dp[n1 + 1][n2 + 1];
    
   for(int i = 0; i <= n1; i++)
     {
      for(int j = 0; j <= n2; j++)
        {
         if(i == 0 || j == 0) dp[i][j]=0;
          
         else if(s1[i - 1] == s2[j - 1])
           {
            dp[i][j] = 1 + dp[i-1][j-1];
            maxLen = max(maxLen, dp[i][j]);
           }
         else {
           dp[i][j]=0;
         }
        }
     }
      
   return maxLen;
  }


int main()
{
    string s1 = "qdef";
    string s2 = "defq";
    
    int n1 = s1.length();
    int n2 = s2.length();
    
    int maxLen=0;
    cout<<"The length of the longest common substring of s1 and s2 is "<<
    longestCommonSubstringLength(s1, s2, n1, n2)<<endl;
    
    return 0;
}

 

Output:
The length of the longest common substring of s1 and s2 is 3

 

Algorithm Complexity

Time Complexity: O(N * M)

Here, two nested loops are created - one for iterating through each index of the first string and the other for iterating through each index of the second string. So, the time complexity is O(N * M)

Space Complexity: O(N * M) 

For storing the answer of each state, we have created a table of size ‘N * M’, so the space complexity is O(N * M).

 

To understand it better and know proper code implementation, you should watch this video.

Frequently asked questions-

 

1) What is the Longest Common Substring Problem?

  The longest common substring(LCS) is to calculate the length of the longest substring, which is common in two given strings.

 

2) How Longest Common Substring problem is different from Longest Common Subsequence Problem?

The substring of a string is a continuous part of the string, but the subsequence of the string is not necessarily continuous. And this is the difference between the two problems - Longest Common Substring and Longest Common Subsequence.

 

Key takeaways-

In this article, we discussed what the Longest Common Substring Problem is, the various approaches to solve this problem programmatically, and the time and space complexities. We find out how to use extra space and drastically reduce the time complexity from exponential to polynomial.If you want to practice this problem, then you can visit codestudio.


If you think that this blog helped you, then share it with your friends!. Refer to the DSA C++ course for more information.

 

Until then, All the best for your future endeavors, and Keep Coding.

Was this article helpful ?
0 upvotes