Update appNew update is available. Click here to update.
Last Updated: Oct 18, 2023
Medium

Longest Common Substring

Author Nikunj Goel
4 upvotes

Introduction

This blog will discuss the various approaches to solving the Longest Common Substring Problem. Before jumping into the solution, let’s first recall what a substring is and then understand the Longest Common Substring problem.

A substring is a contiguous part of the string. Its length will be less than or equal to the original string length. Let’s get into the article now.

Longest Common Substring

Problem Statement

In this problem, we will be given two strings. We have to find the length of the longest substring, which is common in the two given strings. 

Example

Assume the two given strings are:

s1 = “qdef”

s2 = “defq”

“q”, “d”,  “e”, “f”, “de”, “ef”, and “def” are the common substrings between s1 and s2. “def” is the longest common substring. So, the length of the longest common substring of s1 and s2 will be “3”.

1. Simple Approach

The simplest approach is to generate all possible substrings of one string. Then check if each of them is a substring of the other string. This approach has a time complexity of O(n^3). n is the length of the longest input string. It is not efficient for large input sizes. It is mainly used for small instances or as a baseline for comparison.

C++ Code of Simple Approach

#include <iostream>
#include <string.h>
using namespace std;

//simple approach
int LCSubstr(string str1 ,string str2){
	int result = 0; 
     
	/* 
		loop to find all substrings of string 1.
		check if current substring matches.
		maximize the lenght of common substring 
	*/
	for (int i = 0; i < str1.length(); i++) {
		for (int j = 0; j < str2.length(); j++) {
			int k = 0;
				while ((i + k) < str1.length() && 
					(j + k) < str2.length() && str1[i + k] == str2[j + k]){
					k = k + 1;
				}
				
			result = max(result, k);
		}
	}
	return result;
}
 
// Driver code
int main(){
	string X = "qdef";
	string Y = "defq";
 
	cout << "Length of Longest Common Substring is " << LCSubstr(X, Y);
	return 0;
}


Output

C++ code of simple approach

Java Code of Simple Approach

// This is the Simple Approach.
class Main{
	static String str1, str2; 
	public static int lcsubstr(String str1, String str2) {
		int result = 0;
		for(int i = 0; i < str1.length(); i++) {
			for (int j = 0; j < str2.length(); j++) {
				int k = 0;
				while ((i + k) < str1.length() && 
					(j + k) < str2.length() && str1.charAt(i + k) == str2.charAt(j + k)) {
					k = k + 1;
				}
				
				result = Math.max(result, k);
			}
		}
		return result;
	}
    
	// Driver code
	public static void main(String[] args) {
		str1 = "qdef";
		str2 = "defq";
       	
		System.out.println("Length of Longest Common Substring is "+ lcsubstr(str1, str2));
	}
}


Output

java code of simple approach

Python Code of Simple Approach

def lcs(str1, str2):
	result = 0
	for i in range(len(str1)):
		for j in range(len(str2)):
			k = 0
			while ((i + k) < len(str1) and (j + k) < len(str2)
				and str1[i + k] == str2[j + k]):
					k = k + 1
			result = max(result, k)
	return result

def main():
	s1 = "qdef"
	s2 = "defq"

	print("The Length of Longest Common Substring is", lcs(s1, s2))

if __name__ == '__main__':
	main()


Output

Python code of simple approach

2. Recursive Approach

This problem can be solved using Recursion. We can consider one by one each character in the two strings. The following two cases will arise:

1. The two characters from the two strings are the same: 

In this case, increase the length of the longest common substring and move to the next characters in both strings.

2. The two characters from the two strings are different:

In this case, take the maximum of the current length of the longest common string and the two lengths of the longest common substring from two cases - one by moving to the next character in only the first string and another by moving to the next character in only the second string.

Let’s understand this algorithm in detail and its C++ code.

Algorithm

Step 1. Create a recursive function “longestCommonSubstringLength()” which takes input

  1. s1 - First given string
  2. s2 - Second given string
  3. i - It denotes the current length of s1
  4. j - It denotes the current length of s2
  5. maxLen - For storing the length of the longest common substring
     

Step 2. Write the base case of the recursive function. If i <= 0 or j <= 0 i.e. if the current length of any of the string becomes 0, then simply return ‘maxLen’.

Step 3.  Now check if the characters at index ‘i-1’ in the first string and ‘j-1’ in the second string are the same, then increase the value of ‘maxLen’ by one and call the function for the smaller sub-problem.

Step 4.  Else If the two characters are not the same, then update the value as the maximum of ‘maxLen’ and the value returned by calling the recursive function for two smaller sub-problems.

Step 5. Finally, return ‘maxLen,’ i.e., the length of the longest common substring.

Implementation in C++

#include <bits/stdc++.h>
using namespace std;

int longestCommonSubstringLength(string s1, string s2, int i, int j, int maxLen){
	// Base Case
	if (i <= 0 || j <= 0) {
		return maxLen;
	}

	int maxLen1 = maxLen;

	// If the characters are same in both the string then increase maxLen by 1
	if (s1[i - 1] == s2[j - 1]) {
		maxLen1 = longestCommonSubstringLength(s1, s2, i - 1, j - 1, maxLen + 1);
	}

	int maxLen2 = longestCommonSubstringLength(s1, s2, i, j - 1, 0);
	int maxLen3 = longestCommonSubstringLength(s1, s2, i - 1, j, 0);

	return max(maxLen1, max(maxLen2, maxLen3));
}

int main(){
	string s1 = "qdef";
	string s2 = "defq";
    
	int maxLen=0;
    
	cout<<"The Length of Longest Common Substring is "<<
	longestCommonSubstringLength(s1, s2, s1.length(), s2.length(), maxLen)<<endl;

	return 0;
}


Output

Longest Common Substring in C++

Implementation in Java

// This is the recursive Approach
class Main {
	static String string_1, string_2;
   
	static int LCSRecursive(int i, int j, int count) {
		// Base Case
		if (i == 0 || j == 0) {
			return count;
		}
       
		/* 
			If the last character is same then the 
			next character need to be compared
			increase the count by 1 
		*/
			
		if (string_1.charAt(i - 1) == string_2.charAt(j - 1)) {
			count = LCSRecursive(i - 1, j - 1, count + 1);  
		}
       
		/*
			if the characters are not the same then you need to 
			decrease the i once and update the count to 0
			In the other case we decrement the j and update the count to 0
			Keep track of maximum value. 
		*/
		
		count = Math.max(count, Math.max(LCSRecursive(i, j - 1, 0), LCSRecursive(i - 1, j, 0)));
		return count;
	}
    
	// Driver code
	public static void main(String[] args) {
       
		string_1 = "qdef";
		string_2 = "defq";
       
		System.out.println("The Length of Longest Common Substring is "+ LCSRecursive(string_1.length(), string_2.length(), 0));
	}
}


Output

Longest Common Substring Java

Implementation in Python

def lcsrecursive(i, j, count, str1, str2):
	if i == 0 or j == 0:
		return count
 
	if str1[i - 1] == str2[j - 1]:
		count = lcsrecursive(i - 1, j - 1, count + 1, str1, str2);

	count1 = lcsrecursive(i, j - 1, 0 , str1, str2)

	count2 = lcsrecursive(i - 1, j, 0, str1, str2)

	count = max({count, count1,count2})
	return count


def main():
	s1 = "qdef"
	s2 = "defq"

	print("The Length of Longest Common Substring is", lcsrecursive(len(s1),len(s2),0, s1, s2))


if __name__ == '__main__':
	main()


Output

Longest Common Substring Python


Time complexity: O(3^(M+N))

As the function is called 3 times, and it checks for all the characters, i.e. M + N, 

where M and N are the lengths of the respective strings. A recursive tree is created.

Space complexity: O(M+N)

The recursive function takes a space of N+M for auxiliary stack space during recursion.

3. Dynamic Programming Approach

In the recursive approach, there are a lot of duplicate calls for the same state i.e. for the same values of i, j, and maxLen, which leads to exponential time complexity.

Dynamic Programming Approach

The above solution can be modified, and the results can be stored in a table to avoid the repetitive calling of recursive functions for the same state. Thus dynamic programming approach will significantly reduce the time complexity as compared to the recursive solution.

The algorithm will be similar to the above recursive approach. The only difference is that instead of repetitively calling the recursive function, we will store the length of the longest common substring for each state in a table.

Algorithm

  • Create a matrix dp of size (m+1) x (n+1).
  • Initialize the first row and the first column of the matrix dp with zeros.
  • Initialize two variables: maxLength to store the length of the longest common substring and endIndex to store the ending index of the longest common substring.
  • Start a nested loop.
    • If i == 0 or j == 0, then dp[i][j] = 0.
    • If str1[i] == str2[j], then dp[i][j] = 1 + dp[i – 1][j – 1].
  • Keep track of the maximum value and return it.

C++ Implementation

#include <iostream>
#include <string.h>
using namespace std;

int lcsDP(string string_1, string string_2, int M, int N) {
	// create a 2-D array named dp and intitialise all the values as zero.

	int dp[M + 1][N + 1];

	int count = 0; 
	/* 
		intitialise a nested loop
		Check the base Case 
	*/
	for (int i = 0; i <= M; i++) {
		for (int j = 0; j <= N; j++) {
			if (i == 0 || j == 0)
				dp[i][j] = 0; 
			else if (string_1[i - 1] == string_2[j - 1]) {
				dp[i][j] = dp[i - 1][j - 1] + 1;
				count = max(count, dp[i][j]);
			}
			else	
				dp[i][j] = 0;
		}
	}
	return count;
}

// Driver code
int main() {
	string str1 = "qdef";
	string str2 = "defq";

	cout << "Length of Longest Common Substring is " << lcsDP(str1, str2, str1.length(),str2.length());
	return 0;
}


Output

Dynamic Programming Approach in C++

Java Implementation

class Main {
   
	// function which returns the length of longest common substring 
	static int lcsDP(String string_1, String string_2, int M, int N) {
      
		// create a 2-D array named dp and intitialise all the values as zero.
		int dp[][] = new int[M + 1][N + 1];
      
		int count = 0;

		/*  
			fill the values in the dp array in bottom-up manner
			initialize a nested loop
			if the characters at string_1 and string_2 matches 
			keep track of maximun element
			if the characters did not match then the value is zero  
		*/
        
		for (int i = 0; i <= M; i++) {
           
			for (int j = 0; j <= N; j++) {
				if (i == 0 || j == 0) {
					dp[i][j] = 0;
				}
				else if (string_1.charAt(i - 1) == string_2.charAt(j - 1)) {
                   dp[i][j] = dp[i - 1][j - 1] + 1;
					count = Integer.max(count, dp[i][j]);
				}
				else { 
					dp[i][j] = 0;
				}
			}
           
		}
		return count;
	}

	// Driver code
	public static void main(String[] args) {
       
		String str1 = "qdef";
		String str2 = "defq";
       
		System.out.println("The Length of the Longest Common Substring is " + lcsDP(str1, str2, str1.length(),str2.length()));
	}
}


Output

Dynamic Programming Approach in java

Python Implementation

def lcsDP(string_1, string_2, M, N):
   
	# create a 2-D array named dp and intitialise all the values as zero.
	dp = [[0 for k in range(N+1)] for l in range(M+1)]

	# To store the length of the longest common substring
	count = 0

	# initialize a nested loop
	# if the characters at string_1 and string_2 matches
	# keep track of maximun element
	# if the characters did not match then the value is zero
	
	for i in range(M + 1):
		for j in range(N + 1):
			if (i == 0 or j == 0):
				dp[i][j] = 0
                
			elif (string_1[i-1] == string_2[j-1]):
				dp[i][j] = dp[i-1][j-1] + 1			
				count = max(count, dp[i][j])
                
			else:
				dp[i][j] = 0
   
	return count
   
# Driver code
if __name__ == "__main__":
   
	string_1 = "qdef"
	string_2 = "defq"
	print("The Length of Longest Common Substring is", lcsDP(string_1, string_2, len(string_1), len(string_2)))


Output

Dynamic Programming Approach in python

Time Complexity: O(M*N) 

where M and N are the lengths of the given string1 and string2. The nested loop for filling the entries of the 2-D dp table requires a time on O(M*N).

Space Complexity: O(M*N) 

where M and N are the lengths of the given string1 and string2. 2-D dp array of size [M+1][N+1] is required.

Frequently asked questions

What is the complexity of finding longest common substring?

Finding the longest common substring can be done in O(m*n) time using dynamic programming. For each substring of both strings, determine the length of the longest common suffix and record that length in a table. Use the previously computed values to determine the answer of larger suffixes in O(1) time.

What is the naive approach to the longest common substring?

The naive approach for finding the longest common substring between two strings involves comparing all possible substrings of both strings. It has a time complexity of O(m^2 * n), making it impractical for large inputs.

What is the real life application of the longest common subsequence?

The longest common subsequence (LCS) is used in plagiarism detection, version control systems, bioinformatics for DNA sequence alignment, data comparison, speech recognition, spell checkers, music analysis, image processing, file comparison, and network routing to identify common patterns in data sequences.

Conclusion

In this article, we discussed the Longest Common Substring Problem, the various approaches to solve this problem programmatically, and the time and space complexities. We find out how to use extra space and drastically reduce the time complexity from exponential to polynomial. If you want to practice this problem, then you can visit Coding Ninjas Studio.
Recommended problems -

If you think this blog helped you share it with your friends! Refer to the DSA C++ course for more information.

Until then, All the best for your future endeavors, and Keep Coding.

Previous article
Longest Common Subsequence
Next article
Longest Repeating Subsequence