Longest Common Subsequence

Longest Common Subsequence
Longest Common Subsequence

Introduction

The longest common subsequence (LCS) is defined as the longest subsequence that is common to all the given sequences, provided that the elements of the subsequence are not required to occupy consecutive positions within the original sequences.

Note: Subsequence is a part of the string which can be made by omitting none or some of the characters from that string while maintaining the order of the characters.

For Example “ABC”, “ AGE”, “BEC”,…etc are subsequences of “ABCDEFG”.

illustravitve_example
Example

In the above example, there are two strings “acbaed” and “abcadf.” Here the longest common subsequence is “acad” with length 4. Let’s explore some methods and techniques to find the longest subsequence.

Finding LCS using Recursion:

Let’s first think of a brute-force approach using recursion. For LCS, we have to match the starting characters of both strings. If they match, then simply we can break the problem as shown below:

s1 = “x|yzar
s2 = “x|qwea

The rest of the LCS will be handled by recursion. But, if the first characters do not match, then we have to figure out that by traversing which of the following strings, we will get our answer. This can’t be directly predicted by just looking at them, so we will be traversing over both of them one by one and check for the maximum value of LCS obtained among them to be considered for our answer.

For example:

Suppose string s = “xyz” and string t = “zxay.”

We  can see that their first characters do not match, so that we can call recursion over it in either of the following ways:

A =         

blog banner 1

B = 

C =

Finally, our answer will be LCS = max(A, B, C). Let’s analyze the code for better understanding.

Code:

// Recursive implementation of
// Longest Common subsequence problem

#include <bits/stdc++.h>
using namespace std;

int N, M;

// For calculating the length of the longest common subsequence

int lcs(string S, string T, int i, int j)
{
    // one or both of the strings are fully traversed

    if (i == N || j == M)
        return 0;

    // check if current characters are equal

    if (S[i] == T[j])
        // check for next character
        return (lcs(S, T, i + 1, j + 1) + 1);
    else
        // check which call has maximum value and add to length
        return max(lcs(S, T, i + 1, j), lcs(S, T, i, j + 1));
}
int main()
{
    string S = " abcde";
    string T = "acek";

    N = S.length();
    M = T.length();
    cout << "Length of longest common subsequence " << lcs(S, T, 0, 0);

    return 0;
}

Output:

Length of longest common subsequence 3

If we dry run this over the example: s = “xyz” and t = “zxay,” it will look something like below:

illustrative_diagram

Here, as for each node, we will be making three recursive calls, so the time complexity will be exponential and is represented as O(2m+n), where m and n are the lengths of both strings. If we carefully observe the above code, we can skip the third recursive call as the two will cover it.

Now, thinking, over-improving this time complexity…

Consider the diagram below, where we are representing the dry run in terms of its length taken at each recursive call:

illustrative_diagram

As we can see, there are multiple overlapping recursive calls, the solution can be optimized using memoization followed by DP. 

Finding LCS using Memoization:

To control the time complexity of the recursion approach, we are using the memoization method. So, beginning with the memorization approach, as we want to match all the subsequences of the given two strings, we have to figure out the number of unique recursive calls.

For string s, we can make at most length(s) recursive calls, and similarly, for string t, we can make at most length(t) recursive calls, which are also dependent on each other’s solution.

Hence, our result can be directly stored in the form of a 2-dimensional array of size (length(s)+1) * (length(t) + 1) as for string s, we have 0 to length(s) possible combinations, and the same goes for string t.

So for every index ‘i’ in string s and ‘j’ in string t, we will choose one of the following two options:

  1. If the character s[i] matches t[j], the length of the common subsequence would be one plus the length of the common subsequence till the i-1 and j-1 indexes in the two respective strings.
  2. If the character s[i] does not match t[j], we will take the longest subsequence by either skipping i-th or j-th character from the respective strings.

Hence, the answer stored in the matrix will be the LCS of both strings when the length of string s is ‘i’, and the length of string t will be ‘j.’ Hence, we will get the final answer at the position matrix[length(s)][length(t)]. Moving to the code:

Code:

// Recursive implementation of
// Longest Common subsequence problem
// using memoization

#include <bits/stdc++.h>
using namespace std;

const int max_value = 1000;

int N, M;

// For calculating the length of the longest common subsequence

int lcs(string S, string T, int i, int j, int memo[][max_value])
{
    // one or both of the strings are fully traversed

    if (i == N || j == M)
        return 0;

    // if the result for the current pair is already present in
    // the table

    if (memo[i][j] != -1)
        return memo[i][j];
 // check if the current characters in both the strings are equal

    if (S[i] == T[j])
        // check for the next characters in both the strings and store the
        // calculated value in the memo table

        memo[i][j] = lcs(S, T, i + 1, j + 1, memo) + 1;
    else

        // check which call has the maximum value and store it in
        // the memo table

        memo[i][j] = max(lcs(S, T, i + 1, j, memo), lcs(S, T, i, j + 1, memo));
    return memo[i][j];
}
// Driver Code

int main()
{
    string S = "abcde";
    string T = "acek";

    N = S.length();
    M = T.length();

    // table for memoization

    int memo[N][max_value];

    // initializing memo table with -1

    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            memo[i][j] = -1;

    cout << "Length of longest common subsequence " << lcs(S, T, 0, 0, memo);
    return 0;
}

Output:

Length of longest common subsequence 3

Time Complexity: We can see that the time complexity of the DP and memoization approach is reduced to O(m*n), where m and n are the lengths of the given strings.

Space Complexity: Since we use an array of size (m*n), the space complexity turns out to be O(m*n).

Finding LCS using Tabulation:

The same optimal substructure can be used to convert the above code for finding the longest common subsequence using memoization to finding the longest common subsequence using tabulation or iteratively.

Code:

/* Dynamic Programming C++ implementation of LCS problem */
#include<bits/stdc++.h>
using namespace std;

int max(int a, int b);

/* Returns length of LCS for X[0..m-1], Y[0..n-1] */
int lcs( char *X, char *Y, int m, int n )
{
int L[m + 1][n + 1];
int i, j;

/* Following steps build L[m+1][n+1] in
bottom up fashion. Note that L[i][j]
contains length of LCS of X[0..i-1]
and Y[0..j-1] */
for (i = 0; i <= m; i++)
{
for (j = 0; j <= n; j++)
{
if (i == 0 || j == 0)
L[i][j] = 0;

else if (X[i - 1] == Y[j - 1])
L[i][j] = L[i - 1][j - 1] + 1;

else
L[i][j] = max(L[i - 1][j], L[i][j - 1]);
}
}

/* L[m][n] contains length of LCS
for X[0..n-1] and Y[0..m-1] */
return L[m][n];
}

/* Utility function to get max of 2 integers */
int max(int a, int b)
{
return (a > b)? a : b;
}

// Driver Code
int main()
{
char X[] = "abcde";
char Y[] = "acek";

int m = strlen(X);
int n = strlen(Y);

cout << "Length of longest common subsequence "
<< lcs( X, Y, m, n );

return 0;
}

Output:

Length of longest common subsequence 3

Time Complexity:  O(m*n), where m and n are the lengths of the given strings.

Space Complexity: Since we use an array of size (m*n), the space complexity turns out to be O(m*n).

Frequently Asked Questions

What is the longest common subsequence? Give an example?

Given two sequences, find the length of the longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order but is not necessarily contiguous. For example, “abc,” “abg,” “bdf,” “aeg,” ‘”acefg,”.. etc. are subsequences of “abcdefg.”

How do you find the longest subsequence?

The Longest Increasing Subsequence (LIS) problem is to find the length of the longest subsequence of a given sequence. All elements of the subsequence are sorted in increasing order. For example, the length of LIS for {10, 22, 9, 33, 21, 50, 41, 60, 80} is 6 and LIS is {10, 22, 33, 50, 60, 80}.

What do you mean by LCS in DAA?

A subsequence of a given sequence is just the given sequence with some elements left out. Given two sequences X and Y, we say that the sequence Z is a common sequence of X and Y if Z is a subsequence of both X and Y.

Which problem can be solved using the longest subsequence problem?

Both recursion and dynamic programming can be used to solve the longest subsequence problem.

Key Takeaways

In this blog, we have discussed:

  •  Finding the longest common subsequence using recursion and memoization methods. 
  • We have also analyzed the time and space complexity of both methods. 
  • To apply the memoization method, we used dynamic programming, which helps in reducing the time and space complexity. 

Practice more sums on the longest common subsequence here and share with your friends.