Shortest Common Supersequence

Deeksha
Last Updated: May 13, 2022

Introduction.

In this blog, we are going to discuss the approaches for the question Shortest common supersequence. Let’s see the problem statement first: You are given two strings s1 and s2, You have to find the length of the shortest string that can have both strings s1 and s2 as its subsequences.

 

Let’s understand it better using the example below: 

 

string s1 = “abac”,   string s2 = “cab” 

 

Now your output should be 5 because the string that can accommodate both as its subsequence will be “cabac”.I hope you must be clear with the question. So, let’s move towards solving this problem:-

Approach#1 Using Recursion

Let’s consider two substrings of length m and n respectively. Now we can have two situations:-

 

  • Both the strings are ending with the same element.

So for finding their shortest common supersequence what we have to do is that we will shorten each string by removing the last element. Then we will be finding the shortest common supersequence of shortened strings and then the removed element will be appended to the shortest common supersequence, that is 

 

SCS(A[1…..m], B[1…...n]) = SCS(A[1….m-1], B[1...n-1]) + A[m]

 

  • Suppose two given strings are not ending at the same character:

In this situation the shortest common supersequence will be the shorter of the two sequences SCS(A[1...m-1], B[1..n]) + A[m] and SCS(A[1...m], B[1...n-1]) + B[n].

 

Now let’s see this approach in code:

C++ code

 

#include <iostream>

#include <string>

using namespace std;

 

/*

    Function to find the length of the shortest common supersequence of

    Sequences

*/

int ShortestLength(string A, string B, int m, int n) {

 

    /*

        If the end of either sequence is reached, return

        the length of another sequence.

    */

    if (m == 0 || n == 0) {

        return n + m;

    }

 

    // If the last character of string A and B matches

    if (A[m - 1] == B[n - 1]) {

        return ShortestLength(A, B, m - 1, n - 1) + 1;

    } else {

 

        // Otherwise, if the last character of A and B don't match

        return min(ShortestLength(A, B, m, n - 1) + 1,

            ShortestLength(A, B, m - 1, n) + 1);

    }

}

 

int main() {

    string A;

    string B;

    cin >> A >> B;

 

    int m = A.length(), n = B.length();

 

    cout << "The length of the shortest common supersequence will be: " <<

        ShortestLength(A, B, m, n);

 

    return 0;

}

 

Input: 

ABCBDAB

BDCABA

 

Output:

The length of the shortest common supersequence will be: 9

 

Complexity Analysis: 

Time Complexity: The time complexity will be O(2 ^ (m + n)) where ‘m’ and ‘n’ refer to the length of two strings. Since we are making two recursive calls at each step hence this sums up to exponential time complexity

 

Space Complexity: The space complexity due to the recursive class will be O(m * n).

 

This recursive approach for getting the shortest common supersequence exhibits overlapping subproblems.Suppose we have two strings of length 6 and 8 respectively. Let’s see how many overlapping recursive calls are there :- 

 So to eliminate these overlapping recursive calls, we have to solve it using dynamic programming.

Approach#2 Using Memoization

As you have noticed that we are making many overlapping recursive calls which are costing us in time complexity. So as a solution to this problem we will be taking a 2-D grid for storing our answers. Now we will not be making a recursive call again if this 2-D array is already having the answer to that call. Rest all the steps will be the same as approach 1. Let’s try to give it a shot by yourself then we’ll  be sharing our solution:

 

C++ code:

#include <iostream>

#include <string>

using namespace std;

 

/*

    Function to find the length of the shortest common supersequence of

    sequences.

*/

int ShortestLength(string A, string B, int m, int n, int ** mem) {

 

    /*

        If the end of either sequence is reached, return

        the length of another sequence

    */

    if (m == 0 || n == 0) {

        return n + m;

    }

 

    // Check if answer of this call already present

    if (mem[m][n] != -1) {

        return mem[m][n];

    }

 

    // If the last character of string A and B matches

    if (A[m - 1] == B[n - 1]) {

        mem[m][n] = ShortestLength(A, B, m - 1, n - 1, mem) + 1;

        return mem[m][n];

    }

     else {

 

        // Otherwise, if the last character of A and B don't match

        mem[m][n] = min(ShortestLength(A, B, m, n - 1, mem) + 1,

            ShortestLength(A, B, m - 1, n, mem) + 1);

        return mem[m][n];

    }

}

 

int main() {

    string A;

    string B;

    cin >> A >> B;

 

    int m = A.length(), n = B.length();

    int ** mem = new int * [m + 1];

 

    //Initialising the mem array

    for (int i = 0; i <= m; i++) {

        mem[i] = new int[n + 1];

        for (int j = 0; j <= n; j++) {

            mem[i][j] = -1;

        }

    }

    cout << "The length of the shortest common supersequence will be: " <<

        ShortestLength(A, B, m, n, mem);

 

    return 0;

}

 

Input: 

ABCBDAB

BDCABA

 

Output:

The length of the shortest common supersequence will be: 9

 

Complexity Analysis:

Time Complexity: The time complexity of this approach will be O(M * N), where ‘M’ and ‘N’ refer to the length of the given strings.

 

Space Complexity : The space complexity will be O(M * N), where ‘M’ and ‘N’ refer to the length of the given strings.We are consuming this space in storing the results of smaller subproblems.

Approach#3 Dynamic Programming

In this approach, we will be calculating smaller values of ShortestCommonSupersequence(i, j) first and then we will be building larger values from it.

 

You will understand all cases clearly by going through below table:

 

ShortestCommon[i][j] =

j

if(i == 0)

ShortestCommon[i][j] =

i

if(j == 0)

ShortestCommon[i][j]=

ShortestCommon[i-1][j-1] + 1

if(A[i-1] == B[j-1])

ShortestCommon[i][j] =

min(ShortestCommon[i-1][j] +1, ShortestCommon[i][j-1] +1),            

if(A[i-1] != B 

                                           

So here I have discussed all the cases. Now try coding it out on your own.

 

C++ Code

 

#include <iostream>

#include <string>

using namespace std;

 

// Function to find the length of the shortest common supersequence

int ShortestLength(string A, string B) {

    int m = A.length(), n = B.length();

 

    // dp array stores solution to already computed subproblems

    // `dp[i][j]` stores the length of SCS of substring `A[0…i-1]` and `B[0…j-1]`

    int dp[m + 1][n + 1];

 

    // Initialize the first column 

    for (int i = 0; i <= m; i++) {

        dp[i][0] = i;

    }

 

    // Initialize the first row 

    for (int j = 0; j <= n; j++) {

        dp[0][j] = j;

    }

 

    // Fill the dp in a bottom-up manner

    for (int i = 1; i <= m; i++) {

        for (int j = 1; j <= n; j++) {

 

            // If the current character of A and B matches

            if (A[i - 1] == B[j - 1]) {

                dp[i][j] = dp[i - 1][j - 1] + 1;

            }

 

            // Otherwise, if the current character of A and B don't match

            else {

                dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1);

            }

        }

    }

 

    return dp[m][n];

}

 

int main() {

    string A;

    string B;

    cin >> A >> B;

 

    int m = A.length(), n = B.length();

 

    cout << "The length of the shortest common supersequence will be: " <<

        ShortestLength(A, B, m, n);

 

    return 0;

}

 

Input: 

ABCBDAB

BDCABA

 

Output:

The length of the shortest common supersequence will be: 9

 

 

Complexity Analysis: 

Time Complexity: The time complexity of this approach will be O(m * n), where ‘m’ and ‘n’ refer to the length of the given strings.

 

Space Complexity: The space complexity will be O(m * n), where ‘m’ and ‘n’ refer to the length of the given strings. We are consuming this space in storing the results of smaller subproblems.

 

Frequently Asked Questions

 

Ques 1: What is a subsequence of a string? 

Ans 1. A subsequence of a string is obtained by deleting one or more characters from the original string.

 

Ques 2. Why do we use dynamic programming? 

Ans 2. When we solve a problem with recursion, then generally, we make overlapping recursion calls which contribute to poor time complexity. Hence for escaping from these overlapping sub calls, we prefer to write code using DP.

 

Ques 3. Can I solve this problem in less than O(N * M) time complexity? 

Ans 3. No! You can’t because you can see that even in the most optimized approach that is using dynamic programming you need to visit the 2-D matrix at least once.

Key Takeaways: 

In this blog, we discussed two approaches for the problem of the shortest common supersequence. You can see how we started from exponential time complexity and then moved on to optimizing it. Remember this is how you should present your solution in an interview.

If you want to practice some super important coding problems for your interview, then I have an awesome platform CodeStudio for you. Keep practicing and growing. Happy Coding!

 

By: Deeksha Sharma 

 

Was this article helpful ?
0 upvotes

Comments

No comments yet

Be the first to share what you think