Longest Repeating Subsequence

Riya
Last Updated: May 13, 2022

Introduction-  

This blog will discuss the various approaches to solve the Longest Repeating Subsequence problem. Before jumping into the problem, let’s first recall what a subsequence is and then understand the Longest Repeating Subsequence problem.

 

A subsequence of a string is generated by deleting zero, one, or more characters of the original string without disturbing the relative positions of the remaining characters.

 In this problem, we will be given a string. We have to find the length of the longest subsequence occurring at least two times (i.e., the subsequence is repeated in the given string)  in the given string such that the two subsequences shouldn’t contain any character which has the same index in the original string.

 

Let’s take an example to understand it further:

Let the given string be S= “XXYZYVWW” and assume 0-based indexing. 

In the above string, we can see that the repeating subsequences are ‘X’, ‘Y’,  and ‘W’, “XYW”. So, the longest repeating subsequence will be “XYW” and the length of the longest Repeating Subsequence will be 3. We can see that there are two occurrences of the subsequence “XYZ”, one will be the subsequence formed by 0th, 2nd, and 6th characters (shown in blue color in the table below), and the other will be the subsequence formed by 1st,  4th, and 7th characters (shown in red color in the table below)

 

X

X

Y

Z

Y

V

W

W

 

If we understand this question carefully, we can see that this question is just a variation of the Longest Common Subsequence Problem (LCS). Here, we have to find the length of the longest common subsequence of and where is the given string, with the condition that while checking for the same characters in the two strings, we have to take care that the two same characters shouldn’t be at the same index in the given string.

Brute Force Approach -

Let’s assume that the given string is S, and we have to find the Longest Repeating Subsequence in S. As discussed above, this problem is a modification of the Longest Common Subsequence Problem. Now this problem is reduced to find the LCS (SS) with the condition that not any character in the resulting subsequence should be at the same index in both the S, where LCS (s1, s2) is the function to find the length of the longest common subsequence of the two strings s1 and s2.

 

So, longestRepeatingSubsequence(S) = LCS (SS) with a restriction that the resulting subsequence shouldn’t contain any character with the same index in their corresponding original string.

 

We can understand this by taking an example:

Let S= “XXYZYVWW”

LCS(S, S) = S, as the whole string is the same but if we consider the restriction that is written above then, the two ‘X’s shown in red, two ‘Y’s shown in blue and two ‘W’s shown in orange will be in the common subsequence from the two strings. Thus the LCS(S, S) with the condition that not any character in the resulting subsequence should be at the same index in both the S,  will be “XYW”. We can see that each character in “XYW” is at different indexes in both the S as shown below.

S= “XXYZYVWW”

S= “XXYZYVWW

 

Thus the code for this problem will be similar to the LCS problem. We will be finding LCS(S, S) with the above described condition. So, we need to consider one by one each character in the two strings. (Please note that the two strings are same here)

So, on considering one by one each character in the two strings, two situations will arise:

  1. The two characters are not occurring at the same index in their respective strings and are identical.
  2. The two characters are either occurring at the same index in their respective strings or are different.

If the first condition arises, we will increase the longest repeating subsequence by one and move to the next character of both the strings. If the second condition occurs, we will take the maximum of the two cases - one by moving to the next character in the first string and the other by moving to the next character in the second string.

Algorithm -

Step 1. Create a recursive function ‘longestRepeatingSubsequence()’ that will accept the below parameters -

  1. ‘str’: The given string for which we have to calculate the length of the longest repeating subsequence.
  2. ‘n’: Length of the given string 
  3. ‘pos1’: The variable denoting the current character of ‘str1’.
  4.  ‘pos2’: The variable denoting the current character of ‘str2’.

 

Step 2. Then write the base case of the recursive function i.e. if ‘pos1’ or ‘pos2’ reaches the end of the string, then we can’t add more characters to the longest repeating subsequence, so return 0.

Step 3. Now check for the two conditions which are described in the brute force approach section and do accordingly.

 

C++ code:

 

#include <bits/stdc++.h>

using namespace std;

int longestRepeatingSubsequence(string str, int pos1, int pos2, int n)

 {

 

         /* 

            base case

            if pos1 or pos2 reaches n, it means there are no further characters

            to consider, so return 0 

        */        

         if(pos1 == n || pos2 == n) return 0;

 

         /*

            If the two characters are equal and not occurring at the same index,

            then increase the length of the longest repeating subsequence by 1

         */

         if(pos1 != pos2 && str[pos1] == str[pos2]) return 1+longestRepeatingSubsequence(str, pos1 + 1, pos2 + 1, n);

         else return max(longestRepeatingSubsequence(str, pos1 + 1, pos2, n), longestRepeatingSubsequence(str, pos1, pos2 + 1, n));

 }

 

int main()

  {

   string str="XXYZYVWW";

   cout<<"The length of the Longest Repeating Subsequence in the given  string is: "<<longestRepeatingSubsequence(str, 0, 0, 8)<<endl;

   return 0;

  }

 

 

Output:
The length of the Longest Repeating Subsequence in the given string is: 3

 

Algorithm Complexity: 

Time Complexity: O(2 ^ N)

In every recursive call ‘longestRepeatingSubsequence()’, we are making two more calls to the same function, so calls are increasing at a rate of 2 until ‘pos1’ or ‘pos2’ reaches N. There will be at most ‘2 ^ N’ calls in this code. Therefore the overall time complexity will be O(2 ^ N) where ‘N’ is the length of the given string.

Space Complexity: O(1) 

As we are using constant space, therefore the overall space complexity will be O(1).

Top-Down Approach -

In the above approach, there are a lot of duplicate calls for the same state ‘pos1’ and ‘pos2’, which leads to exponential complexity. So, we will use a dynamic programming algorithm to store the states to reduce the time complexity.

We can avoid calling a recursive function again for the same state by using a two-dimensional vector and storing the state in that vector. If we encounter the same state during execution, we will return from the vector instead of recomputing. This approach will drastically decrease the time complexity.  We will need to make some changes in the above recursive code to make it a top-down dynamic programming solution.

 

Algorithm -

Step 1. Create a two-dimensional vector ‘dp’ of size (N + 1) x (N + 1), where ‘N’ is the length of the given string. Initialize all the elements of the vector with ‘-1’ and call the ‘longestRepeatingSubsequence()’ function, which will now have one extra argument - two-dimensional vector ‘dp’ other than those that were there in the recursive solution.

We have taken the size N x N as pos1 and pos2 are varying from 0 to N.

Step 2. Base case will be if pos1 = N or pos2 = N, then return 0. 

Step 3. Check if the value is present in the vector’ dp’. If yes, then return it.

Step 4. If the value is not present in the ‘dp’ vector, then call the recursive function as in the brute force solution, but here we have to do one more thing: at each step, we have to store the calculated result in our ‘dp’ vector.

 

C++ Code:

 

#include <bits/stdc++.h>

using namespace std;

int longestRepeatingSubsequence(string str, int pos1, int pos2, int n, vector<vector<int>>& dp)

      {

 

          /*

              base case

              if pos1 or pos2 reaches n, it means there are no further 

              characters to consider, so return 0 

          */

          if(pos1 == n || pos2 == n) return 0;

 

          /*

              If this state is already computed, then directly return it from 

              the “dp” vector

          */

          if(dp[pos1][pos2] != -1) return dp[pos1][pos2];

   

          if(pos1 != pos2 && str[pos1] == str[pos2]) return dp[pos1][pos2] = 1 + longestRepeatingSubsequence(str, pos1 + 1, pos2 + 1, n, dp);

          else return dp[pos1][pos2] = max(longestRepeatingSubsequence(str, pos1 + 1, pos2, n, dp),longestRepeatingSubsequence(str,pos1,pos2+1,n,dp));

      }

 

int main()

  {

   string str = "XXYZYVWW";

   vector<vector<int>> dp(9,vector<int>(9,-1));

   cout<<"The length of the Longest Repeating Subsequence in the given string is: "     <<longestRepeatingSubsequence(str, 0, 0, 8, dp)<<endl;

   return 0;

  }

 

 

Output:
The length of the Longest Repeating Subsequence in the given string is: 3

 

Algorithm Complexity: 

Time Complexity: O(N ^ 2)

There will be ‘N * N’ calls in a recursive function as we are using a 2D vector, and if the value is present in this array, then we are not making duplicate calls, so there will be utmost N * N combinations. Therefore the overall time complexity is O(N ^ 2), where ‘N’ is the length of the given string.

 

Space Complexity: O(N ^2)

As we are using a 2D vector of size ‘N x N’, the overall space complexity is O(N^2), where N is the length of the given string.

Bottom-Up Approach -

In the above approach, the overhead of the internal stack is being used in recursion, and we can eliminate it by using an iterative approach. We will take a 2D array and will start populating it from the bottom ( 0th index ) to top ( Nth index ); hence by using this approach, we can make our algorithm more memory efficient. 

Algorithm -

Step 1. Create a function that will accept the below parameters:

  1. ‘str’: The given string for which we have to calculate the length of the longest repeating subsequence.
  2. ‘n’: Length of the given string.

Step 2. Create a 2D array ‘dp’ of size (N+1) x (N+1). 

Step 3. Run two for loops to consider all the characters one by one to find LCS(str, str) with the condition that no two characters in the resulting subsequence will have the same index in their corresponding original string, resulting subsequence will be our longest repeating subsequence.

Step 4: If i=0 or j=0, i.e if the length of the string will be 0, then dp[i][j]=0

Step 4: Then check if the two characters are not at the same index and are identical, then increase the length of the resulting subsequence by 1, else take the maximum of the two cases -  one by moving to the next character in the first string and the another by moving to the next character in the second string.

Step 5:  Return dp[N][N]

 

C++ Code:

 

#include <bits/stdc++.h>

using namespace std;

int longestRepeatingSubsequence(string str, int n)

{

    int dp[n+1][n+1];

    for(int i = 0; i <= n; i++)

      {

          for(int j = 0; j <= n; j++)

           {

               /*

                   if length of the string will be 0, then length

                   of longest repeating subsequence will be 0   

               */               

               if(i == 0 || j == 0) dp[i][j] = 0;

 

               /*

                   If the two characters are not at same position and are equal,

                   then we will increase the length of the longest repeating

                   subsequence by one, else we will take maximum of the two

                   cases as described in the algorithm section

               */                    

               else if(i != j && str[i-1] == str[j-1]) dp[i][j] = 1 + dp[i-1][j-1];

               else dp[i][j] = max(dp[i-1][j], dp[i][j-1]);

           }

      }

    return dp[n][n];

}

 

int main()

  {

   string str="XXYZYVWW";

   vector<vector<int>> dp(9, vector<int>(9, -1));

   cout<<"The length of the Longest Repeating Subsequence in the given string is:  " <<longestRepeatingSubsequence(str, 8) <<endl;

  return 0;

  }

 

 

Output:
The length of the Longest Repeating Subsequence in the given string is: 3

 

Algorithm Complexity: 

Time Complexity: O(N ^ 2)

There will be ‘N * N’ calls in a 2D array using the nested ‘for’ loop. Therefore the overall time complexity is O(N ^2), where N is the length of the given string.

Space Complexity: O(N ^ 2)

As we are using a 2D array of size ‘N * N’, the overall space complexity is O(N^2), where N is the length of the given string.

Frequently asked questions-

 

1) What is the Longest Common Subsequence Problem (LCS)?

  The longest common subsequence (LCS) is to calculate the length of the longest common subsequence, which is common in two given strings.

 

2) What is the Longest Repeating Subsequence Problem?

   The longest repeating subsequence is to find the length of the longest repeating subsequence in a given string such that the two subsequences don’t have the same original string character at the same position.

 

3) How the problem “Longest Repeating Subsequence” is similar to the “LCS” problem?

In the “LCS” problem, we have to find the longest common subsequence between two given strings, and in this problem “Longest Repeating Subsequence”, we have to find the longest repeating subsequence in a given string S, which we can find as the longest common subsequence between and S, with a condition that not any character in the resulting subsequence should be at the same index in the corresponding S. This has been illustrated by taking an example in the above Brute force section.

 

Key takeaways-

In this article, we discussed what the Longest Repeating Subsequence Problem is, discussed the various approaches to solve this problem programmatically, and discussed the time and space complexities. We find out how to use extra space and drastically reduce the time complexity from exponential to polynomial.  If you want to practice this problem, then you can visit codestudio.

 

 

If you think that this blog helped you, then share it with your friends!. Refer to the DSA C++ course for more information.

Until then, All the best for your future endeavors, and Keep Coding.


By Riya Kumari

 

Was this article helpful ?
0 upvotes

Comments

No comments yet

Be the first to share what you think