Distinct Subsequences
Posted: 21 Oct, 2020
Difficulty: Moderate
You have been given string 'S' of length 'N' that may contain duplicate alphabets. Your task is to return the count of distinct subsequences of it.
For example:
For the given string “deed” :
The possible subsequences are {“”}, {“d”}, {“e”}, {“de”}, {“e”}, {“de”}, {“ee”}, {“dee”}, {“d”}, {“dd”}, {“ed”}, {“ded”}, {“ed”}, {“ded”}, {“eed”} and {“deed”}.
As, {“d”}, {“e”}, {“de”}, {“ed”} and {“ded”} are repeated.
The distinct subsequences are {“”}, {“d”}, {“e”}, {“de”}, {“ee”}, {“dee”}, {“dd”}, {“ed”}, {“ded”}, {“eed”} and {“deed”}
Thus, the output will be 11.
Note:
As the answer can be large, return your answer modulo 10^9 + 7.
Input format:
The first line contains an integer 'T' which denotes the number of test cases or queries to be run. Then, the T test cases follow.
The first and only line of each test case or query contains the string 'S'.
Output format:
For each test case, print a single line containing the count of distinct subsequences modulo 10^9+7.
The output of each test case will be printed in a separate line.
Note:
You do not need to print anything, it has already been taken care of. Just implement the given function.
Constraints:
1 <= T <= 10
1 <= N <= 10 ^ 4
where 'T' is the number of test cases and 'N' is the length of the given string 'S'.
Time limit: 1 sec.
Approach 1
- The IDea is to call a helper function so as to create all possible subsequences of the string which will contain:
- String CUR = the possible subsequence of the string.
- Int ID = the index of character which will either be included or excluded.
- We maintain a set to count the distinct subsequences, each time a new subsequence is created, we add it into the set. Remember, the set does not contain duplicate elements.
- We return the size of the set which is the count of distinct subsequences received from the helper function.
Algorithm:
void helper(STR, set, ID, CUR) :
- Base Case: if ID >= STR.size()’, this means that CUR is a possible subsequence, thus, add CUR to set and return.
- Call helper(STR, set, ID+1, CUR), this is where you exclude the current character.
- Call helper(STR, set, ID+1, CUR + STR[ID]), this is where you include the current character.
int distinctSubsequences(S) :
- Initialise an empty set.
- Call helper function, helper(STR, set, 0, “”).
- Return the size of the set.
Approach 2
- The idea is to use dynamic programming.
- Take a ‘DP’ list which at every index stores the number of subsequences created by the given string which ends at that particular index. Mathematically, DP[id] will store the number of subsequences created by string ‘STR’ if the length of the string is id.
- We maintain a ‘MAP’ which stores the index of the latest occurrence of any particular alphabet in the string.
- Now, to calculate DP[id] the steps follow:
- If the character at index id has not appeared before then there will be no possibility for repetition in subsequence. Hence, DP[id] = 2 * DP[id - 1].
- Otherwise, we need to subtract the repetitions and as the last occurrence of the current character in the string is at the MAP[STR[id-1]]. So the count of subsequences till this index is repeated so we will subtract the count. Hence, DP[id] = 2 * DP[id - 1] - DP[ MAP[STR[id-1]].
- Update the latest index of the current index in the MAP.
- Return DP[N] where N is the size of the string.
For example, let’s take the string ”abbcb”
- Size of the given string is 5. Initialise a MAP to store the latest index of each character. Along with this also initialise a DP array DP[6]
- The base case, DP[0] = 1 This stores the number of subsequences that can be created if string size is 0 which is 1, i.e a null string.
- From index 1 to the length of the string:
- Iteration 1: STR[1-1], i.e ‘a’ occurs for the first time, DP[1] = 2 * DP[0]. Thus, DP[1] = 2. Also, MAP[a] = 0.
- Iteration 2: STR[2-1], i.e ‘b’ occurs for the first time, DP[2] = 2 * DP[1]. Thus, DP[2] = 4. Also, MAP[b] = 1.
- Iteration 3: STR[3-1], i.e ‘b’ is repeated, DP[3] = 2 * DP[2] - DP[ MAP[ b ]]. Thus, DP[3] = 2 * 4 - 2, DP[3] = 6. Also, MAP[b] = 2.
- Iteration 4: STR[4-1], i.e ‘c’ occurs for the first time, DP[4] = 2 * DP[3]. Thus, DP[4] = 12. Also, MAP[c] = 3.
- Iteration 5: STR[5-1], i.e ‘b’ is repeated, DP[5] = 2 * DP[4] - DP[ MAP[ b ]]. Thus, DP[5] = 2 * 12 - 4, DP[5] = 20. Also, MAP[b] = 4.
- The final count is stored in DP[5] i.e 20. The DP list is [1, 2, 4, 6, 12, 20].
Approach 3
- Take a variable, say ‘LEVELCOUNT’ which stores the count of subsequences ending at a particular character.
- Take another variable, say ‘ALLCOUNT’ which stores the count of total subsequences created yet.
- Take a ‘MAP’ which maps the character with its respective ‘LEVELCOUNT’.
- Now, if the string is not an empty string, let’s work on the very first character (C). The only subsequence ending with C will be “C”, thus,
- LEVELCOUNT becomes 1.
- In the same way, ALLCOUNT becomes 1
- Hash C with its LEVELCOUNT i.e. MAP[C] = 1.
- Iterate through the string:
- Update LEVELCOUNT with ALLCOUNT + 1.
- If the character has not appeared before then there will be no possibility for repetition in subsequence. Hence, ALLCOUNT = ALLCOUNT + LEVELCOUNT.
- Otherwise, we need to subtract the repetitions and as the MAP stores the number of repeated subsequences, hence, ALLCOUNT = ALLCOUNT + LEVELCOUNT - MAP[character].
- Return ALLCOUNT + 1 as we include empty subsequence as well.
For example, let’s take the string “ada”
- Let us take the very first character i.e. ‘a’. The number of subsequences ending with ‘a’ = 1 i.e (‘a’). Thus total subsequences = 1 (‘a’).
- Now, take the next character i.e. ‘d’. The number of subsequences ending with ‘d’ = 2 i.e (‘d’, ‘ad’) i.e. Total subsequences + 1. Thus total subsequences = 3 (‘a’, ‘d’, ‘ad’) i.e (Total subsequences) + (subsequences ending with ‘d’).
- Now, take the next character i.e. ‘a’. The number of subsequences ending with ‘a’ = 4 i.e (‘aa’, ‘da’, ‘ada’, ‘a’) i.e. Total subsequences + 1. Total subsequences = 7 (‘a’, ‘d’, ‘ad’, ‘aa’, ‘da’, ‘ada’, ‘a’) i.e (Total subsequences) + (subsequences ending with ‘a’), but, this time, the subsequences are repeated as the character ‘a’ is also repeated.
- Thus, the total subsequences will be 6 which is (total subsequences) + (subsequences ending with ‘a’) - (total subsequences that ended with previous occurrence of ‘a’). Finally, (‘a’, ‘d’, ‘ad’) + ( ‘aa’, ‘da’, ‘ada’, ‘a’) - (‘a’)
Note: We have excluded empty string here.
SIMILAR PROBLEMS
Max Non Adjacent sum
Posted: 10 May, 2022
Difficulty: Easy
Mario And His Princess
Posted: 12 May, 2022
Difficulty: Moderate
Left Rotate an Array by One
Posted: 17 May, 2022
Difficulty: Easy
Largest Element in the Array
Posted: 17 May, 2022
Difficulty: Easy
Matrix Boundary Traversal
Posted: 20 May, 2022
Difficulty: Easy