Rabin Carp
Posted: 4 Feb, 2021
Difficulty: Moderate
You are given a string ‘str’ of length 'N' and a string ‘pat’ of length 'M'. Your task is to find the starting index of all the occurrences of ‘pat’ in str.
You need to return a list of integers that denote the indices from which ‘pat’ is present in ‘str’(consider 0 based indexing).
For example,
Let str= “AABAACAADAABAABA”
And pat= “AABA”
We will return the array/list [0,9,12] as we can clearly see that from indices 0 9 and 12 we have the same pattern ‘pat’ in ‘str’
Note
1. 'str' and 'pat' will consist of only uppercase English letters.
2. Two occurrences of a pattern may overlap with each other. For example, for str = "AAAA" and pat = "AA", you need to return [0,1,2] and not [0,2].
Input Format:
The first line of input contains an integer ‘T’ denoting the number of test cases to run. Then the test cases follow.
The first line of each test case contains the string ‘str’.
The second line of each test case contains the string ‘pat’
Output Format
For each test case, return a list of integers denoting the indices where the match happens.
If no match happens, return an empty array.
Output for each test case will be printed in a new line.
Note:
You do not need to print anything; it has already been taken care of. Just implement the given function.
Constraints:
1 <= T <= 100
1 <= N <= 5000
1 <= M <= 5000
1<= M <= N
'str' and 'pat' will consist of only uppercase english letters.
Where ‘T’ is the total number of test cases, N is the length of 'str' and 'M' is the length of 'pat'.
Time limit: 1 second
Approach 1
The key idea in solving this problem using this approach is that we can naively search for pattern ‘ptr’ in ‘str’ by simply iterating through the ‘str’. If we find a match we add the current index in our array/list ‘result’.
The algorithm will be -
- We start iterating from the start of the string ‘str’ to (size of ‘str’ - size of ‘pat’) to check for a match.
- In each iteration, we define a variable ‘j’ which and take a loop from ‘i’ to ‘i+ size of ‘pat’) to match the pattern string with the current string from index ‘i’. If the current substring from index ‘i’ matches the pat string, we add ‘i’ to our array/list ‘result’.
- Finally, we return the array/list ‘result’.
Approach 2
We can optimize the brute force approach by improving the searching algorithm. We can use hashing for it.
The algorithm will be -
- We first create a hash function. A hash function is simply a function that returns a value if some string is passed to it and it returns the same value when the same string is passed to it.
- For this, we can take a sufficiently large base and use the numeric equivalent of characters i.e 1=A 2=b 3=C and so on.
- We can take any base but let us assume we take it as 31.
- For example, the string ABC will have its hash value: 3*31^3 +2*31^2+1*31=902.
- Now for a larger string, it is possible that the hash value becomes very big, So we take a suitable mod to avoid collisions. A collision happens when the hash function gives the same value for 2 different strings. Let us that that to be 998244353. After taking such a large mod, the probability of collision will be 1/mod.
- Now to check if the pattern matches we just need to compare the hash values.
- We can calculate the hash value from 0 to any index i using the hash value calculated from 0 to i - 1:
- hash( str[0 .. i] ) = ( hash(str[0…(i - 1)]) + (str[i] - ‘A’ + 1) * power(base, i)) mod q
- Where q is any prime number.
- power(base, i) represents the value when the base is raised to i.
- hash( str[0 .. i] ) = ( hash(str[0…(i - 1)]) + (str[i] - ‘A’ + 1) * power(base, i)) mod q
We will use the following approach:
- We first make an array ‘primePower’ in which we store all the powers of the prime number till ‘n’ where ‘n’ is the size of ‘str’
- Then we find the hash value of ‘pat’ string ‘hashPattern’ as discussed above.
- We store the hash of str till index ‘i’ in an array/list h.
- Now, for each i from 0 to N - M + 1:
- We calculate the hash of substring of length M starting at index i by (h[i + M] - h[i] + mod)%mod.
- If the calculated hash value is equal to (hashPattern *primePower[i])%mod, we include the index i in our array/list 'occurrences'.
- Finally, we return the occurrences array.
SIMILAR PROBLEMS
Most Frequent Element
Posted: 25 Feb, 2022
Difficulty: Easy
Shortest Common Supersequence
Posted: 4 Mar, 2022
Difficulty: Hard
String Sort
Posted: 13 Apr, 2022
Difficulty: Easy
Change Case
Posted: 14 Apr, 2022
Difficulty: Easy
Reverse String
Posted: 15 Apr, 2022
Difficulty: Moderate