This blog will discuss the Rabin-Karp Algorithm. It is a string searching algorithm named after its authors, Richard M. Karp and Michael O. Rabin.
This algorithm is used to find all the occurrences of a given pattern ‘P’ in a given string ‘S’ in O(Ns + Np) expected time, where ‘Ns’ and ‘Np’ are the lengths of ‘S’ and ‘P’, respectively.
Let’s take an example to make this clearer.
Assume the given string S = “cxyzghxyzvjkxyz” and pattern P = “xyz” and we have to find all the occurrences of ‘P’ in ‘S’.
We can see that “xyz” is occurring in “cxyzghxyzvjkxyz” at three positions. So, we have to print that pattern ‘P’ is occurring in string ‘S’ at indices 1, 6, and 12.
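As a baseline for checking these indices, here is a minimal brute-force sketch in Python. It is not the Rabin-Karp algorithm itself, and the name `naive_search` is only illustrative; it simply compares the pattern against every window of the string.

```python
def naive_search(text, pattern):
    """Return every index at which pattern starts in text (brute force)."""
    n, m = len(text), len(pattern)
    # Compare the pattern against each window of length m, one by one.
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


print(naive_search("cxyzghxyzvjkxyz", "xyz"))  # [1, 6, 12]
```

This takes O(Ns * Np) character comparisons in the worst case, which is the cost that hashing is meant to avoid.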
How Does the Rabin-Karp Algorithm Work?
The algorithm computes, for each index of the text, the hash value of the substring that starts at that index and has the same length as the pattern. If that hash value equals the hash value of the given pattern, it does a full character-by-character comparison at that index to rule out a hash collision.
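To make the idea concrete, the sketch below hashes every window from scratch and reports the indices whose hash equals the pattern's hash. It assumes lowercase input and the polynomial hash described in the steps that follow, with prime = 31 and a modulus of 10^9 + 9 (the modulus is an assumption here). The names `poly_hash` and `candidate_indices` are illustrative.

```python
PRIME = 31        # assumed base, a prime just above the alphabet size of 26
MOD = 10**9 + 9   # assumed large prime modulus


def poly_hash(s):
    """Polynomial hash: sum of (s[i] - 'a' + 1) * PRIME**i, taken modulo MOD."""
    h, power = 0, 1
    for ch in s:
        h = (h + (ord(ch) - ord("a") + 1) * power) % MOD
        power = power * PRIME % MOD
    return h


def candidate_indices(text, pattern):
    """Indices whose window hash equals the pattern hash (no rolling update yet)."""
    m = len(pattern)
    target = poly_hash(pattern)
    return [i for i in range(len(text) - m + 1)
            if poly_hash(text[i:i + m]) == target]


# x = 24, y = 25, z = 26, so hash("xyz") = 24 + 25*31 + 26*961 = 25785
print(poly_hash("xyz"))                             # 25785
print(candidate_indices("cxyzghxyzvjkxyz", "xyz"))  # [1, 6, 12]
```

Recomputing each window's hash this way still costs O(Ns * Np). The real algorithm updates the hash in constant time as the window slides.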
How is Hash Value calculated in Rabin-Karp Algorithm?
Step 1. Create a function “rabinKarpSearch()” for implementing the Rabin-Karp algorithm that accepts two parameters: the given string ‘S’ and the given pattern ‘P’. First, calculate the lengths of ‘S’ and ‘P’.
Step 2. Now, we have to choose a prime number and a value for taking the modulus while calculating the hash values. To minimize hash collisions, the prime should be close to the number of distinct characters used in the string and pattern. Assuming that the given ‘S’ and ‘P’ consist of only lowercase letters, the total number of characters will be 26, so take prime = 31. The value used for the modulus should be a very large prime, so take mod = 1e9 + 9.
Step 3. The hash function that we have used here is:
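hash(S) = (Σ ((S[i] - ‘a’ + 1) * prime^i)) % mod, where the sum runs over every index i of the string and “prime” is the prime number chosen in Step 2 (not the pattern ‘P’).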
Step 4. Create a vector to store the powers of “prime”, from (prime ^ 0) to (prime ^ Ns), each taken modulo “mod”. Now calculate the hash value of the given pattern and of the first window of the string ‘S’.
Step 5. Now slide the window over the string one position at a time, update the hash value of the current substring, and compare it with the hash value of the pattern. If the two are the same, confirm the match by comparing the characters and print the occurrence of the pattern at that index.
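The steps above can be sketched in Python as follows. This is one possible reading of the steps, not necessarily the original listing. It assumes lowercase input, prime = 31, and a modulus of 10^9 + 9, and the name `rabin_karp_search` simply mirrors “rabinKarpSearch()”. Because the hash weights character i by prime^i, the window hash at index i is compared with the pattern hash multiplied by prime^i.

```python
PRIME = 31        # Step 2: prime close to the alphabet size
MOD = 10**9 + 9   # Step 2: assumed large prime modulus


def rabin_karp_search(s, p):
    """Return all indices at which pattern p occurs in string s."""
    ns, np_ = len(s), len(p)            # Step 1: lengths of S and P
    if np_ == 0 or np_ > ns:
        return []

    # Step 4: powers[k] = PRIME**k % MOD for k = 0..ns
    powers = [1] * (ns + 1)
    for k in range(1, ns + 1):
        powers[k] = powers[k - 1] * PRIME % MOD

    def code(ch):
        return ord(ch) - ord("a") + 1   # 'a' -> 1, ..., 'z' -> 26

    # Step 3/4: hash of the pattern and of the first window
    pattern_hash = sum(code(p[j]) * powers[j] for j in range(np_)) % MOD
    window_hash = sum(code(s[j]) * powers[j] for j in range(np_)) % MOD

    matches = []
    for i in range(ns - np_ + 1):       # Step 5: slide the window
        # window_hash holds sum(code(s[i+j]) * PRIME**(i+j)), so scale the
        # pattern hash by PRIME**i before comparing, then verify characters.
        if window_hash == pattern_hash * powers[i] % MOD and s[i:i + np_] == p:
            matches.append(i)
        if i + np_ < ns:
            # Drop the leftmost character and add the next one on the right.
            window_hash = (window_hash
                           - code(s[i]) * powers[i]
                           + code(s[i + np_]) * powers[i + np_]) % MOD
    return matches


for index in rabin_karp_search("cxyzghxyzvjkxyz", "xyz"):
    print("The given pattern occurs in the given string at index", index)
```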
For the example above, the output is:
The given pattern occurs in the given string at index 1
The given pattern occurs in the given string at index 6
The given pattern occurs in the given string at index 12
Rabin-Karp Algorithm Complexity
We will now discuss the time and space complexity of the Rabin-Karp Algorithm:
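- In the Rabin-Karp algorithm, we calculate the hash value of the pattern in O(Np) time and traverse the given string, updating the window hash and comparing it with that of the pattern, in O(Ns) time.
- So, the average time complexity is O(Ns + Np), where ‘Ns’ and ‘Np’ are the lengths of the given string and pattern respectively. In the worst case, when many windows share the pattern’s hash and each one needs a full comparison, it degrades to O(Ns * Np).
- The steps above store the powers of the prime in a vector of size Ns + 1, so that version uses O(Ns) extra space. A rolling hash that updates the window hash in place needs only O(1) extra space.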
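As an illustration of the constant-space variant, the sketch below keeps only a few integers of hashing state instead of a table of powers. It uses a slightly different but equivalent hash, weighting the leading character highest (Horner's rule), so the window can be updated without stored powers. The function name and default parameters are assumptions made for this example.

```python
def rabin_karp_rolling(text, pattern, base=31, mod=10**9 + 9):
    """Rabin-Karp with O(1) hashing state (the result list aside)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    def code(ch):
        return ord(ch) - ord("a") + 1

    high = pow(base, m - 1, mod)   # weight of the window's leading character
    p_hash = t_hash = 0
    for j in range(m):             # Horner's rule for both initial hashes
        p_hash = (p_hash * base + code(pattern[j])) % mod
        t_hash = (t_hash * base + code(text[j])) % mod

    matches = []
    for i in range(n - m + 1):
        # Verify character by character only when the hashes agree.
        if t_hash == p_hash and all(text[i + j] == pattern[j] for j in range(m)):
            matches.append(i)
        if i + m < n:
            # Remove text[i], shift the rest up by one power, append text[i + m].
            t_hash = ((t_hash - code(text[i]) * high) * base
                      + code(text[i + m])) % mod
    return matches


print(rabin_karp_rolling("cxyzghxyzvjkxyz", "xyz"))  # [1, 6, 12]
```

An input such as text = "aaaaa" with pattern = "aaa" makes every window match, so every window is verified in full. That is the O(Ns * Np) worst case.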
Rabin-Karp Algorithm Applications
The Rabin-Karp algorithm is a string search algorithm. It efficiently finds occurrences of patterns within a given text. Its primary use is pattern matching and string searching, and it has several advantages that make it a valuable tool in many areas. Here are some of its uses:
- Word Processing: Used by search engines and text editors to find and highlight occurrences of keywords and phrases within large bodies of text.
- Plagiarism Detection: Used to identify instances of copied content within a document, website, or scholarly article.
- Biological Sequence Analysis: Used in bioinformatics to search and match DNA, RNA, or protein sequences in genomic databases.
- Data Mining: It is used for pattern matching and similarity search on large datasets.
- Computer Security: Implemented in intrusion detection systems and antivirus software to identify and block malicious patterns and signatures.
- Compression Algorithms: Used to search for repeating patterns and substrings so that the data can be compressed more effectively.
- Image Processing: It has been adapted for image recognition tasks, for example, finding specific patterns in images.
- Network Packet Inspection: Used in network security to identify specific patterns or signatures in network packets.
- Spelling Correction: Used by spell checking systems to suggest corrections based on similar patterns in the text.
- Data Deduplication: Used in data storage systems to eliminate duplicate data and optimize storage capacity.
Advantages and Disadvantages Of Rabin-Karp Algorithm
Advantages
- The Rabin-Karp algorithm is well suited to finding multiple patterns in the same text.
- The Rabin-Karp algorithm can work with various types of input, such as repeated characters in the same input, multiple substrings, etc.
- The Rabin-Karp algorithm helps in detecting plagiarism in large datasets.
- The same hashing idea can also be applied to other string-matching problems.
Disadvantages
- The Rabin-Karp algorithm hits its worst-case time complexity when hash collisions are frequent. The complexity can go up to O(M*N), where M and N are the lengths of the pattern and the text, which is slower than other string matching algorithms.
- The Rabin-Karp algorithm uses extra space to store hashing data, such as the precomputed powers of the prime.
- The hash function used by the Rabin-Karp algorithm is predictable, so an attacker can craft inputs that cause collisions, which is a security concern.
- For the same reason, the Rabin-Karp algorithm is not preferred in cryptographic applications.
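The multiple-pattern advantage listed above can be sketched as follows, assuming all patterns have the same length and the input is lowercase. The pattern hashes go into a dictionary, so each window of the text is hashed once and checked against all patterns together. The name `find_any` and the default parameters are illustrative assumptions.

```python
def find_any(text, patterns, base=31, mod=10**9 + 9):
    """Return (index, pattern) pairs for equal-length patterns found in text."""
    if not patterns:
        return []
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("this sketch assumes patterns of equal length")
    n = len(text)
    if m == 0 or m > n:
        return []

    def code(ch):
        return ord(ch) - ord("a") + 1

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * base + code(ch)) % mod
        return h

    # Map each hash to the patterns that produce it (collisions share a bucket).
    by_hash = {}
    for p in patterns:
        by_hash.setdefault(full_hash(p), set()).add(p)

    high = pow(base, m - 1, mod)
    window = full_hash(text[:m])
    found = []
    for i in range(n - m + 1):
        candidates = by_hash.get(window)
        if candidates:
            piece = text[i:i + m]
            if piece in candidates:     # full comparison rules out collisions
                found.append((i, piece))
        if i + m < n:
            window = ((window - code(text[i]) * high) * base
                      + code(text[i + m])) % mod
    return found


print(find_any("cxyzghxyzvjkxyz", ["xyz", "ghx"]))
# [(1, 'xyz'), (4, 'ghx'), (6, 'xyz'), (12, 'xyz')]
```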
Frequently asked questions
What is the Rabin-Karp algorithm?
The Rabin-Karp algorithm is a string searching or matching algorithm. It uses a hash function to match patterns. Only if the hash value of a substring matches the hash value of the pattern does it go on to a full match for that substring.
What are the advantages of the Rabin-Karp algorithm?
The Rabin-Karp algorithm is a good choice for checking plagiarism, as it can deal with multiple patterns at the same time. That makes it well suited to detecting plagiarism, even for long phrases. With a good hashing function, it can be quite efficient for string matching.
How do you use Rabin-Karp algorithm?
The Rabin-Karp algorithm searches for a pattern in a text using hashing. It slides a window along the text, calculates hash values for the pattern and window, and compares them. If hashes match, it checks the substrings for an exact match.
What is the Rabin-Karp algorithm for numbers?
The Rabin-Karp algorithm for numbers operates similarly to text, but uses number sequences instead. It employs rolling hash functions to identify a specific sequence of numbers within a larger numeric dataset.
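A minimal sketch of the numeric case, assuming the data is a list of non-negative integers and the target is a contiguous run. The name `find_run` and the chosen base and modulus are assumptions made for illustration; the numbers themselves play the role that character codes play for text.

```python
def find_run(numbers, target, base=1_000_003, mod=(1 << 61) - 1):
    """Return start indices where the list `target` occurs contiguously in `numbers`."""
    numbers, target = list(numbers), list(target)
    n, m = len(numbers), len(target)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, mod)   # weight of the leading element
    t_hash = w_hash = 0
    for j in range(m):
        t_hash = (t_hash * base + target[j]) % mod
        w_hash = (w_hash * base + numbers[j]) % mod

    hits = []
    for i in range(n - m + 1):
        # Compare the actual elements only when the hashes agree.
        if w_hash == t_hash and numbers[i:i + m] == target:
            hits.append(i)
        if i + m < n:
            w_hash = ((w_hash - numbers[i] * high) * base + numbers[i + m]) % mod
    return hits


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(find_run(data, [1, 5, 9]))  # [3]
print(find_run(data, [5]))        # [4, 8, 10]
```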
In this article, we discussed the Rabin-Karp Algorithm for finding a given pattern in a given string, the Java, Python & C++ implementation of the algorithm, and its time and space complexities. If you want to check out more articles and solve similar problems for practice, then you can visit Coding Ninjas Studio.
If you think that this blog helped you, then share it with your friends!
Until then, all the best for your future endeavors, and keep coding.