Understanding Naive String Searching

Understanding Naive String Searching
Understanding Naive String Searching

String searching or matching is you’ll be given a pattern and you have to look for that pattern in the given string if it exists or not. In other words, matching all occurrences of a pattern in a given string but the only thing to keep in mind is that the pattern should be consecutive (substring) in the given string.

Example: S: a b c d a b g | P: b c d

Now if you look for the pattern “bcd“ in the string then it exists. There are five major string matching algorithms:

  • Naïve Algorithm (Brute Force)
  • KMP Algorithm
  • Rabin-Karp Algorithm
  • Z Algorithm
  • Boyer-Moore String Search Algorithm

But Naïve algorithm is more of a brute force approach rather than an algorithm. It is also simplest among the other algorithms and easy to implement. Although it might be possible that this algorithm doesn’t work every time or for solving problems in competitive programming because of its time complexity but if we don’t think for time complexity and think just for a solution then this is the first thing or a normal approach which would come in mind as the name suggests.

How does the Naïve algorithm work?

In naïve algorithm the first pattern checks its character at ith position with the character present in string then checks for the further characters. It sounds a bit confusing but it isn’t. The naïve approach checks for all the possible rearrangement of Pattern P (length of pattern) [1…….m] comparing to String T (length of the string) [1……n]. We try iterating through i = 0, 1…….n-m, successively and for each iteration, i. Compare T [i+1…….i+m] to P [1……m]. It returns true if the string is found and false if not.

Advantages:

  • The comparison of the pattern with the given string can be done in any order
  • There is no extra space required
  • The most important thing that it doesn’t require the pre-processing phase, as the running time is equal to matching time

Disadvantages:

The one thing that makes this algorithm inefficient (time-consuming) is when it founds the pattern at an index then it does not use it again to find the other index. It again starts from the beginning and starts matching the pattern all over again. So, it doesn’t use information from the previous iteration.

Python Implementation of Naïve Algorithm:

def Naïve_search(P,S):

#Storing the length of pattern and string

m= len(P)
n= len(S)

#for iterating through the string only up to difference of length of pattern and string

for I in range(m-n+1):

#intially we make a variable of bool type storing True

isFound= True

#for iterating through the pattern upto its length

for j in range(n):

#if we don’t match the character then change bool to false and break

if (s[i+j]!= p[j]):
isFound=False
break

#if the pattern matches or exist in the string then return true

if isFound==True:
return True

#If we have already iterated through the whole string and doesn’t found a match then return False

return False

The worst case is when both, pattern and string have the same form ( P = am and T= an ) because it is a must to check m characters n-m+1 times. This leads to worst-case running time O((n-m+1)m) and this is tight upper bound. But luckily, this thing happens rarely because most of the time strings find a mismatch at the first character of the pattern. That’s why, in practice, the average running time is more appropriate for estimating running time.

In Java, the function chart is used to point at a specific character or position. Using this function, a comparison between the current characters of the pattern and the text is performed. If there is a mismatch, the pattern is shifted to the next character; otherwise, it returns the suitable shift of the pattern matching.

Optimised Naïve Algorithm:

We have discussed the naïve algorithm above and seen how it works but if we think of the case when all the characters present in the pattern are different. Then? In the original naïve algorithm we were sliding the pattern by 1. When all the characters are different then we can slide the pattern by more than 1. Interesting isn’t it? Although we’ll not be able to achieve big improvement by doing this but by this we can reduce the no. of matches or iteration it has to go through the whole pattern matching. When a mismatch occurs after j matches, now we know that the first character of the pattern will not match the j matched characters because all characters of the pattern are different. So we can slide the pattern j without missing any valid shifts.

def search(P, S):
m = len(P)
n = len(S)
i = 0

while i <= n-m:

#For current index i, check for pattern match

for j in range(m):
if S[i+j] != P[j]:
break
j += 1

if j==m: # if P[0…M-1] = S[i,i+1,…i+M-1]
print (“Pattern found at index ” + str(i))
i = i + m
elif j==0:
i = i + 1
else:
i = i+ j # slide the pattern by j

Standard Problems on Pattern Searching:

  • Longest Prefix which is also a suffix
  • Longest Palindromic  Substring
  • Longest Repeated Substring
  • Splitting a Numeric String
  • Wildcard Pattern Matching
  • Longest Common Substring
  • Anagram Substring Search (or search for all permutations)
  • Replace a word with asterisks in a sentence

Here you can also see that the Brute-Force Algorithm (Naïve) has worst time complexity comparing to other algorithms. Naïve algorithm is good for interview purposes as the interviewer is generally not asking you to give the optimal solution but looking forward so you could solve a problem and then maybe optimise it later. But as all we know that we should not only know how to solve a problem but solve that problem in minimum time is also important and if we talk about today in coding competitions or competitive programming you might not be able to solve the problem with naïve approach.

Here you can also see that the Brute-Force Algorithm (Naïve) has worst time complexity comparing to other algorithms. Naïve algorithm is good for interview purposes as the interviewer is generally not asking you to give the optimal solution but looking forward so you could solve a problem and then maybe optimise it later. But as all we know that we should not only know how to solve a problem but solve that problem in minimum time is also important and if we talk about today in coding competitions or competitive programming you might not be able to solve the problem with naïve approach.

So, To overcome every coder’s nightmare i.e. T.L.E (Time Limit Exceeded) you might want to learn the advanced algorithms which can help you solving the time complexity. For this, you might want to get yourself enrol in one of the courses for data structures at Coding Ninjas or practice at our platform Codezen for more such problems.

By Yogesh Kumar