Pattern Searching using Trie

Pranav Gautam
Last Updated: May 13, 2022

Introduction

Rachel gave Ross an 18-page letter talking about her feelings. She asked Ross to read and understand all of it to talk about it the next day. Unfortunately, the letter was so long that Ross fell asleep. Look how peaceful he looks:

 

He’s in so much trouble now. Luckily, Ross has got you, his coder friend. Could you help him find the location of some particular patterns in this 18-page text?

 

To save Ross from Rachel, you need to write code for Pattern Searching using a Trie of all Suffixes.

 

Problem Statement

Given a text string for lookup, find the indices of all the occurrences for a given pattern string (if present) in the text string.

 

Input Illustration:

Text string: we were on a break!
Pattern string: on a break

 

Output Illustration:

on a break found at 8

 

Approach

How to store suffixes of a string? For linear-time search operation, suffixes of a string are stored in a suffix tree. Don’t worry. It’s not another data structure that you’ll have to learn. Suffix trees are implemented using tries. So, if you know the implementation of tries, you are good to go. Let’s look at the suffix tree for the string “yemeles”. By the way, “yemeles” means careless.

 

 

Consider the example given before. The baseText is  “We were on a break!”. What if the pattern is “re”? The suffix that starts with “re” is “re on a break!”. While following the path in the suffix tree, the pattern will end at ‘e’. However, the path goes till ‘!’. We can’t store the  pattern’s index at the last element of the current suffix as the pattern may be a substring of the suffix.

 

Hence, Store all the possible suffixes of the baseText string with their corresponding indices. For an input string, pattern, print indices in string baseText where pattern is present.

 

How to generate all possible suffixes? Iterate through the text string. Create a substring containing letters from the current index till the end of the string. For example, possible suffixes for the string “HUGSY” are as follows:

 

Structure of Trie Node

Every trie node contains two data members: indexes and next.

  • indexes is a vector of integers. It holds all the positions where the given substring is present.
  • next is a hashmap. It stores the address of a trie node corresponding to a character.

Functional Requirements

A good coder always ponders a particular question before jumping into coding. That question is: 
“What are the functionalities I want my code to provide?”. Like a good coder, let’s analyze the functions required for Pattern Searching using Trie. The suffix tree(implemented using trie) must have all the generated suffixes for the given text before the pattern search.

 

  1. insertSuffix():  The insertSuffix() method requires two arguments: suffix(substring of the textand index(position at which suffix is present). It will be responsible for inserting a suffix and corresponding index in a suffix tree.

 

The insertSuffix() method must be called by the constructor of the Trie class to perform this task. This method requires two arguments: suffix(substring of the textand index(position at which suffix is present).

 

  1. searchPattern(): The searchPattern() method will search the pattern in the suffix tree and find the indexes at which a given pattern is present. What about the edge case? The function should provide a message(e.g., “not found”) if the given pattern is absent in the suffix tree.

 

  1. print(): Using print(), the list of indexes at which the given pattern is present in the text is printed.

You might be wondering: why should we create another function when we could include this operation in the search() function? Actually, every method should perform only one task for the sake of modularity. This is what we are trying to do.

 

Procedure

insertSuffix()

  1. Set currentNode pointer to the root of Trie(referenced using this operator).
  2. Iterate through the suffix string using an iterator letter.
  3. If next pointer of currentNode for letter (currentNode->next[letter]) points to NULL, create a new Trie node.
  4. Set the currentNode to the next pointer of the currentNode for letter.
  5. Insert the index in the indexes vector of currentNode

searchPattern()

  1. Set currentNode pointer to the root of Trie(referenced using this operator).
  2. Iterate through the suffix string using an iterator letter.
  3. If next pointer of currentNode for letter (currentNode->next[letter]) points to NULL, print a warning(e.g., “not found”) and return.
  4. Set the currentNode to the next pointer of the currentNode for letter.
  5. Call print() function using currentNode as argument.

 

Code for Reference

#include <bits/stdc++.h>
using namespace std;


// Suffix tree implemented using Trie.
class Trie 
{


  // Data members.
  unordered_map<char, Trie *> next;
  vector<int> indexes;


public:
  // Trie Constructor to populate the suffix tree.
  Trie(string baseText)
  {
    int length = baseText.length();
    for (int currentIndex = 0; currentIndex < length; currentIndex++)
    {
      string suffix = baseText.substr(currentIndex);
      insertSuffix(suffix, currentIndex);
    }
  }


  Trie()
  {
    // Trie Constructor to create an empty Trie node.
  }


  void insertSuffix(string suffix, int currentIndex)
  {
    Trie *currentNode = this;
    for (auto letter : suffix)
    {
      if (currentNode->next[letter] == NULL)
      {
        currentNode->next[letter] = new Trie();
      }
      currentNode = currentNode->next[letter];
      currentNode->indexes.push_back(currentIndex);
    }
  }


  void searchPattern(string pattern)
  {
    Trie *currentNode = this;
    for (auto letter : pattern)
    {
      if (currentNode->next[letter] == NULL)
      {
        cout << "Pattern not found\n";
        return;
      }


      currentNode = currentNode->next[letter];
    }
    print(pattern, currentNode);


         
  }


  void print(string pattern, Trie * currentNode)
  {
    cout << pattern << " found at: \n";
    for (auto index : currentNode->indexes)
    {
      cout << index << " ";
    }


               cout << endl;
  }
};


// Driver Function.
int main()
{


  string baseText, pattern;
  int numOfPatterns;


  cout << "Enter the text:";
  getline(cin, baseText);


  // Trie object with name suffixTree.
  Trie *suffixTree = new Trie(baseText);


  cout << "Number of Patterns:";
  cin >> numOfPatterns;


  for (int i = 0; i < numOfPatterns; i++)
  {
    // Here two getline statements help in avoiding.
    // newline character as an input.


    cout << "Pattern: ";
    getline(cin, pattern);


               while (pattern.length() == 0)


            
    {


                     getline(cin, pattern);


                 
    }
    suffixTree->searchPattern(pattern);
  }


  return 0;
}

Input

Enter the text: we were on a break!
Number of patterns: 3
Pattern: we
Pattern: on a break
Pattern: Rachel

Output

we found at:
0 3
on a break found at:
8
Pattern not found

Complexity Analysis

Time Complexity: 

Can you guess the time complexity of both these methods? insertSuffix()and searchPattern() require iterating through the string. As the iteration is a linear-time operation, the time complexity of insertSuffix() and searchPattern() function is O(M), where ‘M’ is the length of the suffix string.

 

Please note that creating a suffix is also a linear-time operation. To create a suffix tree, we first generate suffixes and insert them in the trie. So, populating the suffix tree has the time complexity of O(N2), where N is the length of the string.

 

Space Complexity: 

The number of suffixes is equal to the length of the string(say N). Consider average word length of every suffix is M. The total space required to store N suffix of M lengths is N*M. So, the space complexity of the code is O(N*M).

 

Key Takeaways

Let’s hope Ross wakes up in time and uses our code for Pattern Searching. You did a great job in learning this new pattern searching method. Well Done!

 

Learning something new is never easy. Practicing what you’ve learned is even more difficult. But, a good coder never gives up. So head over to our practice platform CodeStudio to practice top problems and many more. CodeStudio also offers interesting interview experiences and blogs like this. So keep learning and Happy Coding!

Pranav Gautam

 

Was this article helpful ?
1 upvote

Comments

No comments yet

Be the first to share what you think