Pattern Searching using a Trie of all Suffixes

Malay Gain
Last Updated: May 13, 2022

Introduction

Trie is a special type of search tree data structure that is used as an efficient retrieval or searching data structure as searching time can be reduced to the optimal limit of pattern length. Here we will see one of the important implementations of trie for pattern searching. 

But we will use standard trie of all suffixes to implement the pattern searching algorithm.

Problem statement

You are given a text or string. You need to find all occurrences of given queries of patterns in the text.

Input
txt="minimize"
pattr="mi"

Output
Pattern found at position 0
Pattern found at position 4

Note: Please try to solve the problem first and then see the solution below.

Approach

Step 1: building trie with all suffixes of the given text

  • First, generate all suffixes of the given string
  • Then, build the trie considering every suffix as individual words.


Step 2: pattern searching in trie

  • Start from the root of the trie and the first character of the given pattern.
  • If there is an edge from the root for the current character, follow the edge.
  • Thus following the edges one by one for the current character if all characters of the given pattern are processed, the pattern is present in the text. We will store indexes of the suffixes starting at the current node. 
  • If there is not any edge for the current character of the pattern, then the pattern is not present in the given text.

Dry run

Let the given string or text be “minimize”.

All possible suffixes of the given text :

  1. minimize 
  2. inimize
  3. nimize
  4. imize
  5. mize
  6. ize
  7. ze
  8. e

source

Code

//c++ implementation of pattern Searching using a Trie of all Suffixes
#include<iostream>
#include<list>
#define MAX_CHAR 256
using namespace std;

//node of trie with all suffixes
class TrieNode
{
       private:
              TrieNode *children[MAX_CHAR];
              list<int> *indexes;
       public:
              // constructor
              TrieNode() 
              {
                     // Creating an empty linked list for storing indexes of
                     // suffixes starting from the current node
                     indexes = new list<int>;

                     
                     for (int i = 0; i < MAX_CHAR; i++)
                     {
                            children[i] = NULL; // initializing all child pointers as NULL
                     }
              }

              //  function to insert a suffix of the given txt
              // in subtree rooted with this node
              void insertSuffix(string suffix, int index)
              {
                     // Store index in linked list
                     indexes->push_back(index);

                     // If string has more characters
                     if (suffix.length() > 0)
                     {
                            
                            char cIndex = suffix.at(0);// first character

                            //add a new edge if there is no edge for this character
                            if (children[cIndex] == NULL)
                                   children[cIndex] = new TrieNode();

                            // Recur for next suffix
                            children[cIndex]->insertSuffix(suffix.substr(1), index+1);
                     }
             }

              // A function for the pattern searching in subtree rooted
              // with this node which returns pointer to a linked
              // list containing all indexes where pattern is present.
              
              list<int>* search(string pat)
              {
                     // return the list of indexes if all characters of pattern is processed,
                     if (pat.length() == 0)
                            return indexes;

                     // if there is an edge for the current character, then follow the edge.
                    
                     if (children[pat.at(0)] != NULL)
                            return (children[pat.at(0)])->search(pat.substr(1));

                     // pattern doesn’t exist in  the text If there is no edge, 
                     else return NULL;
              }

              
};

//  trie of all suffixes
class TrieOfSuffixes
{
       private:
              TrieNode root;
       public:
              // constructing a trie of suffixes of the given text(constructor)
              TrieOfSuffixes(string txt)
              {
                     // inserting all suffixes into the Suffix Trie 
                     //using the recursive function
                    
                     for (int i = 0; i < txt.length(); i++)
                            root.insertSuffix(txt.substr(i), i);
              }

              // Function for searching a pattern in this suffix trie.
              void search(string pat)
              {
                     
                     
                     list<int> *result = root.search(pat);// calling  search function for root of Trie.

                     //  paattern not matched if the list of indexes is empty 
                     if (result == NULL)
                            cout << "Pattern not found" << endl;
                     else
                     {
                    list<int>::iterator i;
                    int patLength = pat.length();
                    for (i = result->begin(); i != result->end(); i++)
                    {
                     cout << "Pattern found at position " << *i - patLength<< endl;
}
                            
                     }
              }

};

// driver code
int main()
{
// suffix trie for text "minimize"
string txt = "minimize";
TrieOfSuffixes S(txt);

cout << "Search for 'mi'" << endl;
S.search("mi");

return 0;
}

Output

Search for 'mi'

Pattern found at position 0

Pattern found at position 4

 

 

 

 

Complexity

Time complexity is O(n+k), where n is the length of the pattern and k is the number of occurrences of the pattern.

FAQs

  1. What is the Trie?
    Trie is a special type of search tree data structure that is used as an efficient retrieval or searching data structure as searching time can be reduced to the optimal limit of pattern length.
     
  2. What are some applications of the Trie data structure?
    Trie is used in many string search-based applications such as autocomplete, prefix matching, text search.

Key Takeaways

This article covered how to implement the pattern searching algorithm using the trie of suffixes.

Check out the CodeStudio library for getting a better hold of the data structures and algorithms.
Side by side, you can also practice a wide variety of coding questions commonly asked in interviews in CodeStudio. Along with coding questions, you can also find the interview experience of scholars working in renowned product-based companies here. 

Happy learning!

Was this article helpful ?
0 upvotes