Dictionaries using Tries
Introduction
Abbas and Hatim were two friends who lived in Yemen. Once, they visited India to see the Taj Mahal. However, they were not able to understand Hindi since they only understood Arabic. So they tried to create their dictionary software. “How would we implement this dictionary, Hatim?” asked Abbas. “Should we use a hashmap?” No, Hatim said, we will implement our dictionary using tries.
Abbas wondered, “What in the world is a trie?” Hatim then explained the concept of tries to Abbas.
Trie is a tree-based data structure used to store words in a dictionary. Let us consider that the number of letters in the English language is twenty-six if we only consider lowercase letters. Then each node of a trie will contain a character of the word and will contain a children array. It can have a maximum of 26 children.
Consider the below given example.
The words that we have stored in the above trie are as follows:
And, Are, Bat, Do, Dot, Drum, No, Not, Note, Null. |
Please note that by storing words in a trie, we can save a lot of space. For example, if 10000 words start with the character ‘a’, we store it just one time. Again if 1000 words start with “ba,” then we are storing it just once. Thus it saves a lot of space for us.
Trie Operations:
Let us continue with our previous example.
- Insert: Suppose we want to insert the word “Cricket” in the above trie. First, we will check if the start node has a child with the value ‘C’.
- If the start node doesn’t contain the first character of the word, this means there is no word starting with ‘C’. So will attach the entire word in the trie. To do this, first, we will make ‘C’ as a child of the root node, then ‘r’ as a child of ‘C’, then ‘i’ as the child of ‘r’, and so on. Finally, we will mark ‘t’ as a terminal node to mark it as the end of the word. That is why we will mark it red.
- Else if the start node contains the character ‘C’, we will match each character one by one until we get the first mismatch. Then we will attach the remaining word from that node.
If the trie contained the word “Crack”, then we will traverse till node ‘r’ and then attach “icket” to the subtree of ‘r’.
Finally, when the entire word is inserted we attach the meaning of the word in
the node representing the last character.
- Search: Suppose we want to search the word “null” in the above trie-based dictionary. First, we will check whether the start node contains the first character ‘n’ as a child node. If not, it means that the word ‘null’ is not present in the trie.
If yes, we will move to the child node of ‘n’ and search the remaining word recursively. We will check whether ‘u’ is a child of ‘n’ and ‘l’ is a sub child of ‘u’ and so on. Finally, when we reach the last character of the word, we will check if it is marked as terminal or not. If yes, then we return the meaning of the word, else the word does not exist in the trie.
Let us now see the dictionary implementation using tries.
Implementation of Dictionaries using Tries
#include <bits/stdc++.h> // Data Members of TrieNode class. // Initializing the data members. // Reached the end of word. // Attaching meaning of word.
// Checking if character exists or not
// Word found. So return meaning. } // Word not found. So return an empty string. // Recursive call. |
Output:
Meaning of cricket: A game introduced by Britishers Meaning of and: It is a conjunction |
Time Complexity
The time complexity of all the three functions insert() and search() is O(N), where ‘N’ is the length of the word. It is because in every case we are traversing through the entire word.
Space Complexity
O(26 * N), where N is the number of words.
Other Applications of Tries
Here are some other interesting applications of tries apart from building a dictionary.
- AutoComplete: It is one of the most important applications of tries. It is a mechanism in which the software guesses the remaining part of the word the user is typing based on the string prefix.
- Spell Checker/ Autocorrect: Spell checking or Autocorrect is a three-step process.
- Check for the word in the trie.
- Generate potential suggestions
- Sort the suggestions with the highest priority first.
3) Longest Prefix Matching: Longest prefix matching is an algorithm used by routing devices in IP networking.
Practice Problems:
Would you mind moving to our coding platform, CodeStudio, and practice some coding problems on tries. After implementing them you will be a Ninja in tries.
Key Takeaways:
In this blog, we learned about a new data structure called Tries. And how to implement dictionaries using tries. We can implement dictionaries very efficiently using tries. We learned how to insert, search and delete a word in a trie and learned how to implement it in C++. Next, we saw some of its exciting applications. Practising these problems would increase your proficiency in data structures.
Thank you and Happy Coding.
By: Husen Kagdi
Comments
No comments yet
Be the first to share what you think