Introduction to Hashing
Introduction
Hashing is one of the elementary concepts in computer programming and is heavily used by developers for many use cases. Hashing is used to index and retrieve information from the database quickly.
Hashing is the process of mapping a key to a value using hash functions. Hash functions are mathematical functions that help us hash key-value pairs.
Consider a hash function: Y = K % N, where ‘K’ is a key, ‘N’ is the size of the hash table, and ‘Y’ is the generated value.
Let us suppose the size of the hash table is 10. In that case, the values will be generated in the following ways.
(Also see, Index Mapping)
Key | Function | Value |
17 | 17 % 10 = 7 | 7 |
23 | 23 % 10 = 3 | 3 |
34 | 34 % 10 = 4 | 4 |
51 | 51 % 10 = 1 | 1 |
67 | 67 % 10 = 7 | 7 |
Here is a catch. The hash value for both 17 and 67 is 7. Should they be mapped to the same value? Definitely not. This situation, where one or more keys map to the same value, is called a collision. There are several methods to combat such problems. Let us briefly discuss them here.
Collision Handling Techniques
There are several collision handling techniques. We will discuss a few of them here.
Linear Probing
Whenever a collision occurs, we try to find a new empty location for the key down the hash table. Whenever we find an empty location, we store the key on that location.
Notice that there is a collision between 17 and 67. At first, 17 is hashed using 17 % 5 to generate the value as 2. Now, for the key 67, value = 67 % 5, which is again 2. Hence, hashing of 17 and 67 results in a collision. Here we will use the concept of linear probing. We will search for the first empty location down the hash table. The first empty location after 2 is 3. Therefore, we insert 67 at location 3.
Now comes 39, which is hashed to 39 % 5 = 4. Since location 4 is empty, we insert 39 at location 4.
Then for key 43, there is another collision. Since 43 % 5 = 3, it should be inserted at location 3, we can see that location 3 is already occupied by key 67. Therefore, we will now search for the first empty location in the hash table. So, we found location 0 as an empty location. Thus, 43 is inserted at location 0.
In the end, the key 51 is hashed to 51 % 5 = 1. Since location 1 is empty, 51 is inserted there.
Now when we want to look up the table for a key, we just have to find a hash of the key.
For example, if we want to search 17 in the above hash table, we calculate its hash. So, value = 17 % 5 = 2. Therefore, we look up at location 2. If we don’t find it at that particular location, we need to look down the hash table until we find it. If the entire table is looked up and the key is not found, it means the value does not exist in the hashtable.
Chaining
Chaining is a simple technique to remove collisions. In this technique, we implement the hash table as an array of linked lists. We find the hash of a key, as usual, using the hash function. If a collision occurs, we attach the given key at the end of the linked list in that location.
In this example, there are three collisions. Firstly, 17 is hashed to 2. Then 67 is again hashed to 2, and finally, 32 is also hashed to 2. So, as per chaining, they are linked together in a linked list.
Key | Value | Linked List Representation |
17, 67, 32 | Value = 2 | 2 -> 17 -> 67 -> 32 |
43 | Value = 3 | 3 -> 43 |
59 | Value = 4 | 4 -> 59 |
When we want to look up a key, we find its hash. Then if we want to find an element, we have to traverse the linked list.
In this case, insertion, search, and deletion take O(n) time in the worst case.
However, the amortized or the average complexity is still O(1) for insertion, deletion, and search.
Double Hashing
This collision resolution technique is quite efficient. As the name suggests, we apply two hash functions to an element. We apply the second hash only if a collision occurs after applying the first hash.
Double hashing is performed in the following way:
Hash1(key) = key % size of hash table.
Hash2(key) = prime - (prime - key % prime), where ‘prime’ is the prime number just smaller than the size of the hash table.
Value = (Hash1 + i * Hash2) % size of the hash table, where ‘i’ is initialized with one and incremented if there is a collision.
Let us consider an example to understand double hashing.
Suppose we want to insert the numbers: 17, 67, 32, 45, and 98 in the hash table. Therefore, using double hashing, we can generate the values corresponding to every key in the following way:
Key | Function | Value |
17 | Hash1 = 17 % 5 = 2. | 2 |
67 | Hash1 = 67 % 5 = 2, Hash2 = 3 - 67 % 3 = 2, Value = (2 + 1 *1) % 5 = 3. | 3 |
32 | Hash1 = 32 % 5 = 2, Hash2 = 3 - 32 % 3 = 1, Value = (2 + 1 *1) % 5 = 3, Value = (2 + 2 *1) % 5 = 4. | 4 |
45 | Hash1 = 45 % 5 =0 | 0 |
98 | Hash1 = 98 % 5 = 3, Hash2 = 3 - 98 % 3 = 1, Value = (3 + 1 * 1) % 5 = 4, Value = (3 + 2 * 1) % 5 = 0. Value = (3 + 3 * 1) % 5= 1. | 1 |
Similar to the above case, the time complexity for insertion, deletion, and search is O(N). However, the amortized time complexity is still O(1) for all three operations.
Applications of Hashing
- Hashmap: Hashing can be used to create hashmaps where we store key-value pairs. Values are unique, whereas there can be multiple keys for the same value.
- Message Digest: This is an excellent use case of hashing. The message digest is an application of cryptographic hash functions where messages can be of arbitrary length, and the hash is always of a fixed no of bits. For example, the SHA256 hashing algorithm uses 256 bits to store the hash of any message digest.
- Rabin-Karp Algorithm: This is one of the most famous applications of hashing. Rabin-Karp is a string-searching algorithm that uses hashing to find patterns in a string.
Frequently Asked Question
What is the time complexity of insertion in a hashmap?
The time complexity of the insertion of an element in a hashmap is constant which is O(1)
What is the difference between a map and an unordered_map in C++?
The data is stored in an ordered fashion in the map based on the ascending order of keys whereas the unordered_map stores the data in the fashion in that keys are inserted in.
Conclusion
In this blog, we learned the fundamentals of hashing and different types of collision-handling mechanisms. Along with that, we saw some exciting applications of hashing.
Would you mind heading over to our coding platform CodeStudio and practicing the problems to gain efficiency in hashing?
Recommended Problems:
Please head over to CodeStudio and practice some famous interview problems and gain expertise in hashing.
- Count all subarrays having sum divisible to k
- Implement Hashmap
- Make maximum number
- Find Missing Number
- Longest Substring with at most K Distinct Characters
Recommended Reading:
- Hashtable v/s STL Map
- Double Hashing
- Separate Chaining and Open Addressing for Collision Handling
- Load Factor and Rehashing
- Practice Problems
Also check out some of the Guided Paths on topics such as Data Structure and Algorithms, Competitive Programming, Operating Systems, Computer Networks, DBMS, System Design, Basics of C++, Basics of Java, Basics of Python, etc. as well as some Contests, Test Series, Interview Bundles, and some Interview Experiences curated by top Industry Experts only on CodeStudio.
Thank You and Happy Coding!