Hash File Organization

PRASHANT SINGH
Last Updated: May 13, 2022

Introduction  

In DBMS(Database management system), to retrieve a particular data, it is not recommended to search through all the data to get the desired data, and this method is inefficient also. Thus, in this situation, the concept of hashing comes into the picture. Hashing is an efficient method to search the location of the desired data on the memory disk without searching through all the data using an indexed structure. In this, data is stored in the memory blocks, and the hash function generates the addresses of these blocks.

What is hash file organization?

Now coming to the Hash file organization, a hash function is used to calculate the addresses of the memory blocks used to store the records. The hash function can be any mathematical function that is either simple or complex. The hash function is applied to some attributes or columns (key or non-key columns) to get the address of the block. That's why it is also known as Direct file organization.

If the hash function is generated using a key column, that column is known as the hash key.

If the hash function is generated using a non-key column, it is known as a hash column. 

In this file organization, we don't need to perform operations like searching or sorting the entire file to get a particular record. Hash file organizations work on the principle of hashing.

Hashing is further categorized into two parts:

  1. Static Hashing 
  2. Dynamic Hashing


Let us learn these hashing in detail using their suitable examples.

Static Hashing

In static hashing, the resultant address of the data bucket will always be the same. You will learn about static hashing clearly by the following example.

Example:

If we generate an address for USER_ID= 13 using the simple hash function mod(5), it will always result in the same bucket address 3. There will be no change in the resultant bucket address in this example.

This proves that the number of data buckets in the memory remains constant throughout the process in static hashing.

Some of the operations of static hashing are given below:

  • Search a record

To search a particular record, the hash function retrieves the memory address of the bucket where the desired data is stored (by the same hash function).

  • Insert a record

To insert a new record into the table, the hash function generates an address for the new record based on the hash key, and the record is stored in that location.

  • Update a record

To update a record, it needs to be searched in the database. The hash function is used to search the record, then update it.

  • Delete a record

To delete a record from the table, we will first fetch the record using searching(by hash function), then we will delete the records for that address in the memory.

Now you must be thinking, if we want to insert a record and the hash function gives the address of the block already filled with some record, what will we do!

When we want to insert some new record into the table, but the address of a data bucket generated by the hash function is not empty(means already filled by some other data), this situation in static hashing is known as Bucket Overflow.

Dynamic Hashing

Since, in static hashing, the data buckets do not expand or shrink dynamically as the size of the database increases or decreases. This is the major drawback of static hashing, and that's why the concept of dynamic hashing comes under the picture.

In Dynamic hashing, the data buckets expand or shrink ( data buckets added or removed) dynamically according to the increase or decrease in records. It is also known as Extended hashing.

Example:

In this, hash functions are made to generate a large number of values. For example, three records A1, A2, and A3 need to be stored in the table. The hash function produces three addresses for each 1001, 0101, and 1010 respectively. If we try to load the data according to the first bit, it tries to load three at addresses 1 and 0.

But the problem is there is no remaining bucket address for A3 as one is already filled by A1.

Thus, the bucket has to expand dynamically to store the third record A3, so it changes the address will use two bits instead of one bit; then it tries to accommodate A3.

The figure below shows this example perfectly:

Frequently Asked Questions

  1. What are the advantages of hash file organizations?
    The following are the advantages of the hash file organization:
    1) In this, records do not need to be sorted sequentially after every transaction; hence it becomes more efficient(since the effort of sorting is reduced).
    2) The address of the block is known by the hash function, which makes it significantly faster to access or search the record in the memory.
    3) Since accessing the record is quick, deleting and updating will be very quick.
    4) This method is suitable for online transactions systems like online banking, ticket booking systems, etc.
     
  2. What are the disadvantages of hash file organization?
    Everything has its pros and cons, the major drawbacks of hash file organizations are as follows:
    1) Since all the records are randomly stored in the memory(as the data in random blocks whose addresses are given by hash function), records are scattered in the memory. Hence memory is not efficiently used here.
    2) For searching data with the given range of data, this method is not suitable as the records are randomly stored; hence the range search will not give the correct output.
     
  3. What are file operations?
    File operations are categorized into two types that are:
    1) Update operations change the data values by insertion, updations, and deletion.
    2) Retrieval operations - this operation does not alter the data; it just retrieves them after filtering.

Key Takeaways

In this blog, we start by introducing the concept of hashing. Later on, we learn about the hash file organization and how it works on the principle of hashing, then we learn about the types of hashing: Static and Dynamic with their suitable examples each.

You can visit here to learn more about different topics related to database management systems.

Also, try CodeStudio to practice programming problems for your complete interview preparation. Don't stop here, Ninja; check out the Top 100 SQL Problems to get hands-on experience with frequently asked interview questions and land your dream job.

Was this article helpful ?
0 upvotes

Comments

No comments yet

Be the first to share what you think