Introduction to File Organization

Introduction  

Before we start, do you know what a file is?

A file is a collection of records, that is, a sequence of records stored in a binary format. Individual records can be accessed with the help of a key, typically the primary key.

Related pieces of information are stored together in files. A disk drive is formatted into several blocks, and file records are mapped onto those disk blocks.

Now, what is file organization?

File organization is the logical relationship among the records of a file. It defines how file records are mapped onto disk blocks.

One approach to mapping a database to files is to use several files and store records of only one fixed length in any given file. An alternative approach is to structure the files so that they can hold records of multiple lengths.
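To make the fixed-length approach concrete, here is a minimal sketch of how such records could be packed and located inside blocks. The record layout (a 4-byte id plus a 20-byte name), the tiny 96-byte block size, and the function names are all assumptions chosen for illustration, not something a particular database prescribes.

```python
import struct

# Assumed layout: 4-byte little-endian integer id + 20-byte name = 24 bytes.
RECORD_FORMAT = "<i20s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)
BLOCK_SIZE = 96  # assumed, deliberately tiny block size
RECORDS_PER_BLOCK = BLOCK_SIZE // RECORD_SIZE


def pack_record(record_id, name):
    """Encode one record into exactly RECORD_SIZE bytes (short names are NUL-padded)."""
    return struct.pack(RECORD_FORMAT, record_id, name.encode("utf-8"))


def unpack_record(raw):
    """Decode RECORD_SIZE bytes back into (id, name)."""
    record_id, name = struct.unpack(RECORD_FORMAT, raw)
    return record_id, name.rstrip(b"\x00").decode("utf-8")


def locate(record_number):
    """Return (block number, byte offset inside that block) of the n-th record."""
    block = record_number // RECORDS_PER_BLOCK
    offset = (record_number % RECORDS_PER_BLOCK) * RECORD_SIZE
    return block, offset
```

Because every record has the same size, the position of any record is simple arithmetic. With variable-length records, this shortcut is lost and extra bookkeeping is needed to find where each record starts.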

Objectives of file organization 

File organization aims to achieve the following objectives:

  • Optimal selection of records: Records should be selected as quickly as possible.
  • Easy transactions: Operations such as inserting, deleting, or updating records should be easy and quick.
  • No duplicate records: Inserts, updates, and deletes should not introduce duplicate records.
  • Efficient storage: Records should be stored efficiently to minimize the cost of storage. Unused space between records should be reused.

Now, we will learn about types of file organization.

Types of file organization

There are various methods of file organization, each with its own pros and cons. A method that is efficient for one type of selection may be inefficient for another.

So, the developer or programmer chooses the best-suited file organization method according to the requirements of the situation.

Some of the file organizations are as follows:

  • Sequential File Organization
  • Indexed Sequential Access Method (ISAM)
  • Heap File Organization
  • Hash File Organization
  • B+ Tree File Organization
  • Clustered File Organization


Sequential File Organization:

Each record contains a data attribute that uniquely identifies it. Records are placed in the file in sequential order based on this unique key field or search key. In practice, though, it is rarely possible to keep all the records physically in sequence as the file changes.

Some of the pros of sequential file organization:

  • Fast and efficient when a huge amount of data is processed in key order.
  • Simple in design.
  • Files can be easily stored on magnetic tape, which is a cheaper storage medium.


Everything has pros and cons. The primary drawback of sequential file organization is wasted time: we cannot jump directly to a particular record and have to read through the records in order to reach it.
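The trade-off can be sketched in a few lines of Python. This is only a toy in-memory model under our own assumptions (the class name `SequentialFile` and its methods are hypothetical): records are kept sorted by key, and a lookup reads records one by one from the start.

```python
import bisect


class SequentialFile:
    """Toy model: records live in one list, kept in ascending key order."""

    def __init__(self):
        self.records = []  # list of (key, value) pairs

    def insert(self, key, value):
        # Find the slot that keeps the file sorted; every later record
        # shifts by one position, which is the price of maintaining order.
        keys = [k for k, _ in self.records]
        position = bisect.bisect_left(keys, key)
        self.records.insert(position, (key, value))

    def find(self, key):
        """Scan from the first record; return (value, number of records read)."""
        for reads, (k, value) in enumerate(self.records, start=1):
            if k == key:
                return value, reads
            if k > key:
                # The file is sorted, so the key cannot appear later.
                return None, reads
        return None, len(self.records)


seq = SequentialFile()
for key, value in [(30, "c"), (10, "a"), (20, "b")]:
    seq.insert(key, value)
```

Here `seq.find(30)` has to read all three records before it reaches the one it wants, which is exactly the sequential-access cost described above.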

Heap File Organization:

When a file is created using this file organization, the OS (operating system) allocates a memory area to that file without any further accounting details. File records can be placed anywhere in that memory area. Heap file organization does not support ordering, indexing, or sequencing on its own.

Pros: 

  • This method is helpful for bulk insertion, when a huge amount of data needs to be loaded into the database at the same time.


The primary drawback of this file organization is that it is memory inefficient, because it leaves unused memory behind.
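As a rough illustration, a heap file can be modeled as a list of blocks where new records simply go wherever there is room at the end. The `HeapFile` class, its two-records-per-block capacity, and the `"id"` field are assumptions made for this sketch. In this simplified version, deleted slots are not reused, which is one way unused space can build up.

```python
class HeapFile:
    """Toy model: unordered records appended to fixed-capacity blocks."""

    def __init__(self, records_per_block=2):
        self.records_per_block = records_per_block
        self.blocks = [[]]

    def insert(self, record):
        # No ordering is maintained: append to the last block,
        # allocating a fresh block when the last one is full.
        if len(self.blocks[-1]) >= self.records_per_block:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def find(self, key):
        """Scan block by block; return (record, number of blocks read)."""
        for blocks_read, block in enumerate(self.blocks, start=1):
            for record in block:
                if record is not None and record["id"] == key:
                    return record, blocks_read
        return None, len(self.blocks)

    def delete(self, key):
        for block in self.blocks:
            for slot, record in enumerate(block):
                if record is not None and record["id"] == key:
                    block[slot] = None  # leaves a hole that this sketch never reuses
                    return True
        return False

    def unused_slots(self):
        return sum(record is None for block in self.blocks for record in block)


heap = HeapFile()
for record_id in (42, 7, 19):
    heap.insert({"id": record_id})
```

Insertion is cheap because nothing has to be sorted or moved, but every lookup is a scan, and each deletion in this sketch leaves an empty slot behind.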

Hash File Organization:

This file organization applies a hash function to one or more fields of each record. The output of the hash function gives the location of the disk block where the record needs to be placed.

Pros of using hash file organization are:

  • Records do not need to be sorted after every transaction, which saves the effort of sorting and makes the method more efficient.
  • The hash function gives the address of the block directly, which makes it significantly faster to access or search for a record.
  • Since accessing a record is quick, deleting and updating it are quick too.


The major disadvantage is that it is memory inefficient.

Since records are placed in whichever blocks the hash function selects, they end up scattered across the storage. Hence the space is not used efficiently.
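A small sketch can show both sides of this. We assume a very simple hash function (key modulo the number of buckets) and hypothetical names such as `HashFile` and `bucket_for`; real systems use more careful hash functions and handle overflow when a bucket fills up.

```python
NUM_BUCKETS = 4  # assumed number of disk blocks (buckets)


def bucket_for(key):
    """Assumed hash function: map an integer key to a bucket number."""
    return key % NUM_BUCKETS


class HashFile:
    """Toy model: each bucket stands in for one disk block."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key, value):
        self.buckets[bucket_for(key)].append((key, value))

    def find(self, key):
        # Only the one bucket chosen by the hash function is examined.
        for k, value in self.buckets[bucket_for(key)]:
            if k == key:
                return value
        return None

    def delete(self, key):
        bucket = self.buckets[bucket_for(key)]
        for index, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[index]
                return True
        return False

    def range_search(self, low, high):
        # Neighbouring keys hash to unrelated buckets, so every bucket is scanned.
        return sorted(
            (k, v) for bucket in self.buckets for k, v in bucket if low <= k <= high
        )


hashed = HashFile()
for key, value in [(101, "a"), (205, "b"), (310, "c"), (412, "d")]:
    hashed.insert(key, value)
```

With these four keys, two records land in the same bucket while another bucket stays empty, which hints at the wasted space mentioned above. A lookup by exact key touches one bucket, but `range_search` has to look through all of them.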

B+ Tree File Organization:

It is an advanced form of the indexed sequential access method. It uses a tree-like structure to store the records in files. A B+ tree is similar to a BST (binary search tree), but a node can have more than two children.

Pros:

  • Searching is very efficient because all the records are stored only in the leaf nodes, which form a sorted, sequentially linked list.
  • Traversing the tree is easier and faster because the tree is balanced, so every leaf is the same number of levels from the root.

Cons:

  • This method is inefficient for static tables.
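A full B+ tree with node splitting is too long for a short example, so the sketch below is deliberately simplified: a read-only, two-level structure built once from sorted data, with one index level above a linked list of leaves. The names `Leaf` and `TwoLevelBPlus` and the leaf capacity of 3 are our own assumptions. It still shows the two ideas from the pros above: records live only in the leaves, and the leaves are linked in sorted order.

```python
import bisect


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys      # sorted keys stored in this leaf
        self.values = values  # records, parallel to keys
        self.next = None      # link to the next leaf in key order


class TwoLevelBPlus:
    """Simplified sketch: one index level over linked leaves, no insert or split."""

    def __init__(self, sorted_items, leaf_capacity=3):
        self.leaves = []
        for start in range(0, len(sorted_items), leaf_capacity):
            chunk = sorted_items[start:start + leaf_capacity]
            leaf = Leaf([k for k, _ in chunk], [v for _, v in chunk])
            if self.leaves:
                self.leaves[-1].next = leaf
            self.leaves.append(leaf)
        if not self.leaves:
            self.leaves.append(Leaf([], []))
        # Index node: the first key of every leaf except the first one.
        self.separators = [leaf.keys[0] for leaf in self.leaves[1:]]

    def _leaf_for(self, key):
        return self.leaves[bisect.bisect_right(self.separators, key)]

    def find(self, key):
        leaf = self._leaf_for(key)
        position = bisect.bisect_left(leaf.keys, key)
        if position < len(leaf.keys) and leaf.keys[position] == key:
            return leaf.values[position]
        return None

    def range_search(self, low, high):
        result = []
        leaf = self._leaf_for(low)
        while leaf is not None:
            for key, value in zip(leaf.keys, leaf.values):
                if key > high:
                    return result
                if key >= low:
                    result.append((key, value))
            leaf = leaf.next  # follow the leaf-level linked list
        return result


tree = TwoLevelBPlus([(k, f"r{k}") for k in (5, 10, 15, 20, 25, 30, 35)])
```

A point lookup goes through the index to a single leaf, and a range search finds the first relevant leaf and then just follows the links, without going back up to the index.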

Clustered File Organization:

In clustered file organization, related records from one or more relations are stored in the same disk block. This means that records are placed according to a shared cluster key rather than each relation's own search key or primary key. Clustered file organization is not recommended for large databases.
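To picture this, the sketch below groups rows from two made-up relations, departments and employees, by a shared department id, so that each group stands in for one disk block. The relation contents and the function name `build_clusters` are purely illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical relations: (dept_id, name) and (emp_id, name, dept_id).
departments = [(1, "Sales"), (2, "Research")]
employees = [(101, "Asha", 1), (102, "Ben", 2), (103, "Chen", 1)]


def build_clusters(departments, employees):
    """Group rows of both relations by dept_id; each group models one block."""
    clusters = defaultdict(lambda: {"department": None, "employees": []})
    for dept_id, name in departments:
        clusters[dept_id]["department"] = (dept_id, name)
    for employee in employees:
        clusters[employee[2]]["employees"].append(employee)
    return dict(clusters)


clusters = build_clusters(departments, employees)
```

A query that joins a department with its employees can then read a single cluster, for example `clusters[1]`, instead of searching two separate files.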

Frequently Asked Questions

  1. What are the pros and cons of heap file organization?
    Pros of heap file organization are:
    1. For small databases only, fetching and retrieving records is faster than with sequential file organization.
    2. It is well suited to loading a massive amount of data into the database at one time.

    Cons of heap file organization are:
    1. Unused blocks of memory are one of the major issues.
    2. It is not efficient for larger databases.
     
  2. What are file operations?
    File operations fall into two categories:
    1. Update operations: these change the data values through insertion, update, or deletion.
    2. Retrieval operations: these do not alter the data. They only retrieve it after filtering.
     
  3. What are the advantages and disadvantages of hash file organization?
    The following are the advantages of hash file organization:
    1. Records do not need to be sorted after every transaction, which saves the effort of sorting and makes the method more efficient.
    2. The hash function gives the address of the block directly, which makes it significantly faster to access or search for a record.
    3. Since accessing a record is quick, deleting and updating it are quick too.
    4. This method is suitable for online transaction systems such as online banking and ticket booking systems.

    Disadvantages are:
    1. Since records are placed in whichever blocks the hash function selects, they end up scattered across the storage. Hence the space is not used efficiently.
    2. This method is not suitable for range searches. Because hashing places records without regard to key order, the records in a given range are not stored together, so a range search has to examine every bucket and is inefficient.

Key Takeaways

In this blog, we started by introducing files. Then we learned about file organization and its objectives. We also looked at several types of file organization: sequential, heap, hash, B+ tree, and clustered.
