Introduction Normalization in Databases
Let us assume that you have an online store and you are storing the names of your customers, the product they bought. You also have their eMail IDs for future email marketing. Let’s consider that a person named John buys a Scissor from your online store and you have his details stored in a dataset. If he makes another purchase, let us say a knife, his data would be stored twice. So when you use that particular dataset for your future eMail Marketing campaign, there will be two same eMails to John.
Would you want your loyal customers to get bombarded with emails? If someone buys 100 items from your store, are they supposed to receive 100 emails from you? Definitely not. This is why the term Normalisation came into existence in 1970 when Edgar F.Codd, a computer scientist proposed this as a part of his model.
What is Normalization in Databases?
Normalization in a Database is the process by which a big dataset is broken down further into smaller datasets in order to mitigate or eliminate redundancy and inconsistent dependence. This is done by making tables and setting up relationships amongst them according to guidelines. Carrying out normalization in a dataset offers data protection as well.
Types of Normalization in DBMS
There are fundamentally four types of Normal forms. They are 1nF, 2nF, 3nF, BCNF. Below are step by step normalization examples so that you have a better vision of Normalization in DBMS.
- 1nF Normalization: This type of normalization is used to prevent the problem of storing multiple values in a single cell/tuple. If a table contains multiple values in a single cell then it violates the first normal form(1nF). The 1nF Normalization removes all the repeated values in the table. It allows us to create separate tables for each of the related datasets. And lets us identify every related dataset with a primary key. A ‘primary key’ is like a unique ID of an employee working in a conglomerate. Each unique ID will be a distinct value and there will be no repetition of this value.
- 2nF Normalization: This type of Normalization has all the characteristics of the 1nF normalization and there should be no partial dependency in the table. This means that all the non-primary key attributes in the table should be completely dependent on the primary key in functionality. Let us take a company’s employee dataset. There are three columns, Employee ID, department ID and location of the office. The location of the office is not dependent on the Department ID but the Employee ID. This is a 1nF Normalization. To make this into a 2nF form, we must split the table into two. One with Employee ID and Department ID, and the other one with Department ID and location of the office. Now both Employee ID and Department ID become primary keys in their respective tables with Office Location being the non-primary attribute. This Office location column is completely dependent on functionality with both Employee ID and Department ID, making the table into a 2nF normal form. This is called 2nF Normalization.
- 3nF Normalization: This type of Normalization will have all the characteristics of the 2nF normalization and transitive dependency is prohibited for all non-prime attributes.
Let us take an example of a table containing Student ID, Student name, Subject ID and Subject Name. So the name of the subject is dependent upon the Student ID and Subject ID as well. A more simplified version of the previous statement is that the Subject is dependent on the Student ID via the Subject ID. This is called a transitive dependency. To remove this transitive dependency, we can break or divide the table into two. One containing Student ID, Student name and Subject ID, and the other one containing Subject ID and Subject name. This table will then be 3nF normalized.
- BCNF: BCNF stands for Boyce-Codd Normal Form. This type of normalization will have all the characteristics of the 3nF Normalization. To better understand BCNF, let us look at normalization in the database with example tables.
|Student ID||Subject||Professor Name|
We see that this table satisfies the 3nF form. Now we see that one student can take up more than one subject. There can also be multiple teachers teaching one subject and for each subject, a teacher is assigned to a student.
This table above does not satisfy BCNF. This is because the Student ID and subject form the primary key. This means that the subject column is the prime attribute. But the Professor depends on the subject. Subject is a prime attribute while Professor Name is not, so this is not allowed by BCNF. To realise this in BCNF we will divide this table into two. We will add a column called ‘Professor ID’ to each of the two tables. Now one table will have a Student ID and Professor ID whilst the other will have Professor ID, Subject and Professor Name. Now this will satisfy the BCNF.
To get in-depth learning of Normalization and other DBMS fundamentals, visit
https://www.codingninjas.com/courses/dbms-course for more information. You can also check out a good normalization in DBMS pdf such as this or the database normalization cheat sheet for step by step normalization examples.
Frequently Asked Questions
The four types of Normalization in DBMS are 1nF, 2nF, 3nF, BCNF.
Normalization is used in databases to reduce redundancy and improve overall data integrity.
These are normalization forms. 1nF is the first normal form. 2nF is the second normal form. 3nF is the third normal form.
The first step to realise 1nF is by removing recurring values, the second step is realising 2nF by removing partial dependency. The third step is carried out by realising 3nF. This is done by removing transitive dependency.
Yes, as the second normal form is more powerful than the first normal form.
A normalization example in SQL would be ensuring that the values are properly organised in columns and rows in order to maintain dependencies and to facilitate data integrity.
Yes, there are other alternatives such as denormalized schemas and flat-star schemas. However, normalization is the best course of action for most databases.
Transactional systems should always follow the third normal form or 3NF.
Yes, over-normalization such as segregating data with additional ids that are not required can make the database more complex and harder to maintain.
Normalization in DBMS is crucial for maintaining an organised database that can be used effectively for business requirements and daily operations. Without normalization, companies can misplace data, suffer from duplication errors and face massive consequences due to using unstructured data in software suites for businesses. This is why learning the different forms of normalization is important and this is why it is of utmost importance to use the most suitable normal form for the purpose of normalizing data.