Marie Kondo, a well-known author, popularized the idea that getting organized is the most important thing you can do for yourself daily. It’s more crucial to manage your company or catalog data than it is to find that pair of socks in your drawer every morning.
But are there any specific guidelines to follow to accomplish this?
The function of a database is self-evident. It’s a mechanism for organising product and corporate data. Database management systems(DBMS) are administration systems that manage databases containing data by accepting queries from users using languages like standard query language (SQL).
Normalization in SQL refers to the process of structuring a database’s columns(attributes) and tables(relations) so that database integrity constraints correctly enforce their functional dependencies. It’s done by forming a new database design or enhancing an existing database design that involves applying formal principles.
How to Learn SQL in 2021? Step-By-Step Guide Prepared For You.
What is Database Normalization?
Database normalization is reorganising a relational database into a normal form to eliminate data redundancy and improve data integrity.
Database normalization in SQL is a procedure that is followed for each database that you create. Normal Forms is the process of taking a database architecture and applying a set of formal criteria and rules to it.
The process of database normalization in SQL has the following types:
- First Normal Form (1 NF)
- Second Normal Form (2 NF)
- Third Normal Form (3 NF)
- Boyce Codd Normal Form or Fourth Normal Form ( BCNF or 4 NF)
- Fifth Normal Form (5 NF)
- Sixth Normal Form (6 NF)
Note: We have not discussed the fifth and sixth normal form in this article.
Why is Normalization Required?
Database normalization in sql is to streamline data by removing unnecessary information thus reducing data redundancy. Data redundancy refers to several copies of the same information in the same database, dispersed across several places.
The following are some of the disadvantages of data redundancy:
- Redundant data expands the size of a database and consumes an excessive amount of disc space.
- It creates data inconsistency.
- Anomalies in the insert, update and delete become more common. For example, an update anomaly means that all copies of the same record, duplicated across the database, will need to be updated to maintain the record consistent.
Types of Normalization
First Normal Form (1NF)
A table must satisfy the following conditions to be in the first normal form:
- The order of rows and columns does not matter.
- It cannot have multivalued attributes.
- The name of a row and column is always unique.
- Attributes should have the same domain.
Let’s take a look at an example.
The table given below shows that subjects have more than one value, so it is not in the first normal form.
So, we have decomposed it into the 1nf using normalisation by creating a separate tuple for those attributes in the table given below.
Second Normal Form (2NF)
A primary key is a unique combination of attributes (columns) that uniquely identify a tuple (row) in a relation (table). If all of an entity’s attributes are dependent on the entire primary key, it is said to be in a second normal form.
- The table must be in 1NF.
- There should be no partial dependencies in the table. When a non-prime attribute is functionally dependent on a portion of a candidate key, this is known as partial dependency.
- Hence, all non-prime attributes should be fully functionally dependent on the primary key.
Let’s take an example.
Since all the attributes consist of a single value, the table given below is in first normal form(1NF). However, it is still not in 2NF as here primary keys are Customer ID and Store ID combined, and Location is the non-prime attribute which is only dependent on Store ID. Hence it shows partial dependency.
|CUSTOMER ID||STORE ID||LOCATION|
Now, to achieve 2nf, we always have to divide the table to remove partial dependencies in this way:
Both the tables are now in 2nf, having no partial dependencies as both the tables have their separate primary key.
Third Normal Form (3NF)
According to the third normal form, fields in a table that are not dependent on the key should be removed.
- The table(relations) should be in 2NF and the non-prime attribute should not rely on other non-prime attributes i.e. non-key attributes of a table must be functionally reliant on a candidate key.
- There should be no transitive functional dependency. Changing a non-key attribute, transitive functional dependency might cause any of the other non-key attributes to change.
Example: The table given below is in 1NF and 2NF. However, it is not in 3NF because the city which is a non-prime attribute is determined by another non-prime attribute state.
Functional dependencies given are:
Pincode -> State
So, here occurs transitive dependency as the City transitively depends on Pincode.
Now, to achieve 3nf we always have to divide the table.
Both the tables are now in 3nf having no transitive dependencies.
Boyce Codd Normal Form ( BCNF)
Even when a database is in the 3rd Normal Form, still there would be some anomalies if it has more than one Candidate Key. To resolve this issue we have BCNF.
- BCNF is a modification of the third normal form in which the limitation of a non-key attribute is removed.
- A table is in BCNF, if every functional dependency X → Y, X is the table’s super key.
- It has two or more candidate keys in the table. At least two of the possible candidate keys include multiple attributes.
- The table should be in 3NF for BCNF, and LHS is super essential for every functional dependency.
Now, in the table given below Emp_ID and Department together work as a primary key. So, using these we will be able to find other tuples.
Functional dependencies given are:
Emp_ID, Department -> Dept_Location
So, here Dept_Location is a non_prime attribute determining Emp_ID.
So, the table is not in BCNF. To convert it into BCNF, we have decomposed the table into two separate tables given below.
Although a fifth and sixth form exists, most databases do not aspire to use those levels because they take extra work and don’t truly impact the database functionality and improve performance.
You can also take a deep dive into SQL and data management with our advanced coding course on Data Science and Machine Learning.
Frequently Asked Questions
The process of removing redundancy from a relation or set of relations is known as normalisation. Using normalization in sql , it becomes easy for a user to insert, delete or update the values without harming the other data.
If a relation has an atomic value, it is in 1NF. If a relation is in 1NF and all non-key attributes fully function and depend on the primary key, it is in 2NF. If a relation is in 2NF and there is no transition dependency, it is in 3NF.
The process of structuring data in a database is known as normalisation. This includes generating tables and building relationships between them according to rules aimed to secure data while also allowing the database to be more flexible by removing redundancy and inconsistent dependencies. Normalization in sql has many types like 1NF, 2NF, 3NF and BCNF.
Normalization is the process of reorganising data into a related table. It also eliminates data redundancy and increases integrity which improves the performance of the query.
By looking at new data types in the database, normalisation can help to decrease redundancy and complexity. For example, it’s good to break up a vast database table into smaller tables and use relationships to connect them. It prevents duplicate data from being entered into a table and the presence of recurring groups.
A database must be normalised to reduce redundancy (duplicate data) and ensure that only related data is stored in each table. It also prevents any problems caused by database changes like insertions, removals, and updates.
We’ve gone through database normalisation basics in SQL and its types (1NF, 2NF, 3NF and BCNF). Knowing the fundamentals of database normalization in SQL will help you comprehend relational ideas, the requirement for numerous tables in database design structures, and how to query multiple tables in a relational environment.
By Mehak Goel