Domain Key Normal Form
The purpose of normalization is to reduce redundancy as much as possible and to avoid anomalies to a great extent. This is why we have various normal forms, viz. first normal form (1NF), second normal form (2NF), third normal form (3NF), Boyce-Codd normal form (BCNF), fourth normal form (4NF), etc.
In 1981, Ronald Fagin introduced a new kind of normal form called domain key normal form, or DKNF. Unlike the earlier normal forms, DKNF is not defined in terms of traditional dependencies but in terms of domains and keys. The core idea is to express every constraint on the relational schema through domain constraints and key constraints. A relation (table) in domain key normal form is free from insertion and deletion anomalies. We shall see this later in the article with the help of an example.
In domain key normal form, there should be no constraints other than key constraints and domain constraints. In simple terms, the relation should not carry some “hidden message” that is not clearly implied by the key constraints or the domain constraints. Before we discuss domain key normal form in depth, it is important to understand the terms key constraint and domain constraint.
Constraint in general means some kind of restriction. In a relational schema, a key constraint means that:
- The key value is not null
- The key uniquely identifies each tuple, i.e. no two tuples have the same key value
Consider the following students table (fig. 01) with the attributes roll_no, name and branch. Here roll_no is the key attribute (primary key) because every student in a college has a unique roll number. The key constraint dictates that the key value should not be null. In row 5, however, the roll_no column has been left blank. This is a violation of the key constraint; to avoid it, row 5 should have had a unique roll_no value.
Consider the students table again (fig. 02). Here two rows have the roll_no value 4. This is a clear violation of the key constraint, which requires the key to uniquely identify each tuple; in other words, key values must not be repeated. To avoid this violation, the roll_no value in the last row should be unique.
The above two examples should make the concept of the key constraint clear.
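The two checks from those examples (no null key, no repeated key) can be sketched in a few lines of Python. This is a minimal illustration, not how a database engine enforces keys: the figures are not reproduced here, so the rows below are hypothetical stand-ins where only the roll_no values mirror the situations in fig. 01 and fig. 02, and `key_violations` is a made-up helper name.

```python
from collections import Counter

# Hypothetical rows standing in for the tables in fig. 01 and fig. 02.
# Only the roll_no values matter here; names and branches are placeholders.
students_fig01 = [
    (1, "Student A", "CSE"),
    (2, "Student B", "ECE"),
    (3, "Student C", "ME"),
    (4, "Student D", "CSE"),
    (None, "Student E", "EE"),  # roll_no left blank, as in row 5 of fig. 01
]
students_fig02 = [
    (1, "Student A", "CSE"),
    (2, "Student B", "ECE"),
    (3, "Student C", "ME"),
    (4, "Student D", "CSE"),
    (4, "Student E", "EE"),  # roll_no 4 repeated, as in fig. 02
]


def key_violations(rows, key_index=0):
    """Return a list describing every key-constraint violation in rows."""
    problems = []
    keys = [row[key_index] for row in rows]
    # Rule 1: the key value must not be null.
    for position, key in enumerate(keys, start=1):
        if key is None:
            problems.append(f"row {position}: key is null")
    # Rule 2: no two tuples may share the same key value.
    counts = Counter(key for key in keys if key is not None)
    for key, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"key {key} appears {count} times")
    return problems


print(key_violations(students_fig01))  # ['row 5: key is null']
print(key_violations(students_fig02))  # ['key 4 appears 2 times']
```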
The domain constraint can be thought of as a rule that specifies which values are legal for an attribute of a relation. To understand this better, consider a voter table with attributes such as name, age and voter_id. The domain constraint here can be that age must be an integer greater than or equal to 18. This specifies the legal values allowed in the age field. If someone tries to insert -3 or 14 as the age, it violates the domain constraint and an error is thrown.
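In SQL, a domain constraint like this is usually declared as a `CHECK` clause. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table layout and the `try_insert` helper are assumptions made for illustration. The extra `typeof(age) = 'integer'` test is there because SQLite's typing is loose and would otherwise store a text value such as `'abc'` in an INTEGER column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause encodes the domain of age: an integer that is at least 18.
conn.execute(
    """
    CREATE TABLE voter (
        voter_id TEXT NOT NULL PRIMARY KEY,
        name     TEXT NOT NULL,
        age      INTEGER NOT NULL CHECK (typeof(age) = 'integer' AND age >= 18)
    )
    """
)


def try_insert(voter_id, name, age):
    """Insert a voter; return True if accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO voter VALUES (?, ?, ?)", (voter_id, name, age))
        return True
    except sqlite3.IntegrityError as exc:
        print(f"rejected age={age!r}: {exc}")
        return False


try_insert("V001", "Voter One", 25)    # accepted
try_insert("V002", "Voter Two", 14)    # rejected by the CHECK constraint
try_insert("V003", "Voter Three", -3)  # rejected by the CHECK constraint
```

Putting the rule in the schema, rather than in application code, means every program that writes to the table is held to the same domain.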
Analysis of Domain Key Normal Form
In domain key normal form, there are no “general constraints”: every constraint on the relation schema can be represented using domain constraints and key constraints. Intuitively, a relation schema is in domain key normal form if every constraint can be inferred simply by knowing the set of attributes and their underlying domains, along with the set of keys. To convert a table into domain key normal form, we have to ensure that there are no insertion and deletion anomalies.
Unfortunately, there is no set procedure for converting a table to domain key normal form, and in practice designs rarely go as far as DKNF. However, it is of great theoretical importance and remains a subject of research.
We will see how to detect whether the table (or relation) is in domain key normal form or not. If not, then we will see how to convert it to domain key normal form.
A table is in domain key normal form if:
- Every constraint on the table follows from its domain constraints and key constraints
- It has no other general constraint
Consider the following table
In the table above, the attribute roll_no is the primary key. From the table, we infer that roll_no determines marks and marks determine remarks. Students with marks above 90 get the remark "excellent", students with marks above 70 and below 90 get "very good", students with marks above 40 and below 70 get "average", students with marks above 30 and up to 40 get "satisfactory", and students with marks below 30 get "failed". To keep things simple, we are not indicating the upper and lower limits of the marks.
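Written as code, the remark rule is simply a function of marks. The article leaves the exact boundaries open, so this sketch makes assumptions at the edges: 90 counts as "very good", 70 as "average", 40 as "satisfactory" (as stated above) and 30 or below as "failed" (matching the discussion of failed students). The function name `remark_for` is hypothetical.

```python
def remark_for(marks: int) -> str:
    """Map marks to a remark. Boundary handling at 90, 70 and 30 is an assumption."""
    if marks > 90:
        return "excellent"
    if marks > 70:
        return "very good"
    if marks > 40:
        return "average"
    if marks > 30:
        return "satisfactory"  # above 30 and up to 40
    return "failed"            # 30 or less


print(remark_for(50))  # average
print(remark_for(40))  # satisfactory
```

Nothing about the key roll_no or the domains of the columns expresses this mapping, which is exactly why it becomes a problem for DKNF.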
Let's check whether the key constraint is violated here. All the roll_no values are unique, so roll_no uniquely identifies each tuple.
Similarly, we can argue that the domain constraints are not violated either, taking the domains of roll_no and marks to be positive integers and that of remarks to be character strings.
However, can we claim that there are no general constraints here? Can we guarantee that there are no insertion and deletion anomalies here?
Suppose none of the students failed. Then no row has “failed” in its remarks column, so the table cannot tell us that students with 30 marks or less are awarded “failed”. If we try to record this rule by inserting only a marks value of 30 or less together with “failed” as the remark, without a roll_no (since no student failed), we violate the key constraint because the roll_no field is left null.
In this manner, we find a general constraint: marks determine remarks (the functional dependency marks → remarks). This constraint is implied neither by the key nor by the domains, and it causes an insertion anomaly.
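One way to see that marks → remarks is a general constraint, and not a key constraint, is to test it directly. The sketch below checks a functional dependency over rows stored as dictionaries. Apart from roll_no 103 (50 marks, "average"), the rows are hypothetical, and `fd_holds` and `is_key` are illustrative helper names.

```python
# Hypothetical rows: only roll_no 103 (50 marks, "average") comes from the article.
students = [
    {"roll_no": 101, "marks": 95, "remarks": "excellent"},
    {"roll_no": 102, "marks": 75, "remarks": "very good"},
    {"roll_no": 103, "marks": 50, "remarks": "average"},
    {"roll_no": 104, "marks": 35, "remarks": "satisfactory"},
    {"roll_no": 105, "marks": 75, "remarks": "very good"},
]


def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        # setdefault stores the first rhs value for this lhs value and
        # returns whichever value was stored first.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def is_key(rows, attr):
    """Return True if attr is non-null and unique across rows."""
    values = [row[attr] for row in rows]
    return None not in values and len(values) == len(set(values))


print(is_key(students, "roll_no"), fd_holds(students, "roll_no", "marks"))  # True True
print(is_key(students, "marks"), fd_holds(students, "marks", "remarks"))    # False True

# A row that respects the key and the domains but breaks marks -> remarks:
bad_row = {"roll_no": 106, "marks": 50, "remarks": "excellent"}
print(fd_holds(students + [bad_row], "marks", "remarks"))  # False
```

The dependency holds even though marks is not a key (75 appears twice), so enforcing the key alone would happily accept `bad_row`.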
Similarly, if we delete the tuple with roll_no = 103, we lose data. Why? Deleting roll_no 103 also deletes the values 50 and “average” from the marks and remarks columns, so (assuming no other student scored 50) we can no longer infer that a student with 50 marks gets the remark “average”. This is a deletion anomaly.
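Both anomalies can be reproduced on a small in-memory version of the single table. This is a sketch under assumptions: the rows other than roll_no 103 are hypothetical, `insert` enforces only the key constraint on roll_no, and `known_remark` is an illustrative helper that asks what remark the table records for a given marks value.

```python
# Hypothetical single table: only roll_no 103 (50 marks, "average") is from the article.
students = [
    {"roll_no": 101, "marks": 95, "remarks": "excellent"},
    {"roll_no": 102, "marks": 75, "remarks": "very good"},
    {"roll_no": 103, "marks": 50, "remarks": "average"},
    {"roll_no": 104, "marks": 35, "remarks": "satisfactory"},
]


def insert(table, row):
    """Append row, enforcing the key constraint on roll_no (not null, unique)."""
    key = row["roll_no"]
    if key is None:
        raise ValueError("key constraint violated: roll_no is null")
    if any(existing["roll_no"] == key for existing in table):
        raise ValueError(f"key constraint violated: roll_no {key} already exists")
    table.append(row)


def known_remark(table, marks):
    """Return the remark recorded for a marks value, or None if the table has none."""
    for row in table:
        if row["marks"] == marks:
            return row["remarks"]
    return None


# Insertion anomaly: the fact "25 marks -> failed" cannot be stored without a student.
try:
    insert(students, {"roll_no": None, "marks": 25, "remarks": "failed"})
except ValueError as exc:
    print(exc)  # key constraint violated: roll_no is null

# Deletion anomaly: removing student 103 also removes the fact "50 marks -> average".
print(known_remark(students, 50))  # average
students = [row for row in students if row["roll_no"] != 103]
print(known_remark(students, 50))  # None
```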
The best way to keep the domain and key constraints intact while ensuring there is no other general constraint is to split the table into two tables.
The first table contains the attributes (roll_no, marks), with roll_no as the primary key.
The second table contains the attributes (marks, remarks), with marks as the primary key. To keep things easy to understand, we have denoted marks through a single column, although a remark really covers a range of marks; for example, "excellent" can be anything greater than 90. When implementing this in real life, the necessary changes can be made.
Both tables are now free from insertion and deletion anomalies. We can add any number of (roll_no, marks) pairs to the first table and any number of (marks, remarks) pairs to the second table. Similarly, deleting (roll_no, marks) pairs from the first table removes only those students' marks; the (marks, remarks) facts in the second table are not lost.
By doing so, we guarantee that there are no insertion and deletion anomalies, and each table keeps its key and domain constraints intact without any other general constraint. This is how we claim the tables to be in domain key normal form.
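The split can be checked with plain Python: project the original rows onto the two attribute sets, join them back on marks, and confirm nothing was lost. The data is hypothetical apart from roll_no 103, and `natural_join` is an illustrative helper; it relies on marks being the key of the second table so that each marks value maps to exactly one remark.

```python
# Hypothetical data; only roll_no 103 (50 marks, "average") comes from the article.
students = [
    (101, 95, "excellent"),
    (102, 75, "very good"),
    (103, 50, "average"),
    (104, 35, "satisfactory"),
]

# Table 1: (roll_no, marks), key roll_no.  Table 2: (marks, remarks), key marks.
# Building each projection from a set removes duplicate tuples.
student_marks = sorted({(roll, marks) for roll, marks, _ in students})
marks_remarks = sorted({(marks, remark) for _, marks, remark in students})


def natural_join(student_marks, marks_remarks):
    """Rebuild (roll_no, marks, remarks) rows by joining on the shared attribute marks."""
    remark_of = dict(marks_remarks)  # well defined because marks is the key of table 2
    return sorted(
        (roll, marks, remark_of[marks])
        for roll, marks in student_marks
        if marks in remark_of
    )


# The split is lossless: joining the two tables gives back the original rows.
print(natural_join(student_marks, marks_remarks) == sorted(students))  # True

# Insertion: the rule "25 marks -> failed" can be stored without any student.
marks_remarks.append((25, "failed"))

# Deletion: removing student 103 no longer removes the fact "50 -> average".
student_marks = [(roll, marks) for roll, marks in student_marks if roll != 103]
print((50, "average") in marks_remarks)  # True
```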
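As a rough illustration, the two tables can be declared so that every rule is either a key (`PRIMARY KEY`) or a domain (`CHECK`). The sketch uses Python's built-in `sqlite3` with an in-memory database; the table names, the sample rows (except 103 with 50 marks), the non-negative marks domain and the `try_exec` helper are assumptions. No foreign key is declared, which keeps the sketch to key and domain constraints only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student_marks (
        roll_no INTEGER NOT NULL PRIMARY KEY CHECK (roll_no > 0),
        marks   INTEGER NOT NULL CHECK (typeof(marks) = 'integer' AND marks >= 0)
    );
    CREATE TABLE marks_remarks (
        marks   INTEGER NOT NULL PRIMARY KEY CHECK (marks >= 0),
        remarks TEXT NOT NULL CHECK (
            remarks IN ('excellent', 'very good', 'average', 'satisfactory', 'failed'))
    );
    INSERT INTO student_marks VALUES (101, 95), (102, 75), (103, 50), (104, 35);
    INSERT INTO marks_remarks VALUES
        (95, 'excellent'), (75, 'very good'), (50, 'average'), (35, 'satisfactory');
    """
)


def try_exec(sql, params=()):
    """Run one statement; return True if accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_exec("INSERT INTO marks_remarks VALUES (?, ?)", (25, "failed")))     # True
print(try_exec("INSERT INTO marks_remarks VALUES (?, ?)", (50, "excellent")))  # False: key 50 exists
print(try_exec("INSERT INTO marks_remarks VALUES (?, ?)", (60, "great")))      # False: not in domain
print(try_exec("DELETE FROM student_marks WHERE roll_no = ?", (103,)))         # True
print(conn.execute("SELECT remarks FROM marks_remarks WHERE marks = 50").fetchone())  # ('average',)
```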
Frequently asked questions
- What is the key idea behind domain key normal form?
The key idea behind domain key normal form is to represent the relation schema in terms of key and domain constraint.
- How can we say that a database table is in domain key normal form?
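There should be no insertion and deletion anomalies; equivalently, every constraint on the table must follow from its domain constraints and key constraints.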
- Why is domain key normal form hard to implement?
A table in DKNF must have no insertion or deletion anomalies at all, and completely removing anomalies is very difficult in real life, especially since there is no set procedure for reaching DKNF. In most cases, we restrict ourselves to 3NF and BCNF.
- What is meant by general constraints?
Any constraint other than domain constraints or key constraints is known as general constraints.
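- Is third normal form (3NF) a special case of domain key normal form?
No, the relationship runs the other way. Every relation in DKNF is also in 3NF, so DKNF relations are a special case of 3NF relations. 3NF only rules out transitive dependencies of non-key attributes on the key, whereas DKNF rules out every constraint that is not implied by domains and keys.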
- In domain key normal form, there are no other constraints except for domain constraint and key constraint.
- Domain key normal form ensures there are no insertion and deletion anomalies.
- No general constraints are allowed; every constraint must be a consequence of the domain and key constraints.
- If a relation is in DKNF, it is also in 5NF, 4NF, BCNF, 3NF, 2NF and 1NF.
- It is sometimes considered to be the highest form of normalization.
- There is no set of rules for converting an arbitrary table to DKNF.