Sometimes, tables in databases contain duplicate records. To get rid of duplicate records in databases, we first need to identify them and remove them from the given table. It will also be great to prevent inserting duplicate records into our table. We will use different methods to deal with duplicates.
To find duplicates in a table, we need to use two following clauses.
- The GROUP BY clause is used to group all rows of the target column. The target column is the column that we want to check for duplicate values.
- The COUNT function in the HAVING clause checks if any groups have more than one entry. If they have, it means those have duplicate values.
SELECT column1, coulmn2, … COUNT(*)
GROUP BY column1,coulmn2, ….
HAVING COUNT(*) > 1;
A few duplicates are present in the OrderID column in the given example. In an ideal situation, each row should have a unique value for OrderID since each order is assigned its value. But this thing is not implemented here. We can use the given query to find duplicates.
SELECT OrderID, COUNT(OrderID)
GROUP BY OrderID
The above SQL code prints all entries whose duplicates are present in the table.
Number of Records: 2
The UNIQUE constraint is used so that a column does not contain duplicates.
Both the PRIMARY KEYand UNIQUE constraints guarantee the uniqueness of a column or set of columns.
A PRIMARY KEY constraint intuitively has a UNIQUE constraint.
However, we can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per table.
CREATE TABLE table_name(
col1 datatype [NULL | NOT NULL],
col2 datatype [NULL | NOT NULL],
col2 datatype [NULL | NOT NULL],
CONSTRAINT constraint_name UNIQUE(uc_col1, uc_col2,..uc_col_n)
- Table_name is name of the table that we are going to create.
- col1, col2 are the columns we wish to create in the table.
- Constraint_name is name of the unique constraint.
- uc_col1, uc_col2, ... uc_col_n are the columns that make up the unique constraint.
In the example, we have created a unique constraint on the employees table named employees_unique. It has one field, which is the employee_number.
CREATE TABLE employees
( employee_id INT PRIMARY KEY,
employee_number INT NOT NULL,
last_name VARCHAR(50) NOT NULL,
CONSTRAINT employees_unique UNIQUE (employee_number)
The distinct clause is used to eliminate duplicate records present inside a table.
The DISTINCT keyword will fetch unique records.
SELECT DISTINCT expressions
In this example, we used a table called suppliers having following data:
We find all the unique states in the suppliers table using the SQL command below.
SELECT DISTINCT state
ORDER BY state ;
This SQL command will return all unique state values from the suppliers table and eliminate duplicates. As we can see, the state of California only appears once in the result set instead of four times.
Frequently Asked Questions
Q1. Why is the DISTINCT keyword used?
The DISTINCT keyword is used to fetch distinct records from a database table.
It is used to remove duplicates from the output of the SELECT statement.
Q2.What is the Difference Between UNIQUE and DISTINCT in SQL?
Distinct keyword in SQL is used with SELECT Statement to remove duplicates from the result of SELECT Statement. But The UNIQUE keyword in SQL is used to mark one of its database constraints. The UNIQUE constraint on a column can prevent duplicate values from being stored in this column or set.
Q3. Can we use the DISTINCT keyword with all aggregate functions in SQL?
With SUM(), AVG(), and COUNT(expression), DISTINCT eliminates duplicate values before the sum, average, or count is calculated. It is meaningless in the case of MAX() and MIN().
Duplicates in tables cause redundancy in databases. It requires extra memory to store these duplicate records. We first identify and then remove these duplicates. We used GROUP BY and COUNT clauses to identify duplicates and the DISTINCT clause to remove duplicates. It is also better to prevent these duplicate records from inserting into the database. We use the UNIQUE clause to prevent duplicates.
Visit here to learn more about different topics related to database and management systems. Ninjas, don’t stop here. Check out the Top 100 SQL Problems to master frequently asked questions in big companies and land your dream job. Also, try CodeStudio to practice a wide range of DSA questions asked in many interviews.