Selecting duplicates in a database table can be accomplished in various ways and all these approaches involve various trade-offs with respect to efficiency and speed but before we start examining some of these approaches we need a table with duplicates so here is one. The Primary Key in this table is the ID and the Emp_Id is meant to be Unique and will be used for filtering duplicates.
Listing the number of times each record occurs with duplicate rows listed once.
(1) select *, count(*) from employee group by Emp_Id;
Listing the number of times each record occurs with a listing of the first row in the group.
(2) select *, count(*) from employee group by Emp_Id having count(*) > 1;
To list all duplicate rows, the following query would work;
(3) select a.* from
employee as a
inner join
(select Emp_Id
from employee
group by Emp_Id
having count(*) = 2) as b
on a.Emp_Id = b.Emp_Id;
Now our table has a Tele_No column and some of these numbers are incorrect
Selecting the duplicates using a self-join.
In this case, we need to select all records that have the same Emp_Id but a different Tele_No. We can use a simple self-join with the Tele_No and the column as the distinguishing key to separate our records.
Option 1
(4) select e1.* from employee e1, employee e2 where e1.Emp_Id = e2.Emp_Id and e1.Tele_No != e2.Tele_No order by e1.Id asc;
The exact same result could also be obtained with a sub-query as follows:
Option 2
(5) select e1.* from
employee e1 where e1.Emp_Id IN (
select e2.Emp_Id from employee e2
where e1.Emp_Id = e2.Emp_Id
and e1.Tele_No != e2.Tele_No
)
order by e1.Id asc;
In Option 1 & 2, we have the condition e1.Tele_No != e2.Tele_No that prevents us from getting a cartesian product of the records. If such a condition is not available then just use the Primary Key column as follows:
(6) select e1.* from
employee e1 where e1.Emp_Id IN (
select e2.Emp_Id from employee e2
where e1.Emp_Id = e2.Emp_Id
and e1.ID <> e2.ID
)
order by e1.Id asc;
If you are wondering whether a join or a sub-query would work best, it would ofcourse depend upon your query optimizer and the size of your tables but generally Joins are the way to go.
So, we have 6 records where the Emp_Id are the same but the Tele_Nos are different. To view these records in a tabular format we could modify (6) as follows.
(7) select e1.ID, e1.Tele_No,e2.ID, e2.Tele_no
from
employee e1, employee e2
where e1.Emp_Id = e2.Emp_Id
and e1.Tele_No != e2.Tele_No
and e1.Id < e2.id
order by e1.Id asc;
No comments:
Post a Comment