Remove Full Duplicates by Using row_number() Function and Common Table Expression

This blog article is inspired by another very informative one which explains really well the meaning of the new row_number() function introduced since SQL2005. Since the original article did not give a full answer to how to delete multiple fully duplicated rows in a table, I am putting my code here for the test.

The good thing about row_number() function is that it can be based on partition of columns. New row number count starts for each partition of full duplicates and those rows with row number bigger than 1 can be easily spotted and removed. Here is the example that explains everything I am talking about here.

First of all, create a table in a test db

[sourcecode language=”sql”]
USE [test_db]
GO
CREATE TABLE [dbo].[Employee](
[EMPID] [int] NULL,
[FNAME] [varchar](50) NULL,
[LNAME] [varchar](50) NULL
) ON [PRIMARY]
[/sourcecode]

Then let’s insert many duplicated rows:

[sourcecode language=”sql”]
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021110, ‘MICHAEL’, ‘POLAND’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021110, ‘MICHAEL’, ‘POLAND’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021115, ‘JIM’, ‘KENNEDY’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021115, ‘JIM’, ‘KENNEDY’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021115, ‘JIM’, ‘KENNEDY’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2121000, ‘JAMES’, ‘SMITH’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2121000, ‘JAMES’, ‘SMITH’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2011111, ‘ADAM’, ‘ACKERMAN’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, ‘MARTHA’, ‘LEDERER’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, ‘MARTHA’, ‘LEDERER’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, ‘MARTHA’, ‘LEDERER’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, ‘MARTHA’, ‘LEDERER’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (1021710, ‘MARIAH’, ‘MANDEZ’)
GO
[/sourcecode]

If you selete everything from this table now, the following should be there:

EMPID FNAME LNAME
1021710 MARIAH MANDEZ
2011111 ADAM ACHERMAN
2021110 MICHAEL POLAND
2021110 MICHAEL POLAND
2021115 JIM KENNEDY
2021115 JIM KENNEDY
2021115 JIM KENNEDY
2121000 JAMES SMITH
2121000 JAMES SMITH
3015670 MARTHA LEDERER
3015670 MARTHA LEDERER
3015670 MARTHA LEDERER
3015670 MARTHA LEDERER

And here comes the de-dup script with the help of common table expression:

[sourcecode language=”sql”]
WITH temp_employee AS
(SELECT ROW_NUMBER() OVER (PARTITION BY EMPID, FNAME, LNAME ORDER BY EMPID) AS rowid, * FROM EMPLOYEE)
DELETE temp_employee WHERE rowid > 1
[/sourcecode]

Hope this article is illustrative enough.


Posted

in

,

by

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *