This blog article is inspired by another very informative one which explains really well the meaning of the new row_number() function introduced since SQL2005. Since the original article did not give a full answer to how to delete multiple fully duplicated rows in a table, I am putting my code here for the test.
The good thing about row_number() function is that it can be based on partition of columns. New row number count starts for each partition of full duplicates and those rows with row number bigger than 1 can be easily spotted and removed. Here is the example that explains everything I am talking about here.
First of all, create a table in a test db
[sourcecode language=”sql”]
USE [test_db]
GO
CREATE TABLE [dbo].[Employee](
[EMPID] [int] NULL,
[FNAME] [varchar](50) NULL,
[LNAME] [varchar](50) NULL
) ON [PRIMARY]
[/sourcecode]
Then let’s insert many duplicated rows:
[sourcecode language=”sql”]
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021110, ‘MICHAEL’, ‘POLAND’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021110, ‘MICHAEL’, ‘POLAND’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021115, ‘JIM’, ‘KENNEDY’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021115, ‘JIM’, ‘KENNEDY’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2021115, ‘JIM’, ‘KENNEDY’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2121000, ‘JAMES’, ‘SMITH’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2121000, ‘JAMES’, ‘SMITH’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (2011111, ‘ADAM’, ‘ACKERMAN’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, ‘MARTHA’, ‘LEDERER’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, ‘MARTHA’, ‘LEDERER’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, ‘MARTHA’, ‘LEDERER’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (3015670, ‘MARTHA’, ‘LEDERER’)
GO
INSERT INTO EMPLOYEE (EMPID, FNAME, LNAME)
VALUES (1021710, ‘MARIAH’, ‘MANDEZ’)
GO
[/sourcecode]
If you selete everything from this table now, the following should be there:
EMPID | FNAME | LNAME |
1021710 | MARIAH | MANDEZ |
2011111 | ADAM | ACHERMAN |
2021110 | MICHAEL | POLAND |
2021110 | MICHAEL | POLAND |
2021115 | JIM | KENNEDY |
2021115 | JIM | KENNEDY |
2021115 | JIM | KENNEDY |
2121000 | JAMES | SMITH |
2121000 | JAMES | SMITH |
3015670 | MARTHA | LEDERER |
3015670 | MARTHA | LEDERER |
3015670 | MARTHA | LEDERER |
3015670 | MARTHA | LEDERER |
And here comes the de-dup script with the help of common table expression:
[sourcecode language=”sql”]
WITH temp_employee AS
(SELECT ROW_NUMBER() OVER (PARTITION BY EMPID, FNAME, LNAME ORDER BY EMPID) AS rowid, * FROM EMPLOYEE)
DELETE temp_employee WHERE rowid > 1
[/sourcecode]
Hope this article is illustrative enough.
Leave a Reply