Delete Duplicate Rows in Table without Primary Key

In the previous article, I showed a way how to delete duplicate rows when Primary Key exists in the table. Now I want to demonstrate how to do the same thing but in case the table does not contain a primary key.
I prepared an MS SQL statement for those who want quickly copy and paste sample and use it on their own database:

DELETE deleteSub FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName, LastName) RowId, FirstName, LastName
 FROM [dbo].[YarkulDuplicatePersons]) deleteSub
WHERE deleteSub.RowId > 1

And for those who want to understand in more detail, please study the following example.

Step 1 – Create a Test Table

Here is the SQL for our test table. Note that this table does not have a primary key. It is a bad practice to construct such kind of table but anyway let’s do it.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[YarkulDuplicatePersons]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[YarkulDuplicatePersons](
    [FirstName] [nvarchar](50) NOT NULL,
    [LastName] [nvarchar](50) NOT NULL,
    [DateOfBirth] [datetime2](7) NOT NULL
) ON [PRIMARY]
END
GO

 

Step 2 – Insert Test Data

INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Krista', 'Montes', '1992-06-18')
INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Krista', 'Montes', '1992-06-18')
INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Christian', 'Reyes', '1998-05-03')
INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Christian', 'Reyes', '1998-05-03')
INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Christian', 'Reyes', '1998-05-03')
INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Christian', 'Volan', '1993-04-09')
INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Marvin', 'Lynn', '2001-07-08')

You can see that ‘Krista Montes’ was inserted 2 times and ‘Christian Reyes’ 3 times. Also, I specially added ‘Christian Volan’ to highlight that we want to remove duplication by the combination of the ‘FirstName’ and ‘LastName’ columns.

check duplication in table

Step 3 – Prepare Data for Deletion Using ROW_NUMBER and PARTITION BY

 

SELECT ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName, LastName) RowId, 
        FirstName, 
        LastName
FROM [dbo].[YarkulDuplicatePersons]

view partitioned data for deletion

I think it is important to understand how ROW_NUMBER and PARTITION BY work with our data. The 'PARTITION BY FirstName, LastName' divided our data into blocks where the same FirstName and LastName. And ROW_NUMBER() numbered each data block.

Step 3 – Delete Duplicate Rows Except One

Now if you wrap this statement into a SUB query you can apply the ‘DELETE’ command with ‘WHERE RowId > 1’

DELETE deleteSub FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName, LastName) RowId, FirstName, LastName
 FROM [dbo].[YarkulDuplicatePersons]) deleteSub
WHERE deleteSub.RowId > 1

check table after duplication removed

My Video Demonstration:

Buy Me A Coffee

Leave a Comment