In the previous article, I showed a way how to delete duplicate rows when Primary Key exists in the table. Now I want to demonstrate how to do the same thing but in case the table does not contain a primary key.
I prepared an MS SQL statement for those who want quickly copy and paste sample and use it on their own database:
DELETE deleteSub FROM (SELECT ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName, LastName) RowId, FirstName, LastName FROM [dbo].[YarkulDuplicatePersons]) deleteSub WHERE deleteSub.RowId > 1
And for those who want to understand in more detail, please study the following example.
Step 1 – Create a Test Table
Here is the SQL for our test table. Note that this table does not have a primary key. It is a bad practice to construct such kind of table but anyway let’s do it.
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[YarkulDuplicatePersons]') AND type in (N'U')) BEGIN CREATE TABLE [dbo].[YarkulDuplicatePersons]( [FirstName] [nvarchar](50) NOT NULL, [LastName] [nvarchar](50) NOT NULL, [DateOfBirth] [datetime2](7) NOT NULL ) ON [PRIMARY] END GO
Step 2 – Insert Test Data
INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Krista', 'Montes', '1992-06-18') INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Krista', 'Montes', '1992-06-18') INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Christian', 'Reyes', '1998-05-03') INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Christian', 'Reyes', '1998-05-03') INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Christian', 'Reyes', '1998-05-03') INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Christian', 'Volan', '1993-04-09') INSERT INTO [dbo].[YarkulDuplicatePersons] ([FirstName], [LastName], [DateOfBirth]) VALUES ('Marvin', 'Lynn', '2001-07-08')
You can see that ‘Krista Montes’ was inserted 2 times and ‘Christian Reyes’ 3 times. Also, I specially added ‘Christian Volan’ to highlight that we want to remove duplication by the combination of the ‘FirstName’ and ‘LastName’ columns.
Step 3 – Prepare Data for Deletion Using ROW_NUMBER and PARTITION BY
SELECT ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName, LastName) RowId, FirstName, LastName FROM [dbo].[YarkulDuplicatePersons]
I think it is important to understand how ROW_NUMBER
and PARTITION BY
work with our data. The 'PARTITION BY FirstName, LastName'
divided our data into blocks where the same FirstName and LastName. And ROW_NUMBER()
numbered each data block.
Step 3 – Delete Duplicate Rows Except One
Now if you wrap this statement into a SUB query you can apply the ‘DELETE’ command with ‘WHERE RowId > 1’
DELETE deleteSub FROM (SELECT ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName, LastName) RowId, FirstName, LastName FROM [dbo].[YarkulDuplicatePersons]) deleteSub WHERE deleteSub.RowId > 1
My Video Demonstration: