If there are duplicate rows in MS SQL Server database tables, it may be necessary to remove duplicate records. Let’s look at an example where duplicates are all records that have duplicate First Name
, Last Name
, and Date of Birth
. We need to remove all duplicates and leave only the first original entry from the duplicates. It is important to note that the Persons
table contains the primary key. For cases where Primary Key is missed, you can read my article Delete Duplicate Rows in Table without Primary Key.
.
Step 1 – Define Duplicates
Using GROUP BY
and HAVING COUNT(1)> 1
we group by fields that define a duplicate record. This list shows the entries and the number of duplicates.
SELECT [FirstName] ,[LastName] ,[DateOfBirth] ,COUNT(1) DuplicateCount FROM [YarkulTestDb1].[dbo].[Person] GROUP BY FirstName, LastName, DateOfBirth HAVING COUNT(1) > 1
Step 2 – Define ID’s of first Unique Row
In order to remove duplicates, we need to understand which records should be left. To do this, select the IDs that will be ignored when deleting. The MIN
and GROUP BY
functions allows us to select the entries that were created the earliest. This approach only works if you have a primary key on the table.
SELECT MIN(Id) FROM [YarkulTestDb1].[dbo].[Person] GROUP BY FirstName, LastName, DateOfBirth
Step 3 Delete Duplicates
And now let’s delete rows which not in the statement of the previous SQL result.
DELETE FROM [YarkulTestDb1].[dbo].[Person] WHERE Id NOT IN( SELECT MIN(Id) FROM [YarkulTestDb1].[dbo].[Person] GROUP BY FirstName, LastName, DateOfBirth )
Step 3 – Check Results
select * FROM [YarkulTestDb1].[dbo].[Person]
Before deleting, I recommend making a backup of the data. In this case, the easiest way is to create a copy table.
Check my video: