How to Remove Duplicate Rows in MS SQL Server

If there are duplicate rows in MS SQL Server database tables, it may be necessary to remove duplicate records. Let’s look at an example where duplicates are all records that have duplicate First Name, Last Name, and Date of Birth. We need to remove all duplicates and leave only the first original entry from the duplicates. It is important to note that the Persons table contains the primary key. For cases where Primary Key is missed, you can read my article Delete Duplicate Rows in Table without Primary Key.
.

Step 1 – Define Duplicates

Using GROUP BY and HAVING COUNT(1)> 1 we group by fields that define a duplicate record. This list shows the entries and the number of duplicates.
select duplicates in Person Table

SELECT [FirstName]
      ,[LastName]
      ,[DateOfBirth]
      ,COUNT(1) DuplicateCount
  FROM [YarkulTestDb1].[dbo].[Person]
  GROUP BY FirstName, LastName, DateOfBirth
  HAVING COUNT(1) > 1

 

Step 2 – Define ID’s of first Unique Row

In order to remove duplicates, we need to understand which records should be left. To do this, select the IDs that will be ignored when deleting. The MIN and GROUP BY functions allows us to select the entries that were created the earliest. This approach only works if you have a primary key on the table.
MS SQL find first unique ID in row

SELECT MIN(Id) FROM [YarkulTestDb1].[dbo].[Person]
GROUP BY FirstName, LastName, DateOfBirth

 

Step 3 Delete Duplicates

And now let’s delete rows which not in the statement of the previous SQL result.
delete duplicates from table

DELETE FROM [YarkulTestDb1].[dbo].[Person]
WHERE Id NOT IN(
SELECT MIN(Id) FROM [YarkulTestDb1].[dbo].[Person]
GROUP BY FirstName, LastName, DateOfBirth
)

 

Step 3 – Check Results

select * FROM [YarkulTestDb1].[dbo].[Person]

 
check results after delete

Before deleting, I recommend making a backup of the data. In this case, the easiest way is to create a copy table.
Check my video:

Leave a Comment