Monday, December 5, 2011

Delete Duplicate data

Hey there everyone, its been awhile since i have blogged hope everyone has been doing good.
We are here to look into problem related to duplicate records, suppose we have a table like

CREATE TABLE TestTable(COL1 INT, COL2 INT, COL3 VARCHAR(50))

INSERT INTO TestTable VALUES (1, 1, 'ONE')
INSERT INTO TestTable VALUES (1, 2, 'TWO')
INSERT INTO TestTable VALUES (1, 2, 'TWO')
INSERT INTO TestTable VALUES (1, 3, 'THREE')
INSERT INTO TestTable VALUES (1, 3, 'THREE')
INSERT INTO TestTable VALUES (1, 3, 'THREE')

SELECT * FROM TestTable



As we can see there are duplicate records, now what if we need to delete all duplicate records.
Here is an easy way to get it done, Create a CTE(Common Table Extension) of ROW_NUMBER using PARTITION BY from TestTable and delete all records from it with Row_Number > 1.
Have a look in code snippet

;WITH TestCTE AS(
SELECT ROW_NUMBER()OVER( PARTITION BY COL1, COL2, COL3 ORDER BY(SELECT '')) AS RowNumber
FROM TestTable)

DELETE TestCTE WHERE RowNumber > 1


Now you will see records have been deleted from original table as well, This is what we were looking for :)



Another Way of Delete Duplicate records



Create Table With Autogenerate ID Such As FirstName ,ID Columns .
This Table Contain Duplicate Records.

SELECT FirstName FROM tblTest GROUP BY FirstName HAVING COUNT(*) > 1

Then Delete

DELETE FROM tblTest WHERE ID NOT IN (SELECT MAX(ID) FROM tblTest GROUP BY [FirstName])

1 comment: