SQL: Finding Duplicate Rows based on Multiple Columns CodeUnit 25 JUN 2010

Locating duplicate rows based on multiple column values with SQL turns out to be a fairly simple exercise.

If we take the usual method for locating duplicate rows based on a single column’s value, we see that locating a duplicate is nothing more than grouping on that column where the instance of that column value appears more than once.

In other words:

SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )

Now to find duplicate rows based on the values held by multiple columns, we simply expand on what we have done before and provide more parameters to work against, resulting in a SQL statement that looks like:

SELECT firstname, COUNT(firstname), surname, COUNT(surname), email, COUNT(email) 
FROM users
GROUP BY firstname, surname, email
HAVING (COUNT(firstname) > 1) AND (COUNT(surname) > 1) AND (COUNT(email) > 1)

And that is seriously how simple finding duplicate rows based on a compound key (multiple columns restraint) really is! :)

Related Posts:

About Craig Lotter

Software developer, husband and dad to two little girls. Writer behind An Exploring South African. I don't have time for myself any more.