SQL: Finding Duplicate Rows based on Multiple Columns CodeUnit 25 JUN 2010

Locating duplicate rows based on multiple column values with SQL turns out to be a fairly simple exercise.

If we take the usual method for locating duplicate rows based on a single column’s value, we see that locating a duplicate is nothing more than grouping on that column where the instance of that column value appears more than once.

In other words:

SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )

Now to find duplicate rows based on the values held by multiple columns, we simply expand on what we have done before and provide more parameters to work against, resulting in a SQL statement that looks like:

SELECT firstname, COUNT(firstname), surname, COUNT(surname), email, COUNT(email) 
FROM users
GROUP BY firstname, surname, email
HAVING (COUNT(firstname) > 1) AND (COUNT(surname) > 1) AND (COUNT(email) > 1)

And that is seriously how simple finding duplicate rows based on a compound key (multiple columns restraint) really is! :)

Related Posts:

About Craig Lotter

Software developer, husband and dad to two little girls. Writer behind An Exploring South African. I don't have time for myself any more.

  • Huynv86

    tks u. this post is what i'm finding

  • No problem, glad I could be of assistance

  • Needed to access this post’s contents again, and wow, is this useful or what! :)

  • Ddd

    when i use this query i could see only one of the duplicate entries. how to display all the duplicate entries??

    • That’s what it is designed to do. Once you know you have a duplicate on a particular key, run a separate query on that key I guess?

  • robbert

    This is taking the duplicates of within columns firstname /surname / email. Thus summing up the duplicates as defined three times independently.

    This is different though than duplicates being defined by equality within all three columns. A solution for that would be:select * from users awhere rowid (select max(rowid) from (select * from users) bwhere a.firstname= b.firstnameand a.surname=b.surnameand a.email=b.email)

    This only shows the duplicates, not the non-duplicates and first occurrence of a duplicated row.

    replace selct * by delete and they are gone

  • Se Johnston

    Nice GITS wallpaper!  Thx for the query!

  • Wierd

    How about matching records from different tables using multiple columns and scoring the matches based on the uniqueness of the columns e.g.
    col1 = Id, col2= name, col3=surname
    same values in other table
    Match by col1 & col2 &col3 then by col1&col3 then by col1& col2 and finally by col2&col3
    this brings up duplicates but the 1st match is more powerful match followed by the 2nd and so on.
    How would you do the query that would bring the results based on the strenght of the combinations and end up without duplicates