How to Find and Remove Duplicate Rows in MySQL Tables
Robert Cooper
Robert Cooper Senior Engineer at Basedash
· January 31, 2025
Robert Cooper
Robert Cooper Senior Engineer at Basedash
· January 31, 2025
Maintaining a clean and efficient database is crucial, and identifying and removing duplicate rows in a MySQL database is an essential part of this process. By proactively managing duplicates, you ensure data integrity and optimal database performance. Here’s how you can tackle this issue head-on.
Use the SELECT statement combined with the GROUP BY and HAVING clauses to pinpoint duplicate rows. This method highlights records sharing identical values in specified columns.
Consider the following example, which finds duplicates in column1 and column2:
SELECT column1, column2, COUNT(*)
FROM your_table
GROUP BY column1, column2
HAVING COUNT(*) > 1;
Replace your_table with the actual name of your table, and swap column1, column2 with the relevant column names.
Now, focus on eliminating the identified duplicates while retaining one instance from each set. Use the table’s unique identifier, such as the primary key id, to differentiate and delete the redundant entries.
Execute the following to remove duplicates, keeping the entry with the lowest id:
DELETE t1 FROM your_table t1
JOIN your_table t2
ON t1.id > t2.id AND
t1.column1 = t2.column1 AND
t1.column2 = t2.column2;
This command removes higher id rows that match in both column1 and column2.
Finally, safeguard your table against future duplicates by setting a unique constraint. This measure blocks the insertion of new duplicate rows.
Apply a unique index like this:
ALTER TABLE your_table
ADD UNIQUE INDEX idx_name (column1, column2);
Here, idx_name should be replaced with your chosen index name. Adding this constraint enforces uniqueness for combined values in column1 and column2, keeping your table free from duplicates moving forward.
Written by
Senior Engineer at Basedash
Robert Cooper is a senior engineer at Basedash who builds full-stack product systems across SQL data infrastructure, APIs, and frontend architecture. His work focuses on application performance, developer velocity, and reliable self-hosted workflows that make data operations easier for teams at scale.
Basedash lets you build charts, dashboards, and reports in seconds using all your data.