Skip to content

MySQL tables often accumulate duplicate records, leading to various issues. Removing these duplicates is crucial for maintaining data integrity and improving performance. By taking proactive steps below, you can ensure your database remains clean and efficient.

How to identify duplicate records in MySQL?

Identify the duplicates in your table first. For example, in a table named employees, find duplicate records based on columns like email or employee_id:

SELECT email, COUNT(*)
FROM employees
GROUP BY email
HAVING COUNT(*) > 1;

This query identifies emails linked to multiple records, highlighting the duplication problem.

How to remove duplicates using a temporary table?

Create a temporary table to hold unique records. This method ensures that you retain all necessary data without duplicates:

CREATE TABLE employees_unique AS
SELECT * FROM employees
GROUP BY email;

Adjust the GROUP BY clause based on your criteria for unique records.

Delete and repopulate the original table

Clear the original table and then repopulate it with the unique records from your temporary table:

DELETE FROM employees;
INSERT INTO employees SELECT * FROM employees_unique;

This sequence removes all duplicates and reinstates only the unique records.

Add unique constraints to prevent future duplicates

Apply a unique constraint to critical columns to block new duplicates:

ALTER TABLE employees
ADD UNIQUE (email);

Implementing this constraint ensures that no two records can have the same email in the future.

Using a single query approach

If creating a temporary table seems cumbersome, consider a direct method, especially for tables with a primary key:

DELETE e1 FROM employees e1
INNER JOIN employees e2
WHERE e1.id > e2.id AND e1.email = e2.email;

This command removes duplicates based on the email column, keeping the record with the lower id.

By actively removing duplicates and setting constraints, you ensure your MySQL database operates more efficiently and accurately. Regular maintenance and checks for duplications can significantly enhance performance and data quality.

Written by

Robert Cooper avatar

Robert Cooper

Senior Engineer at Basedash

Robert Cooper is a senior engineer at Basedash who builds full-stack product systems across SQL data infrastructure, APIs, and frontend architecture. His work focuses on application performance, developer velocity, and reliable self-hosted workflows that make data operations easier for teams at scale.

View full author profile →

Looking for an AI-native BI tool?

Basedash lets you build charts, dashboards, and reports in seconds using all your data.