How to Count Duplicates in MySQL
Robert Cooper
Robert Cooper Senior Engineer at Basedash
· January 31, 2025
Robert Cooper
Robert Cooper Senior Engineer at Basedash
· January 31, 2025
Counting duplicates in MySQL involves identifying and quantifying repeated occurrences of data in a database. This guide explains how to count duplicate values in MySQL tables, a common task for database administrators and developers when analyzing data consistency and integrity.
First, understand your data and the table structure. Analyze the columns you suspect might have duplicates. For instance, if you’re dealing with user data, you might want to check for duplicate email addresses.
DESCRIBE users;
To find duplicates in a single column, use the GROUP BY and HAVING clauses.
SELECT email, COUNT(*) as count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
For more complex scenarios, like finding duplicates across multiple columns, adjust your query accordingly.
SELECT first_name, last_name, COUNT(*)
FROM users
GROUP BY first_name, last_name
HAVING COUNT(*) > 1;
To get the total number of duplicate records in a table, you can use a subquery.
SELECT COUNT(*)
FROM (
SELECT COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1
) AS subquery;
If your goal is to delete duplicates, keeping one instance of each, you can use a combination of DELETE and GROUP BY.
DELETE u1 FROM users u1
INNER JOIN users u2
WHERE
u1.id < u2.id AND
u1.email = u2.email;
Basedash is built as an AI-native BI platform, so teams can go from ad hoc SQL to trusted answers and dashboards quickly, without the overhead of traditional BI setup.
Counting duplicates in MySQL is a vital skill for database maintenance and data analysis. This guide provides the necessary steps and queries to identify and handle duplicate data effectively.
Written by
Senior Engineer at Basedash
Robert Cooper is a senior engineer at Basedash who builds full-stack product systems across SQL data infrastructure, APIs, and frontend architecture. His work focuses on application performance, developer velocity, and reliable self-hosted workflows that make data operations easier for teams at scale.
Basedash lets you build charts, dashboards, and reports in seconds using all your data.