Skip to content

This post covers everything you need to know about upserts in MySQL.

What is upsert in MySQL?

Upsert, a portmanteau of “update” and “insert”, is a database operation that inserts rows into a database table if they do not already exist, or updates them if they do. In MySQL, you typically do this with either the INSERT ... ON DUPLICATE KEY UPDATE syntax or the REPLACE INTO statement.

When should you use upsert?

Upsert is important when you want to avoid duplicate rows in your table while also making sure that the latest data is always present. It’s especially useful in data synchronization tasks, bulk inserts, or when handling unique constraint violations.

Insert … on duplicate key update

This is the most common approach for upsert operations in MySQL. It works by attempting an insert, and if a duplicate key error occurs (i.e., a row with the same primary or unique key already exists), it performs an update instead.

Syntax

INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...)
ON DUPLICATE KEY UPDATE column1 = value1, column2 = value2, ...;

Example

INSERT INTO users (id, name, email)
VALUES (1, 'John Doe', '[email protected]')
ON DUPLICATE KEY UPDATE name = 'John Doe', email = '[email protected]';

In this example, if a user with id 1 already exists, their name and email will be updated.

Replace into

You can also do upserts in MySQL by using the REPLACE INTO statement. This approach first tries to insert a new row into the table. If a duplicate key error occurs, it deletes the old row and then inserts the new row.

Syntax

REPLACE INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);

Example

REPLACE INTO users (id, name, email)
VALUES (1, 'Jane Doe', '[email protected]');

Here, if a user with id 1 exists, they are deleted, and the new data is inserted.

How to determine fields affected by MySQL upsert

Using Information Functions

MySQL provides information functions that you can use to determine the effect of the last executed statement:

  • ROW_COUNT(): This function returns the number of rows affected by the last statement. In the context of upsert:
    • For INSERT ... ON DUPLICATE KEY UPDATE, it returns 1 if a row is inserted, and 2 if an existing row is updated.
    • For REPLACE INTO, it returns 1 for a new insert, and 2 if it replaces an existing row (one deletion and one insertion).

Examining the Last Insert ID

  • LAST_INSERT_ID(): Useful primarily with INSERT ... ON DUPLICATE KEY UPDATE, this function returns the AUTO_INCREMENT id of the last row inserted. If an update occurs instead of an insert, the function returns the AUTO_INCREMENT id of the updated row, if the table has an AUTO_INCREMENT column, otherwise 0.

Logging Affected Fields

For more detailed tracking:

  1. Triggers: Implementing triggers on your table can help log changes. You can create a trigger for both insert and update operations that logs the old and new values into a separate audit table.
  2. Application Logic: If the upsert is executed via an application, you can add logic to the application to check the current state of the row and log the changes accordingly.

Choosing between the two methods

You should use INSERT ... ON DUPLICATE KEY UPDATE if you need to preserve other column values that are not part of the unique key. This method only updates the specified columns and leaves others untouched.

Use REPLACE INTO when you want to replace the entire row with new data. But remember that this deletes the existing row and inserts a new one, which you might want to be careful of (changing auto-incremented keys, etc.)

A couple other things to consider

  • Make sure your table has a PRIMARY KEY or UNIQUE index. Upsert operations rely on these to determine duplicates.
  • For INSERT ... ON DUPLICATE KEY UPDATE, you can use the VALUES() function to refer to the new row’s values.
  • Performance varies between the two methods, especially for large datasets, due to the differences in how they handle existing rows.

For day-to-day data operations, Basedash helps teams move from one-off SQL to AI-native BI workflows by pairing governed query generation with collaborative dashboards and consistent reporting.

Written by

Robert Cooper avatar

Robert Cooper

Senior Engineer at Basedash

Robert Cooper is a senior engineer at Basedash who builds full-stack product systems across SQL data infrastructure, APIs, and frontend architecture. His work focuses on application performance, developer velocity, and reliable self-hosted workflows that make data operations easier for teams at scale.

View full author profile →

Looking for an AI-native BI tool?

Basedash lets you build charts, dashboards, and reports in seconds using all your data.