Avoiding the same data being inserted into a table multiple times at the same time

Hello,

I am having an issue where the same data is inserted into a database table multiple times, at the same time, from an API endpoint.
Although I have a unique constraint on the table, it has become clear to me that I did not use the right columns to create that unique constraint.

Apart from deleting the repeated records and recreating the unique index on the table, what other options are available to solve this, especially in terms of writing a function?
Deleting the repeated records is currently not an option for me because it is a production system.

I would be grateful for your kind support.

Thank you.

Jerry

I would say add an Ecto validation that hits the DB and checks the combination of columns that must be unique. Not ideal by any means, but it could be a good temporary solution.

Example:

# Assumes `import Ecto.Query` and `alias Ecto.Changeset` at the module level,
# plus your app's `Repo`.
#
# Given a table that has `username` and `domain`, e.g. `jerry` and `google.com`
# or `jerry` and `yahoo.com`:
def ensure_unique(%Changeset{} = cs, params) do
  q =
    from(u in Users,
      where: u.username == ^params["user"] and u.domain == ^params["domain"],
      select: u.id
    )

  # Count the records that match the one we're about to insert.
  case Repo.aggregate(q, :count, :id) do
    # No matches: we're okay, let's go and insert.
    0 -> cs

    # Otherwise return a new changeset containing an error.
    _ -> Changeset.add_error(cs, :username, "username and domain are not unique")
  end
end

Then you just append `ensure_unique` to your pipeline before the `Repo.insert` call.
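For instance, a minimal sketch of such a pipeline, assuming the same `Users` schema with a `changeset/2` function:

def create_user(params) do
  %Users{}
  |> Users.changeset(params)
  |> ensure_unique(params)
  |> Repo.insert()
end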

This will not work for concurrent inserts (don’t ask me how I know that :joy: ).

Of course it will not, but any better solution would have to lock the table at various levels, and when you get to complexity like that you're much better off no longer dancing around the problem and fixing it properly. :slight_smile:

A better solution is to create a new table with correct constraints, migrate all the data to it, lock and rename the tables. If you really want to have almost zero downtime, instead of migrating data manually, create a trigger that copies data to the new table automatically on insertion.
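For illustration, a minimal sketch of that trigger as an Ecto migration, assuming PostgreSQL 11+ and hypothetical `users` / `users_v2` tables with `username` and `domain` columns:

defmodule MyApp.Repo.Migrations.CreateUsersV2 do
  use Ecto.Migration

  def up do
    create table(:users_v2) do
      add :username, :string
      add :domain, :string
      timestamps()
    end

    # The constraint the old table should have had.
    create unique_index(:users_v2, [:username, :domain])

    # Mirror every insert on the old table into the new one;
    # ON CONFLICT DO NOTHING silently drops duplicates.
    execute """
    CREATE OR REPLACE FUNCTION copy_users_to_v2() RETURNS trigger AS $$
    BEGIN
      INSERT INTO users_v2 (username, domain, inserted_at, updated_at)
      VALUES (NEW.username, NEW.domain, NEW.inserted_at, NEW.updated_at)
      ON CONFLICT (username, domain) DO NOTHING;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    """

    execute """
    CREATE TRIGGER users_copy_to_v2
    AFTER INSERT ON users
    FOR EACH ROW EXECUTE FUNCTION copy_users_to_v2();
    """
  end

  def down do
    execute "DROP TRIGGER IF EXISTS users_copy_to_v2 ON users;"
    execute "DROP FUNCTION IF EXISTS copy_users_to_v2();"
    drop table(:users_v2)
  end
end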

2 Likes

Yes, I agree, but the OP mentioned they don't want to delete records from the original table – which led me to believe they don't want to do a full fix right now. Could be a wrong assumption on my side. :person_shrugging:

A new table is indeed usually the safest and best option overall.

4 Likes

You will still have the duplicated records in the old table; you can then decide how to migrate them to the new table.

Thanks to you all for all the suggestions that have come up so far.

I am still trying out many different ideas, hoping to find a solution.

You can do something like this:

  1. Create a new table with the constraint you want.
  2. Have your app insert into both the old and the new table (see the sketch after this list).
  3. Backfill the new table with whatever it's missing from the old table.
  4. Switch your app to use the new table only.
  5. Get rid of the old table.
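A minimal sketch of step 2 using `Ecto.Multi`, assuming a hypothetical `UsersV2` schema over the new table:

def create_user(params) do
  Ecto.Multi.new()
  |> Ecto.Multi.insert(:old, Users.changeset(%Users{}, params))
  |> Ecto.Multi.insert(
    :new,
    UsersV2.changeset(%UsersV2{}, params),
    # The new table's unique index quietly rejects duplicates.
    on_conflict: :nothing,
    conflict_target: [:username, :domain]
  )
  |> Repo.transaction()
end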
1 Like

You will have to lock your tables whenever there is an insert or an update to make sure two operations aren't creating the same value. I strongly advise you to drop the idea of a quick fix and just bite the bullet and make a new table.

2 Likes

Can you expand on why you can’t delete duplicate records from a production system?

Maybe you could do zero-downtime SQL stuff like this?

  1. Like someone already said, create another table with the correct constraints.
  2. In your next app update, modify the app to insert (and delete, if needed) data to/from both the old and the new table to keep them in sync, but keep using the old table for reads, i.e. for displaying data in the UI.
  3. Once that app version is in production, create and run an SQL script that copies the old data over from the old table to the new one. Make sure you copy distinct data so the new table's unique constraint does not cause a crash (see the sketch after this list).
  4. Modify the app to use only the new table and remove all use of the old table from the code.
  5. After that version is in production, you can just delete the old table.
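As an illustration of the copy in step 3, a hedged sketch as an Ecto migration, again assuming PostgreSQL and the placeholder `users` / `users_v2` names:

defmodule MyApp.Repo.Migrations.BackfillUsersV2 do
  use Ecto.Migration

  def up do
    # DISTINCT ON keeps exactly one row per (username, domain) pair;
    # ON CONFLICT DO NOTHING skips rows the app has already dual-written.
    execute """
    INSERT INTO users_v2 (username, domain, inserted_at, updated_at)
    SELECT DISTINCT ON (username, domain)
           username, domain, inserted_at, updated_at
    FROM users
    ORDER BY username, domain, inserted_at
    ON CONFLICT (username, domain) DO NOTHING;
    """
  end

  # The backfill is additive, so rolling back is a no-op.
  def down, do: :ok
end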
1 Like

Thank you all for the immense contributions I have received. The common idea in all the posts I have read since I posted the question points to removing the repeated records from the table and creating the right unique constraint on it.
I am implementing the steps to clean up the table and enforce the right unique constraint, as you have directed.

I am grateful.

Regards,
Jerry

2 Likes