I am having an issue where the same data is inserted into a table in the database multiple times, at the same time, from an API endpoint.
Although I have a unique constraint on the table, it has become clear to me that I did not use the right columns to create that unique constraint.
Apart from deleting the repeated records and recreating the unique index constraint on the table in the database, what other options are available to solve this, especially in terms of writing a function?
Deleting the repeated records is currently not an option for me because it is a production system.
I would say add an Ecto validation that hits the DB and checks the combination of columns that must be unique. Not ideal by any means, but it could be a good temporary solution.
Example:
# Given a table that has `username` and `domain` e.g. `jerry` and `google.com`
# or `jerry` and `yahoo.com`:
# Requires `import Ecto.Query`; assumes a `Users` schema and aliases for
# `Repo` and `Ecto.Changeset`.
def ensure_unique(%Changeset{} = cs, params) do
  q =
    from(u in Users,
      where: u.username == ^params["user"] and u.domain == ^params["domain"],
      select: u.id
    )

  # Count the existing records that match the record we're about to insert.
  case Repo.aggregate(q, :count, :id) do
    # No matches - we're okay, let's go and insert.
    0 -> cs
    # Otherwise return the changeset with an error added.
    _ -> Changeset.add_error(cs, :username, "username and domain are not unique")
  end
end
Then you just append ensure_unique to your pipeline before the Repo.insert call.
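For illustration, a minimal sketch of what that pipeline could look like, assuming a hypothetical create_user/1 context function and a Users.changeset/2 changeset function:

def create_user(params) do
  %Users{}
  |> Users.changeset(params)
  |> ensure_unique(params)
  |> Repo.insert()
end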
Of course it will not fully close the race window, but any better solution would have to lock the table at various levels, and when you get to complexity like that you're much better off not dancing around the problem and just fixing it properly.
A better solution is to create a new table with the correct constraints, migrate all the data to it, then lock and rename the tables. If you really want almost zero downtime, instead of migrating the data manually, create a trigger that copies data to the new table automatically on insertion.
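Roughly what the trigger part could look like in an Ecto migration, assuming Postgres 11+ and a hypothetical users_v2 table that already exists with the correct unique index on (username, domain); table and column names here are placeholders:

defmodule MyApp.Repo.Migrations.AddUsersCopyTrigger do
  use Ecto.Migration

  def up do
    # Copy every newly inserted row into the new table, silently skipping
    # rows that would violate the correct unique constraint.
    execute("""
    CREATE OR REPLACE FUNCTION copy_user_to_users_v2() RETURNS trigger AS $$
    BEGIN
      INSERT INTO users_v2 (id, username, domain, inserted_at, updated_at)
      VALUES (NEW.id, NEW.username, NEW.domain, NEW.inserted_at, NEW.updated_at)
      ON CONFLICT (username, domain) DO NOTHING;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    """)

    # On Postgres older than 11 this would be EXECUTE PROCEDURE instead.
    execute("""
    CREATE TRIGGER users_copy_trigger
    AFTER INSERT ON users
    FOR EACH ROW EXECUTE FUNCTION copy_user_to_users_v2();
    """)
  end

  def down do
    execute("DROP TRIGGER IF EXISTS users_copy_trigger ON users")
    execute("DROP FUNCTION IF EXISTS copy_user_to_users_v2()")
  end
end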
Yes it is, I agree, but the OP mentioned they don't want to delete records from the original table – which led me to believe they don't want to do a full fix right now. Could be a wrong assumption on my side.
A new table is indeed usually the safest and best option overall.
You will have to lock your tables whenever there is an insert or an update to make sure two operations don't create the same value. I strongly advise you to drop this idea of a quick fix, bite the bullet, and make a new table.
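Just to make that cost concrete, this is roughly the shape the quick fix would have to take on Postgres; users, duplicate_query and changeset here are placeholders:

Repo.transaction(fn ->
  # Blocks every other writer on the table until this transaction commits.
  Repo.query!("LOCK TABLE users IN EXCLUSIVE MODE")

  # Re-run the uniqueness check while the lock is held, then insert.
  if Repo.aggregate(duplicate_query, :count, :id) == 0 do
    Repo.insert!(changeset)
  else
    Repo.rollback(:duplicate)
  end
end)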
Can you expand on why you can’t delete duplicate records from a production system?
Maybe you could do zero downtime SQL stuff like this?
1. Like someone already said, create another table with the correct constraints.
2. In your next app update, modify the app to insert into (and delete from, if needed) both the old and the new table to keep them in sync, but keep using the old table for reads, i.e. for displaying data in the UI.
3. Once that version is in production, create and run an SQL script that copies the old data over from the old table to the new one. Make sure you copy distinct data so the correct unique constraint in the new table doesn't cause a crash (see the sketch after this list).
4. Modify the app to use only the new table and remove all use of the old table from the code.
5. After that version is in production, you can just delete the old table.
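For the copy in step 3, a sketch of the backfill, assuming Postgres, a hypothetical users_v2 table and uniqueness on (username, domain); run it from psql or a one-off migration. DISTINCT ON keeps one row per pair and ON CONFLICT DO NOTHING skips rows the dual writes already copied:

execute("""
INSERT INTO users_v2 (id, username, domain, inserted_at, updated_at)
SELECT id, username, domain, inserted_at, updated_at
FROM (
  SELECT DISTINCT ON (username, domain) *
  FROM users
  ORDER BY username, domain, inserted_at
) AS deduped
ON CONFLICT (username, domain) DO NOTHING;
""")

Depending on how you keep the ids in sync, you may also need to adjust the users_v2 id sequence (e.g. with setval) before cutting over.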
Thank you all for the immense contributions I have received. The common idea in all the posts I have read since I posted the question points to removing the repeated records from the table and creating the right unique constraint on it.
I am implementing the steps to clean up the table and enforce the right unique constraint, as you have directed.
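For anyone who finds this later, this is roughly the shape of the cleanup migration I am going with, assuming Postgres and that the correct uniqueness is on (username, domain); names are placeholders for my real ones:

defmodule MyApp.Repo.Migrations.DedupeUsersAndAddUniqueIndex do
  use Ecto.Migration

  def up do
    # Keep the row with the lowest id for each (username, domain) pair
    # and delete the rest.
    execute("""
    DELETE FROM users a
    USING users b
    WHERE a.id > b.id
      AND a.username = b.username
      AND a.domain = b.domain;
    """)

    create unique_index(:users, [:username, :domain])
  end

  def down do
    drop unique_index(:users, [:username, :domain])
  end
end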