Hey there,
as the title says, there is a table that i want to add a unique constraint to it based on 2 fields user_id and topic_id, but it might be that there are duplicates, or records where these 2 match, so the index creation would fail.
How could i check if there are duplicates and if so only keep the newest one, and delete the potential older ones?
to me querying the whole table and then doing it in the code is unappealing, so i’d be looking for something with more ecto or sql action
I’d do a subquery to select all user_id/topic_id combinations with more then 1 appearance and use that together with window functions to select all but the most recent entry with such a combination. Delete everything still in the resultset.
In “pure Ecto” it will be possible, but it would be probably easier in raw SQL. So you need to state what DB are you using. But in general look at having.
Written from memory so I cannot guarantee the result, but it should be something like that (probably with few small modifications).
DELETE FROM table WHERE id IN (
SELECT
id,
inserted_at,
last_value(inserted_at) OVER (PARTITION BY user_id, topic_id ORDER BY inserted_at ASC) last
FROM table WHERE last <> inserted_at
)
DELETE FROM users_topics WHERE id IN
(SELECT
id
FROM
(SELECT id,
ROW_NUMBER() OVER( PARTITION BY user_id,
topic_id
ORDER BY updated_at DESC) AS row_num
FROM users_topics ) t
WHERE t.row_num > 1 );