Wanting to add unique contraint to existing table but it might have duplicates

Hey there,
as the title says, there is a table that i want to add a unique constraint to it based on 2 fields user_id and topic_id, but it might be that there are duplicates, or records where these 2 match, so the index creation would fail.
How could i check if there are duplicates and if so only keep the newest one, and delete the potential older ones?
to me querying the whole table and then doing it in the code is unappealing, so i’d be looking for something with more ecto or sql action

Thanks

I’d do a subquery to select all user_id/topic_id combinations with more then 1 appearance and use that together with window functions to select all but the most recent entry with such a combination. Delete everything still in the resultset.

1 Like

In “pure Ecto” it will be possible, but it would be probably easier in raw SQL. So you need to state what DB are you using. But in general look at having.

1 Like

postgresql

Written from memory so I cannot guarantee the result, but it should be something like that (probably with few small modifications).

DELETE FROM table WHERE id IN (
  SELECT
    id,
    inserted_at,
    last_value(inserted_at) OVER (PARTITION BY user_id, topic_id ORDER BY inserted_at ASC) last
  FROM table WHERE last <> inserted_at
)
1 Like

DELETE FROM users_topics WHERE id IN
    (SELECT 
     id
    FROM 
        (SELECT id,
         ROW_NUMBER() OVER( PARTITION BY user_id,
         topic_id
        ORDER BY updated_at DESC) AS row_num
        FROM users_topics ) t
        WHERE t.row_num > 1 );
1 Like