I’m currently trying to mimick a database migration from another system that needs to live in ecto in the future.
I’m trying to implement it as faithful as possible and due to this reason I’d like to set a UNIQUE constraint on the column and not just a unique index.
Until now I haven’t found a way to specify extra constraints. The resulting SQL should be something like
CREATE TABLE (
my_unique_field integer UNIQUE
)
It is probably not adding much but since I don’t want to take any chances I’d want this added.
I know that I can simply call raw SQL to alter the tables afterwards, but it would be nice if it was possible to specify this upfront in the add call, for example like this:
create table(...) do
add(:my_unique_field, :integer, unique: true)
end
Any pointers if this is possible or why it is not?
If I understand it correctly, the constraint/3 function is just for check or exclusion constraints. You cannot e. g. say: create constraint(:my_table, unique: :my_unique_field)
So the check constraint will yield a ALTER TABLE ... ADD CONSTRAINT ... CHECK (...) which is not what I’m after. But in theory this would be the correct function to implement such a thing.
afaik they’re the same thing at least on regular sql databases(i’m not sure about timescale and other “sql-like” databases with mods). but on regular sql db unique constraint on a column is achieved by a unique index.
just for reference from postgres
Adding a unique constraint will automatically create a unique B-tree index on the column or group of columns listed in the constraint. A uniqueness restriction covering only some rows cannot be written as a unique constraint, but it is possible to enforce such a restriction by creating a unique partial index. PostgreSQL: Documentation: 17: 5.5. Constraints
From the PostgreSQL perspective this is a distinction without a real difference. The unique constraint is implemented via the unique index.
Per the docs:
PostgreSQL automatically creates a unique index when a unique constraint or primary key is defined for a table. The index covers the columns that make up the primary key or unique constraint (a multicolumn index, if appropriate), and is the mechanism that enforces the constraint.
It is valid to create the index without the constraint.
To all the people stating that unique indexes and constraints are the same: This is not true.
If all you need are the features and functionality of a unique index the difference does not matter, but if you want to use the unique constraint with all the features of constraints – like being able to defer their validation – the difference does indeed matter.
@mmmrrr you can use execute/1,2 to add the constraint with raw sql after having created the table. Ecto doesn’t need to have explicit api for any and all things your db supports.
if i understand correctly all uniqueness checks are validated when the transaction is commited, that is the same as deferring a constraint.
If a conflicting valid row has been deleted by the current transaction, it’s okay. (In particular, since an UPDATE always deletes the old row version before inserting the new version, this will allow an UPDATE on a row without changing the key.) PostgreSQL: Documentation: 17: 62.5. Index Uniqueness Checks
Fair enough, though uniqueness deferral is in my experience not common (yes, it absolutely does happen). Absent any specific need for the deferral and I’m just as likely to go for the index… because while you cannot defer an index, you can’t build a constraint backed by an index concurrently… so something gained, something lost… naturally there are caveats to all of this.
However, if you just need uniqueness… a unique index alone is perfectly valid.
That’s not correct. You get the error for uniqueness validation immediately on the command creating the violation. Delete and insert has no conflict. Allowing two rows to temporarily share a unique column value within a transaction needs a deferred constraint.
I imagine there’s a dependency on operation order. If in the transaction, you update a row in way that will ultimately avoid the conflict, and then insert (or update) the conflicting row… then MVCC will have you covered.
However, if first you want to insert the conflicting row and the update the old row to be non-conflicting, then the deferral is something you’d need.
Admittedly, don’t run into this that often so I’d likely run a couple scenarios on-database to be sure I had it right, but that’s my understanding why the unique constraint is deferrable and mentioned later in the documentation you posted.
Here’s a walk through of transactions involving just MVCC and constraint deferability.
First we’ll set up the scenario with a simple table and the kind of scenario where you might need a deferrable unique constraint… but I’ll start with INITIALLY IMMEDIATE:
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 09:59:23 PST 2024]
> create table scb_test (
id bigint generated by default as identity ,
u1_id integer not null,
u2_id text not null,
constraint test_udx unique (u1_id, u2_id) deferrable initially immediate);
CREATE TABLE
Time: 4.533 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:00:11 PST 2024]
> insert into scb_test (u1_id, u2_id) values (1, 'a'),(2, 'b'),(3, 'c');
INSERT 0 3
Time: 2.640 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:02:24 PST 2024]
> select * from scb_test;
id | u1_id | u2_id
----+-------+-------
1 | 1 | a
2 | 2 | b
3 | 3 | c
(3 rows)
Time: 0.467 ms
OK… let’s do a transaction where we’ll update an existing record and insert a new one that would be conflicting if not for the update. We’ll update the first record in a way to avoid the conflict, and then insert an otherwise conflicting record. All of this is in one transaction, so to anyone outside of the transaction, the updated record will still have it’s original values until commit:
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:02:57 PST 2024]
> begin;
BEGIN
Time: 0.207 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:03:00 PST 2024]
> update scb_test set u1_id = 4, u2_id = 'd' where u1_id = 3 and u2_id = 'c';
UPDATE 1
Time: 0.376 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:03:04 PST 2024]
> insert into scb_test (u1_id, u2_id) values (3, 'c');
INSERT 0 1
Time: 0.279 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:03:10 PST 2024]
> commit;
COMMIT
Time: 1.926 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:03:25 PST 2024]
> select * from scb_test;
id | u1_id | u2_id
----+-------+-------
1 | 1 | a
2 | 2 | b
3 | 4 | d
4 | 3 | c
(4 rows)
Time: 0.383 ms
Not deferrable and no conflict. Because we resolved the uniqueness conflict with an order of operations which, inside of the transaction, avoided the conflict, we did not need deferability on commit. However, if we switch that operation ordering with a similar set of commands:
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:03:29 PST 2024]
> begin;
BEGIN
Time: 0.274 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:03:45 PST 2024]
> insert into scb_test (u1_id, u2_id) values (3, 'c');
ERROR: duplicate key value violates unique constraint "test_udx"
DETAIL: Key (u1_id, u2_id)=(3, c) already exists.
Time: 0.489 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:04:12 PST 2024]
> rollback;
ROLLBACK
Time: 0.207 ms
Because we tried to insert the conflicting record before updating the original record with those values, we got the constraint violation. Now let’s try that same operation with the constraint deferred:
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:05:50 PST 2024]
> begin;
BEGIN
Time: 0.352 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:05:57 PST 2024]
> set constraints test_udx deferred;
SET CONSTRAINTS
Time: 0.400 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:06:00 PST 2024]
> insert into scb_test (u1_id, u2_id) values (3, 'c');
INSERT 0 1
Time: 0.414 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:06:05 PST 2024]
> update scb_test set u1_id = 5, u2_id = 'e' where id = 4;
UPDATE 1
Time: 0.336 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:06:52 PST 2024]
> commit;
COMMIT
Time: 2.065 ms
localhost(from SCB-MUSE-BOXX).postgres.scb.5432 [Tue Dec 10 10:06:55 PST 2024]
> select * from scb_test;
id | u1_id | u2_id
----+-------+-------
1 | 1 | a
2 | 2 | b
3 | 4 | d
6 | 3 | c
4 | 5 | e
(5 rows)
Time: 0.524 ms
Our conflicting insert worked and… since we resolved the conflict prior to committing the transaction we were able to commit the transaction successfully. Had we not resolved the conflict, the transaction would have failed because of the constraint violation.
Anyway, wanted to be sure that we were straight for future readers.
Bringing this back to something to do with Elixir/Ecto SQL…
If all you need is enforced uniqueness, I see no problem with the unique_index method. You’re going to end up with a unique index whether you’re creating the constraint or creating the index directly. But, if you do have the scenario where you’ll be resolving unique conflicts in your code and don’t want to worry about detailed database operation ordering within the transaction, I believe a constraint is worth the extra Elixir/Ecto ceremony to get it created.
Thank you all for the great discussion and insights! It’s much appreciated
@LostKobrakai I agree that Ecto shouldn’t support every feature of every storage engine under the sun but in this case a little syntax sugar would have been nice actually I already did use execute as you suggested but it felt a little heavy handed for something seemingly simple - hence my question.
@sbuttgereit thank you so much for the in depth analysis. This was a fantastic answer - and the kind of answer why I adore this community. You all are really an asset to this ecosystem!