I have a create action, and I want it to not create a new row in the DB if its identity already exists in the table, and then run a conditional change that will do some other stuff depending on whether it added the row or not.
For the first part, the only way I thought it would work was to add the following to the action:
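(Roughly this — the identity name here is just illustrative:)

```elixir
create :create do
  # Turn the create into a no-op upsert: on an identity conflict,
  # update no fields instead of inserting a new row.
  upsert? true
  upsert_identity :unique_second_attr
  upsert_fields []
end
```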
That way, if there is already a row in the DB, it will do an upsert that changes nothing.
This works, but it is not ideal since I’m still doing an insert in the DB, which I want to avoid. The other issue is that, if I add an `Ash.Changeset.after_action` change, I can’t identify whether the action inserted a new row or did an upsert.
The other approach I thought of was to not add the upsert code block and let the insert fail, but catch that in the change, retrieve the original row from the table to return, and also handle the conditional change there.
The problem with this case is that, if the action fails, it will not call my change at all.
If you are using upserts, you can check if `inserted_at == updated_at` and do your conditional logic in that case (meaning the thing wasn’t there and you just created it).
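A rough sketch of that check as an `after_action` hook (the hook arity may differ by Ash version, and the side effects are placeholders):

```elixir
change after_action(fn _changeset, record, _context ->
  if record.inserted_at == record.updated_at do
    # Fresh insert: run the create-only side effects here.
  else
    # The upsert hit an existing row: run the update-path logic instead.
  end

  {:ok, record}
end)
```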
You can have a before action hook that looks if the record is there, and if it is you can use Ash.Changeset.set_result to pre-empt the data layer logic.
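For example (the resource, domain, and attribute names here are made up; `Ash.Query.filter` also needs a `require Ash.Query` in the module):

```elixir
change before_action(fn changeset, _context ->
  value = Ash.Changeset.get_attribute(changeset, :second_attr)

  case Ash.read_one(MyApp.Resource |> Ash.Query.filter(second_attr == ^value)) do
    {:ok, %MyApp.Resource{} = existing} ->
      # The record already exists: return it as the action's result
      # and skip the insert entirely.
      Ash.Changeset.set_result(changeset, {:ok, existing})

    _ ->
      changeset
  end
end)
```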
You can write a manual handler for the create action with `manual ...`.
I wouldn’t be too worried about doing the extra insert even if you don’t need it, and I’d personally choose the first option.
I expect it to reach the `dbg` call, but when I run it with `ValidateGraphql.Domain.Resource |> Ash.Changeset.for_create(:create, %{second_attr: "blibs", second_arg: "blibs"}) |> Ash.create!()` it will not reach it and instead gives the `second_attr: has already been taken` error.
That doesn’t seem right. `before_action` hooks definitely run before the actual insert, so it doesn’t make a lot of sense to get the error first. Can I see your identities configuration?
Yep! That will do it. That eager check is running before your logic. You could potentially work around it by replacing it with `pre_check_with` instead, or just add your own custom change that checks that value after your current logic.
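i.e. something like this in the resource (the identity name and domain are illustrative):

```elixir
identities do
  # pre_check_with runs the uniqueness lookup in a before_action hook,
  # instead of eagerly when the changeset is built like eager_check_with does.
  identity :unique_second_attr, [:second_attr], pre_check_with: MyApp.Domain
end
```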
This doesn’t sound safe to me under read committed.
begin t1;
set (inserted_at, updated_at) = (100, 100); # t1
begin t2;
set updated_at = 200; # t2
read (inserted_at, updated_at) -> (100, 200); # t2
commit t2;
read (inserted_at, updated_at) -> (100, 200); # t1
commit t1;
Any insert side-effect would be lost. I don’t think the second option is safe either.
Edit: No, that’s not right - t2 would also insert. So you would get this:
begin t1;
set (i, u) = (100, 100); # t1
begin t2;
set (i, u) = (200, 200); # t2
read (i, u) -> (200, 200); # t2
commit t2;
read (i, u) -> (200, 200); # t1
commit t1;
So you would get two inserts. Depending on your side-effects this would probably still be broken, but maybe you could save yourself with a unique constraint here to fail one of the inserts.
There would be even weirder behavior with 3+ transactions but strangely I think it would always run the correct side-effect. Very hard to reason about.
Edit 2: Actually I seem to have been right the first time. Postgres docs suggest that upserts essentially don’t respect read committed isolation (sigh).
An upsert isn’t a combination of a read/update; it’s an `INSERT ... ON CONFLICT ... RETURNING`, meaning it should happen atomically, AFAIK. It is worth verifying that, though, because it would be a problem if I’m wrong.
begin t1;
begin t2;
set (i, u) = (100, 100); # t1
read (i, u) -> (100, 100); # t1
set (i, u) = (200, 200); # t2
commit t1;
begin t3;
set u = 200; # t3
read (i, u) -> (100, 200); # t3
commit t3;
read (i, u) -> (100, 200); # t2
commit t2;
So t2 performed an insert but thinks it performed an update. What’s funny is you could argue whether this is acceptable or not since there was technically already an insert. But I definitely would not feel safe with any of this behavior TBH, it’s just too hard to reason about.
Again I think you would be okay here if you had a unique constraint to block the double insert.
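In Ecto terms, that would just be a unique index on the conflict key (a sketch; the table and column names are made up):

```elixir
defmodule MyApp.Repo.Migrations.AddUniqueIndexOnKey do
  use Ecto.Migration

  def change do
    # With this in place, the second concurrent insert fails with a
    # constraint error instead of silently creating a duplicate row.
    create unique_index(:my_table, [:key])
  end
end
```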
Edit: I believe this is still incorrect because the upserts would lock the updated rows. Databases are hard!
I’m not sure I understand the issue TBH. Ash may be doing more for you on upsert than you realize?
The way it actually works in practice:
begin t1;
begin t2;
# t1
INSERT INTO table (id, column, inserted_at, updated_at)
VALUES (1, 'value', now(), now())
ON CONFLICT (id) DO UPDATE SET column = 'value', updated_at = NOW()
RETURNING id, column, inserted_at, updated_at
commit t1
# t2
INSERT INTO table (id, column, inserted_at, updated_at)
VALUES (1, 'other_value', now(), now())
ON CONFLICT (id) DO UPDATE SET column = 'other_value', updated_at = NOW()
RETURNING id, column, inserted_at, updated_at
commit t2
There is no opportunity for the updated_at value to be “interleaved” because RETURNING is not the same as running a SELECT after an insert. So there is no “read” step in this scenario.
EDIT:
So to be clear, the only chance in this scenario that `updated_at` equals `inserted_at` is when you just created the thing.
It’s an interesting question. We’ve talked about adding something to upsert operations to allow data layers to return some metadata indicating whether an upsert or a create actually occurred, but it’s actually a hard problem in Postgres, as AFAIK there is not a reliable indicator of that information.
I did not realize you were using RETURNING. I think you are correct, then: it should return the values atomically.
Yeah this is another problem.
BTW, I am actually still not sure my above comments were correct anyway because I think I might be misunderstanding how upserts take row locks. This is why lower isolation levels are so brutal - it’s basically all implementation details of the DB!
Yeah, a lot of these things are much safer with a unique constraint, which I just remembered would be required for an upsert anyway! So I don’t think my comments were correct regardless, because the rows would be locked. I still think you could get weird behavior, though.
So I went back to the docs to see if I could glean anything more about the upsert/lock behavior and I found this cosmic horror of a line:
If a conflict originates in another transaction whose effects are not yet visible to the INSERT, the UPDATE clause will affect that row, even though possibly no version of that row is conventionally visible to the command.
Frankly I don’t even know what to make of this. Like, what would happen here:
begin t1;
begin t2;
set x = 1; # t1
set x = (x + 1); # t2 - upsert
rollback t1;
commit t2; # does this commit?!
Also, amusingly I think this means my very first comment (pre-edit) was actually correct (if you don’t use RETURNING, that is). NGL I kinda hate databases.
Yeah, I think in this case we’re good, with potentially one exception: if you do an upsert in the same exact millisecond in another transaction. But honestly, I think the cycle it would take for them to wait on each other’s locks would make that very unlikely, if not impossible, TBH.
I’d call the `inserted_at == updated_at` strategy good enough, with a very, very unlikely case of a clash. In the future, when we figure out a good way to return metadata from the operation indicating whether it was a create or an update, the `inserted_at == updated_at` check can be switched to `record.__metadata__.created?`.
If you have multiple nodes, then their clocks are actually quite likely to be out of sync by more than a millisecond, so the locks wouldn’t save you. Ecto calculates timestamps in Elixir, and I assume Ash does the same, no?
If it were just one project where you could control the constraints, then maybe you’d be safe, but since Ash is a framework, I feel like Murphy’s law will come into play here. At some point someone is going to deploy a multi-node app with a heavy write node and get a collision.