Transactions in controller

I’ve been referring to the Phoenix 1.3 Contexts guide while migrating my project, and I notice that Repo.transaction() is never used in the controller actions around calls like get_page!, update_page, or delete_page. In my project I’d like to follow the style of the Phoenix guides, but another contributor insists we must use a transaction to group all the Repo calls made in a controller action.

It seems out of place to use Repo in the controller, but I also don’t want to wrap all my Repo calls in one context function just to contain the transaction call. If a transaction is needed, where should the call go (controller or context)? Otherwise, what is a good argument I can present to my contributor for not using one?


I’d like some clarification on best practices here as well. I understand the thinking behind not having any Repo calls in the controllers (or anywhere else in the web layer); Phoenix is pushing the idea that “Phoenix is not your application”.

Hence one might argue that if you actually do business logic inside your controller, you are breaking that “separation”, and all of that logic should be moved into the context (i.e. into your application, not the web layer).

To help with this separation I tend to think: “what if I want to add another user interface to this application?”. Perhaps an ncurses or native GUI interface. If I had any business logic in the web controllers then I would have to replicate that in each interface I make.

On the other hand, I feel there is a thin line between the fast, pragmatic approach of doing some things in the controller and having a proper separation.

It would help if you could give a concrete example of what sort of Repo calls are being done in the controller.


I completely agree with and believe in separating business logic into the contexts. This is simple when calling one large context function, but becomes impossible if a sequence of context functions is called (getting a record -> updating it) that needs a common wrapper (a transaction). Even if a common wrapper is not needed, as @cmkarlsson mentioned, it would be nice to keep that sequence of calls in the business logic in case another user interface is added.

I would appreciate examples as well for how best to handle such a case. I can only think of creating large context functions to wrap the smaller calls, but this easily leads to context bloat, since the number of potential combinations of function calls is high.

Only the grouped operations need to be in transactions though. It’s okay to have a function in your context that does a group of operations. You can even compose those smaller functions into a transaction.
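
For example, the context can expose one grouped function that composes the smaller pieces - something like this sketch (Page, PageEvent, and publish_page/2 are invented names, and their changeset functions are assumed to exist):

defmodule MyApp.Pages do
  alias Ecto.Multi
  alias MyApp.Repo
  alias MyApp.Pages.{Page, PageEvent}

  # Small, single-purpose building blocks (Page.changeset/2 and
  # PageEvent.changeset/2 are assumed to exist).
  def change_page(%Page{} = page, attrs), do: Page.changeset(page, attrs)

  def change_event(%Page{} = page, kind) do
    PageEvent.changeset(%PageEvent{}, %{page_id: page.id, kind: kind})
  end

  # One grouped operation composes them inside a single transaction.
  # The controller only ever calls publish_page/2.
  def publish_page(%Page{} = page, attrs) do
    Multi.new()
    |> Multi.update(:page, change_page(page, attrs))
    |> Multi.insert(:event, change_event(page, "published"))
    |> Repo.transaction()
  end
end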

Think of it this way - are you really going to need to do that group of operations outside of the transaction? And if that does happen, what will you do? Either remove the transaction or split things out to one that uses a transaction and one that doesn’t. I’ve spent time thinking about things like this and determined that it’s not worth worrying about until the time comes that it’s a problem.

Remember all the other Repo calls you have already wrapped in context functions (Repo.all, Repo.get, etc.) - why are those okay but not Repo.transaction?

What’s their reasoning for insisting that?

Transactions are used for the atomicity of an operation, not necessarily for “grouping”. If you need to ensure that the operation is atomic, i.e. either all of it succeeds or all of it fails, that is all the more reason to wrap the logic in a context function instead of doing it call by call in the controller. As far as the controller knows, it is doing one operation that succeeds or fails as a whole.
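
In the controller that ends up looking like the familiar shape from the Phoenix guides - one context call, one result (sketched loosely here, with Phoenix 1.3-style router helpers assumed):

defmodule MyAppWeb.PageController do
  use MyAppWeb, :controller

  alias MyApp.Pages

  def update(conn, %{"id" => id, "page" => page_params}) do
    page = Pages.get_page!(id)

    # One call, one result. Whether Pages.update_page/2 does a single
    # Repo.update or wraps several steps in a transaction is an
    # implementation detail of the context.
    case Pages.update_page(page, page_params) do
      {:ok, page} ->
        redirect(conn, to: page_path(conn, :show, page))

      {:error, %Ecto.Changeset{} = changeset} ->
        render(conn, "edit.html", page: page, changeset: changeset)
    end
  end
end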

That said, the case you describe doesn’t necessitate transactions: there’s no added benefit to using a transaction here instead of multiple Repo calls.

Note that select calls wrapped in a transaction are valid, but pretty much useless. You would need a transaction when you have a sequence of inserts/updates/deletes where each operation does not make sense on its own.

That depends on the required isolation level. If you lock for update in the select, or you set the isolation level to repeatable read or serializable, you’d probably want the select and the update in one transaction to close out the update cycle or to recover from lock conflicts or timeouts.
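
In Ecto that could look roughly like this (the Page schema, its changeset, and update_page_locked/2 are placeholders; “FOR UPDATE” is the PostgreSQL spelling of the lock):

defmodule MyApp.Pages do
  import Ecto.Query
  alias MyApp.Repo
  alias MyApp.Pages.Page

  # Row-lock the select so the record cannot change between the read and
  # the write; the lock is only held for the duration of the transaction.
  def update_page_locked(id, attrs) do
    Repo.transaction(fn ->
      page =
        Page
        |> where(id: ^id)
        |> lock("FOR UPDATE")
        |> Repo.one!()

      page
      |> Page.changeset(attrs)
      |> Repo.update!()
    end)
  end
end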


I didn’t know that - thanks for the correction. But that use case seems too specific to be generalized. It depends on what the OP is trying to achieve, but I think in most CRUD cases you wouldn’t need it, no?

Well, every CRUD app that cares about correctness should do this to prevent concurrent updates from clobbering each other. Any record that multiple people can edit must be read with “SELECT FOR UPDATE” (or equivalent) before writing it back to the database.

In a CRUD app:

  • T0: Alice loads product A0 into her browser and starts editing the description.
  • T1: Bob loads product A0 into his browser and edits the price. Note that both start from the same data.
  • T2: Alice saves her data to the database. Bob’s data is now stale :confused:
  • T3: Bob saves his data, overwriting Alice’s changes :open_mouth:

A common way to prevent this is to read back the data you are trying to write, to make sure it hasn’t changed. You can then compare field by field, by timestamp, or whatever. However, here you must use “SELECT FOR UPDATE” (or equivalent), otherwise the record may change while you are doing the comparison.

The situation described is not helped by transactions - or at least not by how they are used in a regular application. To help here, a transaction would need to span multiple requests, which isn’t really doable in any practical way.

The simplest solution to the presented problem is optimistic locking, which Ecto supports with Ecto.Changeset.optimistic_lock/3.
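
Roughly, the only changes needed are on the schema and changeset side - a sketch, where everything except the conventional :lock_version field name is a placeholder:

defmodule MyApp.Pages.Page do
  use Ecto.Schema
  import Ecto.Changeset

  schema "pages" do
    field :title, :string
    field :body, :string
    # Version counter used by optimistic locking.
    field :lock_version, :integer, default: 1
  end

  def changeset(page, attrs) do
    page
    # Casting :lock_version lets a hidden form field send back the version
    # the user originally loaded.
    |> cast(attrs, [:title, :body, :lock_version])
    |> optimistic_lock(:lock_version)
  end
end

With that in place, Repo.update raises Ecto.StaleEntryError when the row’s version no longer matches, which you can rescue and turn into a conflict message for the user.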


In the simple case of retrieving a record, creating a changeset, and then performing Repo.update(), the worst that could happen by not using a transaction is that Repo.update would raise because no row was found, correct? This case would occur if the record is deleted after the Repo.get but before the Repo.update. A transaction would solve this issue, but at the expense of requiring a specific context function to wrap all the steps above (assuming we keep the Repo calls in the context).

I believe there are definitely many cases where the lack of a transaction could cause data issues, but in this case it helps only insignificantly. However, the other contributor to my project still wants a transaction as a matter of principle. I suppose for now I’ll just wrap the various steps in a context function that uses a transaction.
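
Something along these lines is what I have in mind (update_page_in_txn/2 and the Page module are placeholder names):

defmodule MyApp.Pages do
  alias MyApp.Repo
  alias MyApp.Pages.Page

  # Fetch and update in one context function, wrapped in a transaction, so
  # the controller makes a single call and gets back a single result:
  # {:ok, page} or {:error, changeset}.
  def update_page_in_txn(id, attrs) do
    Repo.transaction(fn ->
      page = Repo.get!(Page, id)

      case page |> Page.changeset(attrs) |> Repo.update() do
        {:ok, page} -> page
        {:error, changeset} -> Repo.rollback(changeset)
      end
    end)
  end
end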

It is only the “read, check, save” requests that need a transaction, not the initial data fetching.

Yes, I didn’t know about Ecto optimistic locking. I assume the implementation runs in a transaction and uses some sort of read for update lock in the database.

I.e. if you don’t lock the current data between your read and your update, you are open to data races.

I wasn’t aware of optimistic locking, but the principle is the same if you do it manually:

BEGIN TRANSACTION;
SELECT data FROM x FOR UPDATE;
-- check the data to make sure it has not changed since you read it
UPDATE x SET data = ...;
COMMIT;

SELECT FOR UPDATE is pessimistic locking and does nothing to help in the situation you described. You can use it within a single request and still lose data when the record is edited across multiple requests. A pessimistic lock only helps within the bounds of a single transaction.

Optimistic locking is much simpler: the form you render for the user has an additional hidden version field, and then the update query does something like:

UPDATE foo SET ... WHERE id = $1 AND version = $2

If the version changed, no row will be matched.


Of course. I don’t know what I was thinking.

Surely SELECT FOR UPDATE will still help though, as it locks the rows when I read the data, preventing anyone else from updating them? Otherwise I am losing my mind. It is just that it is pessimistic locking.