Seeking thoughts on advantages of the Repo.transact pattern vs disadvantages I’ve read about Ecto.Multi

dimitarvp · February 18, 2024, 4:51pm

Hello,

I am looking to more closely understand the advantages of the Repo.transact pattern and the quoted disadvantages of Ecto.Multi.

Repo.transact is quickly described in this blog post.

(Link to the original article by Sasa Juric is included in the post above.)

Here are four (4) points from the article, and my comments. I am quoting @sasajuric and @tomkonidas.

This function commits the transaction if the lambda returns {:ok, result}, rolling it back if the lambda returns {:error, reason}. In both cases, the function returns the result of the lambda.

Doesn’t Ecto.Multi ultimately do that as well? Sure, it’s not a direct response; in case of errors you get a tuple containing various state, including “changes so far”. I’d think that the rich choices offered by Ecto.Multi are welcome – you get to decide for yourself how you want to react to problems. How is the Repo.transact feature the clear winner here?

One thing I can see is if we unify our error-handling code by utilizing shared helpers + including the Repo.transact in even higher-level wrappers (business code). Is that the selling point – less boilerplate?

We chose this approach over Ecto.Multi, because we’ve experimentally established that multi adds a lot of noise with no real benefits for our needs.

I am very curious about this empirical evidence; I think having it spelled out somewhere would be hugely valuable both for new learners and more long-term users like myself. I can only imagine it’s the proposition that the else clauses of the with statement are hard to maintain? Or having to know the changeset functions (which I don’t view as being as negative as the blog post author seems to imply)?

As we can see, it is not the worst, but once we see the Repo.transact/2 way, it will be clear which is better.

It’s clear only insofar as the code is (1) shorter and definitely easier to read, and (2) has no else clause(s). Is that all? I am all for shorter and more readable code, I am almost religious about it too, but not all teams are open to code modifications on that basis alone. I am looking to understand if there’s more to this beyond code readability.

Perhaps the error-handling utilities we can combine with Repo.transact are the true value proposition here?

Another big benefit is that we do not need to go down to the changeset level for inserting, we could use our functions that perform Repo.inserts in them (Accounts.new_user_changeset/1 vs Accounts.create_user/1). This lets us compose many functions together from outside the context modules without having the need to expose your changeset functions.

Hmmm. I’ve been in 3 big-ish Elixir codebases (we’re talking 1000 - 3000 files) and I have never stumbled upon a problem that we as a team would describe as “modules outside the Phoenix contexts have access to changeset functions and that is a problem”. I mean, they are public; being able to call them is always on the table. This is not Java / C# / Rust et al., where you can make functions (package|namespace)-private to limit who can actually call them.

A notable mention here is the Boundary library, made by Sasa Juric as well: Boundary - enforcing boundaries in Elixir projects. I’ve used it with success and I do like it but sadly I never managed to convince too many people of its value.

I am not sure I see the clear win here. Help me understand.

Maybe the actual root problem is that I worked as a contractor for several years and had to be very flexible – meaning that things that many of us as programmers would immediately agree on became actual contention points when working with different teams and CTOs. Or maybe I am just a bad developer advocate (would not surprise me).

My summary of the advantages of Repo.transact would be (1) less boilerplate and (2) better separation of concerns. I respect and even worship both but I’ve met plenty of people who don’t, so I am curious how I could sell them such a coding pattern better and be more convincing.

In any case: I am very curious as to why the Repo.transact pattern is deemed valuable (or not) by others. What other factors did I miss? Could you share your thoughts, please? Thank you.

sasajuric · February 18, 2024, 6:40pm

I wrote Repo.transact after seeing a lot of production code along the lines of what’s written in that excellent blog post by @tomkonidas.

The value proposition of Repo.transact is that control flow features such as passing data around, branching, early exit, can be implemented with standard Elixir features, such as variables, functions, and the with expression. The transactional logic is less special, and it doesn’t rely on some implicit behaviour of a function from some library.

Combined with the provable fact that the transact code is shorter (often significantly), even in such a simple example as in that blog post, I have no doubt that the transact version is simpler and clearer.

That’s not to say that Multi is universally bad. The ability to provide each db operation as data is definitely interesting, and could be useful in the cases where the transactional steps need to be assembled dynamically (perhaps provided by the client code). But in the vast majority of cases I’ve encountered, I find the multi code needlessly difficult to read. This is true even in simple cases, and it becomes progressively worse if the transactional logic is more involved (e.g. if it requires branching early on in the transaction).

Hence, I strongly prefer transact, and it’s what I advise using in most situations.

lud · February 18, 2024, 11:12pm

We have something almost 100% similar in our current project (you can return :ok as well). We did not decide between that and Ecto.Multi, we just started to write some code using MyContext.update_thing, and then a second line of code, and later a third one… and then we found that that operation should be transactional now, so we just wrapped that in Repo.transaction. Later we found that we needed better error handling, and our version of Repo.transact was born.

The combination of Repo.transaction and with is clear and straightforward, we do not need more than that.

c4710n · February 19, 2024, 4:03am

For anyone who wants to know how @sasajuric implements the Repo.transact/1, check out mix_phx_alt/lib/core/repo.ex at 8ef7c36e5ac1a13a8152d0991757811cfd479568 · sasa1977/mix_phx_alt · GitHub

Personally, I think it’s better because of:

the checking logic for provided lambda.

the handling logic in the case of the lambda returning :ok/:error.

better error handling in the case of the lambda returning unexpected value.
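For readers who don’t want to follow the link, the three points above can be sketched roughly as follows. This is not the linked implementation: SketchRepo, its transaction/1 and its rollback/1 are hypothetical in-memory stand-ins that only imitate the return-value contract of Repo.transaction and Repo.rollback, so the snippet runs without Ecto or a database. Bare :ok / :error returns are left out for brevity.

```elixir
defmodule SketchRepo do
  # Hypothetical stand-in for an Ecto repo (assumption: no database here).
  # It only imitates two things: the function's return value is wrapped in
  # {:ok, value}, and rollback/1 aborts the function with {:error, reason}.
  def transaction(fun) when is_function(fun, 0) do
    try do
      {:ok, fun.()}
    catch
      :throw, {:sketch_rollback, reason} -> {:error, reason}
    end
  end

  def rollback(reason), do: throw({:sketch_rollback, reason})

  # transact-style wrapper. The guard checks the provided lambda, ok/error
  # tuples map to commit/rollback, and anything else raises loudly.
  def transact(fun) when is_function(fun, 0) do
    transaction(fn ->
      case fun.() do
        {:ok, result} ->
          result

        {:error, reason} ->
          rollback(reason)

        other ->
          raise ArgumentError,
                "expected {:ok, result} or {:error, reason}, got: #{inspect(other)}"
      end
    end)
  end
end

IO.inspect(SketchRepo.transact(fn -> {:ok, :user} end))
IO.inspect(SketchRepo.transact(fn -> {:error, :invalid} end))
```

In both calls the caller gets back exactly what the lambda returned, which is the behaviour described in the first post.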

wojtekmach · February 19, 2024, 5:33pm

Yeah I believe there might still be some specific use case for using multi but looking at my recent code I always reached for Repo.transact first exactly for the reasons mentioned above. In hindsight Repo.transact semantics would have been preferred over the semantics of the function argument to Repo.transaction. I can’t think of any upsides of the current semantics off the top of my head. I can’t quite explain it but the name transact feels off to me. On our projects we use the name transaction_with. If there are any ideas for alternative names I’d love to hear them.

LostKobrakai · February 19, 2024, 5:56pm

Ecto.Multi imo has its place. It allows for “declarative” combination of steps, where each step might bring some context (due to the use of callbacks) before the transaction is ever “started” and this can be done quite dynamically. That particular feature however I think is way less often needed than people expect, and where it’s not needed Ecto.Multi does surely add noise (a bunch of boilerplate code) as shown in related blogposts.

I truly think the only thing problematic with Repo.transaction(fn -> … end) is the return value mapping – no explicit error return and requiring exceptions – which is what Repo.transact does change.

The only thing I might miss using Repo.transact is step names in development. Where actually expected it’s easy to add them manually, so they can be returned.
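Adding step names by hand could look roughly like this. tag/2, validate/1 and charge/1 are made-up helpers for illustration; none of them come from Ecto or from any transact implementation.

```elixir
defmodule StepNames do
  # Tag an error with the name of the step that produced it;
  # successful results pass through unchanged.
  def tag({:ok, _value} = ok, _step), do: ok
  def tag({:error, reason}, step), do: {:error, {step, reason}}

  # A plain with chain; in real code this body would be the lambda
  # handed to a transact-style function.
  def checkout(amount) do
    with {:ok, amount} <- tag(validate(amount), :validate),
         {:ok, receipt} <- tag(charge(amount), :charge) do
      {:ok, receipt}
    end
  end

  # Hypothetical steps standing in for database operations.
  defp validate(amount) when is_integer(amount) and amount > 0, do: {:ok, amount}
  defp validate(_amount), do: {:error, :invalid_amount}

  defp charge(amount) when amount <= 100, do: {:ok, %{charged: amount}}
  defp charge(_amount), do: {:error, :limit_exceeded}
end

IO.inspect(StepNames.checkout(500))
```

Only the steps whose failure the caller needs to tell apart have to be tagged; the rest can stay untagged.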

tfwright · February 19, 2024, 7:03pm

I wasn’t aware of the transact pattern (by that name) but I tend to just reach for Multi.run when I have complex logic in transactions. Maybe I’m missing something but that seems to give the same level of flexibility? Albeit with a slightly more clunky syntax…

I’m having trouble thinking of a scenario where I would not want the named operations you get with Multi and yet a plain old function with Repo.transaction also wouldn’t be enough.

dimitarvp · February 20, 2024, 12:18am

Thanks for the responses everyone. Can’t say I emerged very enlightened, though it’s good to know that I have observed correctly what people find valuable. But I am not sure I’ve seen something extra on top which is what I was looking for in order to be convinced to adopt it in my work. Would you be open to give actual code examples and clarify further, if that’s not too much to ask?

Terser code is great but I am not putting that on a pedestal; if longer code conveys intent better / is more explicit / helps onboarding then I am all for longer code.

I… don’t understand how is that not doable with Ecto.Multi and/or Repo.transaction as well. Can you help me understand?

EDIT: To avoid looking like a complete idiot here, let me state that I understand that Ecto.Multi mandates a pipe or Enum.reduce and because of that with is not applicable there. OK, that part is clear. But passing data (through the data struct itself) and branching inside Multi.run or after Repo.transaction, and handling errors, can be done just fine. Not sure about early exit though, it’s likely not doable when using Ecto.Multi indeed.

Yes, that is my own main argument against Ecto.Multi. I’ve been in several projects where we had to be extremely diligent – I was responsible for money flowing from the checkout + payment flows – and I did the job without mistakes but it was a rather soul-crushing experience to manually track all success and failure branches when using Ecto.Multi. But… I am still not sure how using the Repo.transact pattern – which hides error-handling to boot – is improving things beyond the code being shorter. Somebody somewhere has to handle the errors, right? The fact that we’re not handling them there is not ridding us of this obligation. Am I missing something?

Can you please give an example on how this pattern improves error handling? I think I am having a huge derp moment because I still don’t get it and need actual examples.

Same, to me transact somehow does not sound right.

I very much agree with this and I have most often used Ecto.Multi exactly when I needed dynamism (also similar to what @sasajuric alluded to in his comment above).

…Ohhhh, that is starting to make sense to me now.

Not to be a rebel and go against everyone else here but – yes, this is how I feel about it as well.

Well the linked blog post shows you one scenario that to me looks like the Saga pattern and not a direct replacement to Ecto.Multi. Repo.transact seems like a potential replacement for Ecto.Multi but it’s also its own thing that’s not directly orthogonal to it. But I am still a bit confused and I’d love it if the others chime in again with some examples so the benefits can truly click for me.

(@AstonJ I see that you changed the type of post to a Question; FYI I don’t feel that I should mark any comment as an accepted answer and truly intended this to be a discussion because there’s no singular right or wrong answer on this topic, I think. Hope that is OK.)

sasajuric · February 20, 2024, 9:16am

Yeah, this is the gist of Repo.transact. It just uses ok/error for commit/rollback instead of throwing exceptions. As a result, the code in the provided lambda can be expressed as a with chain, as demonstrated in the blog post. Let’s take a look at a slightly modified version of the blog post:

with {:ok, user} <- Repo.insert(user_data),
     {:ok, _log} <- Repo.insert(%LogData{user_id: user.id, ...}),
     ...,
     do: {:ok, user}

This is an equivalent of the blog sketch, but with wrapper functions removed, to make things more obvious.

To make this sequence transactional, all you have to do is place Repo.transact(fn -> ... end) around it. In other words, you can use vanilla Elixir to express the logical flow of the operation.

In contrast, Multi uses a custom mechanism of operation chaining. We do something like:

Multi.new()
|> Multi.insert(:user, user_changeset)
|> Multi.insert(:log, fn %{user: user} -> %LogData{user_id: user.id, ...} end)
|> ...

The fact that the 2nd insert is performed only if the 1st one succeeds is now not so obvious. It is a custom special behaviour of the library code (presumably handled by Repo.transaction).

So the control-flow mechanism is now special. Outside of db transactions we use vanilla Elixir. Inside, we use the custom mechanism of multi. This is confusing. Why do I have to use different approaches to early-exit depending on whether I’m inside an Ecto transaction or not?

Furthermore, the plain multi operations such as insert & co can only go so far. Often you’ll need to fall back to run. Consider the following example:

with {:ok, user} <- Repo.insert(user_data) do
  Repo.insert!(%LogData{user_id: user.id, ...})
  ...
  {:ok, user}
end

This version is more precise about which parts can return an error to the caller, vs which parts should always succeed. If inserting a user record fails, we’ll return the error, which will typically be forwarded to the external client (e.g. browser). OTOH, if we fail to insert a log entry, it is a bug. The client code can’t fix it. Hence, we should fail in this case (aka let it crash).

With multi we need to do something like:

Multi.new()
|> Multi.insert(:user, user_changeset)
|> Multi.run(:log, fn repo, %{user: user} ->
  {:ok, repo.insert!(%LogData{user_id: user.id, ...})}
end)
|> ...

Which leads us to the weird situation where in some places we’re using Multi.insert, while in others Repo.insert!. In fact, you might end up with a combination of early-exit techniques. Some parts of the transactional flow will rely on the multi chain, while some will rely on with invoked inside Multi.run. I’ve encountered examples of such code in the wild, and I find it very confusing to read.

Another downside is the context map which is threaded through the multi steps. This pattern abandons plain variables in favour of a weakly structured k-v bucket. How do we know that there’s the :user field in the bucket? Because somewhere earlier there’s a step which is tagged as :user. This is implicit, and it obfuscates things for the human readers, as well as for the machine.

For example, if I mistype the name and e.g. reference :useer, the incorrect code will still compile. If I treat this field as an integer, dialyzer will not complain. And it’s not just about the compile-time tools. For example, the context map tricks the GC, which can lead to extra memory consumption and slower GC times, because the data returned from each step remains reachable until the whole transaction is finished.
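The typo case can be reproduced without Ecto. In this sketch a plain map stands in for the context map of earlier results (an assumption made only for illustration, it is not a real Ecto.Multi), and the made-up log_for/1 plays the role of a step callback.

```elixir
defmodule ContextMapTypo do
  # Stand-in for a step callback. It expects an earlier step named :user,
  # but the key is misspelled. The module still compiles.
  def log_for(%{useer: user}), do: {:ok, "created #{user.name}"}
end

# Stand-in for the results of earlier steps.
changes = Map.new(user: %{name: "alice"})

# The mistake only surfaces when the callback is actually invoked.
result =
  try do
    ContextMapTypo.log_for(changes)
  rescue
    error in FunctionClauseError -> {:failed_at_runtime, error.function}
  end

IO.inspect(result)
```

With plain variables the equivalent slip, referencing an undefined variable named useer, is rejected at compile time.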

In summary, Repo.transact is IMO the simplest and the least invasive approach to make some code transactional. Put your logic inside a lambda, make sure that it returns ok or error, and you’re good to go.

Yeah, I agree that multi can be occasionally useful, but given how it is overhyped and extensively used, I personally think it should not be a part of Ecto at all, but instead provided as a separate library. That would hopefully dissuade folks from reaching for it by default. Because to me, the real question is when is multi justified over transact, since the former is more complicated than the latter.

Yeah it sucks. I wanted it to be short and similar to transaction. But ideally, Repo.transaction would behave like this when fun is passed.

Yes, you can do this with Multi.run, but it is going to add a lot of needless noise. The main issue with a transaction + fun is that it uses exception for flow control, which makes the code trickier to follow, and also leads to unwanted noise. If you have a with chain, you’ll have to add the else clause and convert the error into a rollback exception. This is basically how transact works, so you don’t have to do it repeatedly all over your codebase.

The code written with transact is not only shorter but also simpler, more explicit, and clearer, because the flow is implemented with vanilla Elixir.

I don’t understand what’s confusing you here. Both Repo.transact and Multi support running multiple operations in a transactional context, and rolling back + returning on first error.

tfwright · February 20, 2024, 9:50am

I’m not sure this is the strongest argument for transact since a fair comparison with Multi.run seems like it should be against with plus one or more error clauses which imo is some of the most confusing Elixir logic to follow.

The main issue with a transaction + fun is that it uses exception for flow control

This seems to be precisely where you want Multi–complex flow control with multiple distinct user error cases. Conversely, I use a plain Repo.transaction only when errors are not expected but of course still need to rollback db on possible failure cases (time outs, network errors). If I saw this code in a PR:

Multi.new()
|> Multi.insert(:user, user_changeset)
|> Multi.run(:log, fn repo, %{user: user} ->
  {:ok, repo.insert!(%LogData{user_id: user.id, ...})}
end)

I would note there is no need to return a failed log (otherwise presumably there would be 2 insert ops) and request it be changed more like

Repo.transaction(fn ->
  user_changeset
  |> Repo.insert()
  |> case do
    {:ok, user} ->
      Repo.insert!(%LogData{user_id: user.id, ...})
      {:ok, user}

    error ->
      error
  end
end)

I agree that multi can be occasionally useful, but given how it is overhyped and extensively used, I personally think it should not be a part of Ecto at all, but instead provided as a separate library.

This seems more fair. In fact if a senior colleague pitched me on not using Multi in a project I would be amenable to it, if they wanted to roll their own using some form of transact but personally I would still include it in every project because I would prefer that to rewriting my own transactional error handling with with.

LostKobrakai · February 20, 2024, 10:08am

I think that’s generally not the path people follow though. People see multiple steps – insert user then log – and reach for Ecto.Multi.

Also I’d say that for your refactored case that’s exactly where with would shine.

sasajuric · February 20, 2024, 10:28am

Sorry, I don’t follow what you’re saying here.

The stock Elixir tool for chaining multiple operations and return on first error is with. I prefer to use that instead of some custom library mechanism.

Yes, and I’m not returning it from vanilla Elixir. Multi OTOH requires you to return something in each step. It could be nil, I guess, but IME that’s confusing, because we end up with named things which are always nil.

tfwright:

request it be changed more like

Repo.transaction(fn ->
  user_changeset
  |> Repo.insert()
  |> case do
    {:ok, user} ->
      Repo.insert!(%LogData{user_id: user.id, ...})
      {:ok, user}

    error ->
      error
  end
end)

If there’s an error inserting the user, this code will commit and return {:ok, {:error, changeset}}. This is how transaction + fun work, and it’s specifically what transact changes (by wrapping transaction).
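To make the difference visible without a database, here is a small sketch. PlainTx is a hypothetical stand-in, not Ecto: it only mimics how the function’s return value gets wrapped and how an explicit rollback aborts, and the committed? flag and the insert/1 function are invented so the outcome can be inspected.

```elixir
defmodule PlainTx do
  # Hypothetical stand-in, not Ecto. Returns {result, committed?} so the
  # outcome is visible; only the return-value wrapping is imitated.
  def transaction(fun) when is_function(fun, 0) do
    try do
      {{:ok, fun.()}, true}
    catch
      :throw, {:plain_tx_rollback, reason} -> {{:error, reason}, false}
    end
  end

  def rollback(reason), do: throw({:plain_tx_rollback, reason})

  # Made-up insert: succeeds only when a name is present.
  def insert(%{name: name}) when is_binary(name), do: {:ok, %{id: 1, name: name}}
  def insert(_attrs), do: {:error, :invalid_changeset}
end

# The error tuple is returned as an ordinary value, so nothing tells the
# stand-in to roll back: it "commits" and wraps the error in :ok.
{result, committed?} = PlainTx.transaction(fn -> PlainTx.insert(%{}) end)

# Converting the error into an explicit rollback gives the caller a
# plain error and no commit.
{rolled_back, committed_after_rollback?} =
  PlainTx.transaction(fn ->
    case PlainTx.insert(%{}) do
      {:ok, user} -> user
      {:error, reason} -> PlainTx.rollback(reason)
    end
  end)

IO.inspect({result, committed?})
IO.inspect({rolled_back, committed_after_rollback?})
```

The second call is the conversion that a transact-style wrapper performs once, instead of every call site repeating it.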

Furthermore, the case version is noisy even in such a simple example, and it becomes progressively worse when you want to chain multiple operations, exiting early on some (but not all) of them. IIRC, this is precisely why with was introduced to the language.

Sorry, no idea what this means. What is being rewritten?

I agree. Given its API, multi pushes folks to uncritically return non-actionable errors, in the places where they should raise. I’ve seen way too much of such code.

wojtekmach · February 20, 2024, 10:38am

Another thing is, as was mentioned previously, sometimes it is nice that steps are named but what tends to happen is the call and the error handling is separated:

multi
|> Ecto.Multi.insert(:step1, ...)
|> Ecto.Multi.run(:step2, ...)
|> Repo.transaction()
|> case do
  {:ok, changes} ->
    {:ok, ...}

  {:error, :step1, changeset, ...} ->
    {:error, ...}

  {:error, :step2, changeset, ...} ->
    {:error, ...}
end

or worse yet, it is the caller that has to do such case.

This is akin to the usage of with/1 that I’m definitely not a fan of either:

with {:step1, ...} <- {:step1, ...},
     {:step2, ...} <- ... do
else
  {:step1, {:error, _}} ->
    {:error, ...}

  {:step2, {:error, _}} ->
    {:error, ...}
end

In my experience it’s much better to have a simpler with; even if the logic of each step is somewhat tricky, we can safely tuck it away in a private function:

with {:ok, ...} <- private_function_1(),
     {:ok, ...} <- private_function_2() do
  {:ok, ...}
end

and it fits very well with Repo.transact.
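A concrete version of that shape might look like the following. The Signup module and its steps are invented for illustration; the point is only that each private function reports its own failure, so the with needs no else.

```elixir
defmodule Signup do
  # The with stays flat: every step already returns {:ok, _} or a
  # self-describing {:error, _}, so no else clause is needed.
  def run(params) do
    with {:ok, email} <- fetch_email(params),
         {:ok, plan} <- fetch_plan(params) do
      {:ok, %{email: email, plan: plan}}
    end
  end

  # Hypothetical steps; the tricky parts live here, out of sight of the chain.
  defp fetch_email(%{email: email}) when is_binary(email), do: {:ok, email}
  defp fetch_email(_params), do: {:error, :missing_email}

  defp fetch_plan(params) do
    case Map.get(params, :plan, "free") do
      plan when plan in ["free", "pro"] -> {:ok, plan}
      other -> {:error, {:unknown_plan, other}}
    end
  end
end

IO.inspect(Signup.run(%{email: "a@b.c", plan: "gold"}))
```

Wrapping the body of run/1 in a transact-style lambda would then be enough to make the sequence transactional.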

Btw, a friendly reminder from the Ecto README (emphasis mine):

Ecto is a toolkit for data mapping and language integrated query for Elixir.

So it’s totally OK to build custom things on top.

I might be totally off here but I believe multi and with/2 came out around the same time actually and multi at the time really did feel like a breath of fresh air and only later people realized that there might be better ways to solve these types of problems and that’s totally OK. Personally speaking, in hindsight multi should perhaps be less prominent in favour of exactly Repo.transact-like semantics but that ship has mostly sailed in the sense removing multi feels too big of a change. I think the Ecto team would consider adding another function with transact semantics hence my search of a good name for it.

sasajuric · February 20, 2024, 10:53am

Great point! And those post-transaction cases are IMO just terrible.

Also big +1! Each step should be responsible for returning its own success/failure. That way else can most often be avoided.

It wouldn’t be unprecedented. Both Ecto and Phoenix have a history of moving some things to other libraries. But I’m not pushing for this. However, it would be great if we had stock transact semantics, and if the official docs emphasized that approach.

Repo.atomic?

tfwright · February 20, 2024, 10:56am

It’s quite late here so possibly I am sleep deprived, but I am only reformulating a point that others have made, including you I believe, that the use of Multi is not always “needless noise.”

Sorry, no idea what this means. What is being rewritten?

Multi. transact is an alternate solution to the same problem, there could be others, but my point is that personally I am not dissatisfied with the semantics of Multi.

I am happy to cede this point, in fact I have also had the experience of needing to correct juniors’ use of Multi in various cases. I am not sure I yet see how an alternate solution like transact would prevent or discourage useless return values, but that would certainly be a mark in its favor.

I think the Ecto team would consider adding another function with transact semantics hence my search of a good name for it.

I would happily sign any petition to separate Multi out into its own library. I would consider the addition of a new competing utility unfortunate.

belaustegui · February 20, 2024, 11:05am

I totally agree with this. In my day job we have almost completely switched to transact instead of Ecto.Multi for new code. Repo.transaction is never used due to the weird semantics that have been already mentioned upthread.

Is there any place in which we can make an official discussion to potentially include Repo.transact directly in Ecto? Opening a PR in the Ecto repository would be easy but I believe it wouldn’t be a good idea without a previous discussion with the maintainers.

sasajuric · February 20, 2024, 11:16am

I think that new features are discussed on the mailing list https://groups.google.com/g/elixir-ecto.

sasajuric · February 20, 2024, 11:31am

Yes, I agree. But “not always” doesn’t exclude “very often”. And from the code I’ve seen in practice, folks tend to use multi way too much.

This discussion focuses a lot on multi, but transact is arguably an alternate solution to transaction. As it just so happens, once you have it in place, it becomes much more compelling than multi in many situations.

Admittedly not much. But I think it stands less in the way, because you’re using plain Repo, so it’s equally convenient to use insert or insert!. In contrast, with multi, it’s Multi.insert vs Multi.run + Repo.insert!, and from there I think most folks would choose the former.

tfwright · February 20, 2024, 11:35am

I was just considering adding this in an edit! I think it cuts to the heart of the issue perfectly and I can 100% agree that it improves a lot on the semantics there.

Unfortunately I suspect modifying the semantics of transaction is an even tougher prospect than divorcing Multi from Ecto?

wojtekmach · February 20, 2024, 11:38am

Yes exactly, we cannot make Repo.transaction(fun) work in a different way without breaking existing code hence we need a new function name.