Any downsides to running database queries as Tasks?

brightball · May 3, 2017, 3:02pm

I’m building some abstraction in a toy project around Ecto and I’m wondering if there are any reasons not to have it execute each query within a Task by default?

Any thoughts here?

OvermindDL1 · May 3, 2017, 3:13pm

Eh, the main reason I can think of is that it adds another process spawn, and the work might otherwise finish faster than the spawn costs, but even that is pretty minor…

peerreynders · May 3, 2017, 3:14pm

Task.async or Task.start(_link) (returning result via cast or call to the spawning process)? My main beef with Task.await is that it blocks (so what’s the point?).

The other issue is what you plan or need to do when one of those tasks fails. When a process starts spawning other processes, it is prone to taking on “supervisor-y” responsibilities, which may be better handled by an actual supervisor.
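As a rough sketch of handing that failure handling to a supervisor, the snippet below starts tasks under a Task.Supervisor with async_nolink, so a crashing task does not take the caller down with it. The anonymous functions are hypothetical stand-ins for real queries, and the `handle` helper is made up for illustration.

```elixir
# Start a supervisor for the tasks. In a real app this would normally
# live in the application's supervision tree instead.
{:ok, sup} = Task.Supervisor.start_link()

# Hypothetical stand-ins for queries: one succeeds, one raises.
ok_task = Task.Supervisor.async_nolink(sup, fn -> {:rows, [1, 2, 3]} end)
bad_task = Task.Supervisor.async_nolink(sup, fn -> raise "query failed" end)

# Turn a task outcome into a tagged tuple instead of crashing the caller.
handle = fn task ->
  case Task.yield(task, 1_000) || Task.shutdown(task) do
    {:ok, value} -> {:ok, value}
    {:exit, reason} -> {:error, reason}
    nil -> {:error, :timeout}
  end
end

ok_result = handle.(ok_task)
bad_result = handle.(bad_task)
```

The failed task still gets logged, but the caller decides what to do with `{:error, reason}` rather than being killed through a link.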

Just my two cents.

hubertlepicki · May 3, 2017, 3:14pm

In fact, Ecto tries to do exactly that on its own. For example, when you preload associations, the preloading is done in parallel wherever possible.

brightball · May 3, 2017, 3:17pm

The Task.await blocking is actually the main reason for the abstraction. Say I’ve got a page load that triggers 5 queries to render. I’m triggering the 5 queries in the controller but not calling await until the first point in the code where I actually need to use the result, which is sometimes much farther down in the view.

It will need to block at that point to render whatever data was coming back, but the goal is to wait until the last possible moment to actually do it.
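A minimal sketch of that idea, assuming a hypothetical `LazyQueries` module and a `fake_query/2` function that sleeps in place of a real Repo call: every query is started up front, and the caller blocks only when it asks for a specific result.

```elixir
defmodule LazyQueries do
  # Hypothetical stand-in for a Repo call; sleeps to simulate query latency.
  def fake_query(name, ms) do
    Process.sleep(ms)
    {name, :rows}
  end

  # Kick off every query right away and return a map of name => %Task{}.
  def start_all(specs) do
    Map.new(specs, fn {name, ms} ->
      {name, Task.async(fn -> fake_query(name, ms) end)}
    end)
  end

  # Block only at the point where a particular result is needed.
  def fetch!(tasks, name, timeout \\ 5_000) do
    tasks |> Map.fetch!(name) |> Task.await(timeout)
  end
end

# "Controller": start everything, then carry on with other work.
tasks = LazyQueries.start_all(users: 50, posts: 80, comments: 30)

# "View": await each result at the last possible moment.
users = LazyQueries.fetch!(tasks, :users)
posts = LazyQueries.fetch!(tasks, :posts)
comments = LazyQueries.fetch!(tasks, :comments)
```

Two caveats with this sketch: Task.await has to be called from the process that started the task, and each task can only be awaited once, so a real abstraction would need to cache the result after the first await.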

OvermindDL1 · May 3, 2017, 3:19pm

I do this in a couple of places too. I wish I could stuff them into an Ecto.Multi and have them run in parallel, but as far as I’ve seen in the logs they are run serially…

EDIT: Do note that copying the output between processes may have a sizable overhead, so I try to minimize it as much as possible to prevent too many copies from flying around.

hubertlepicki · May 3, 2017, 3:25pm

Yes, in Ecto.Multi the queries are executed in the order they were added to the multi. Multi serves a slightly different purpose, since it’s really a wrapper for a transaction.

Anyways, Ecto is doing Task.async/await internally to execute preloading queries, so I think @brightball you are good with this approach too:

https://github.com/elixir-ecto/ecto/blob/610934b79bfc29c411776bdf7b058872547ab302/lib/ecto/repo/preloader.ex#L85




## Association preloading


defp maybe_pmap(assocs, repo, opts, fun) do
  if match?([_,_|_], assocs) and not repo.in_transaction? and
     Keyword.get(opts, :in_parallel, true) do
    opts = [caller: self()] ++ opts
    assocs
    |> Enum.map(&Task.async(:erlang, :apply, [fun, [&1, opts]]))
    |> Enum.map(&Task.await(&1, :infinity))
  else
    Enum.map(assocs, &fun.(&1, opts))
  end
end


defp preload_assoc(structs, module, repo, prefix,
                   %{cardinality: card} = assoc, related_key, query, preloads, opts) do
  {fetch_ids, loaded_ids, loaded_structs} =
    fetch_ids(structs, module, assoc, opts)

peerreynders · May 3, 2017, 3:32pm

Again, not knowing the details of this particular use case: in some scenarios your argument may point to farming out all 5 queries to a single separate process that aggregates the results, hopefully reducing the output that has to be copied to the originator. The originator would then only have to deal with 1 process instead of 5, and the decision whether to run the queries in parallel is deferred to that single process.
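Something along these lines, as a sketch only. `PageData`, `fake_rows/1`, and the shape of the summary are all invented for illustration; the point is that the originator awaits one task and receives only the reduced result.

```elixir
defmodule PageData do
  # Hypothetical stand-in for a query; returns a list of fairly large row maps.
  def fake_rows(n) do
    for i <- 1..n, do: %{id: i, body: String.duplicate("x", 100)}
  end

  # One worker process owns all the queries and the aggregation.
  def load_summary do
    Task.async(fn ->
      # Whether to run these in parallel is the worker's decision;
      # here it happens to fan out to two inner tasks.
      [users, posts] =
        [10, 200]
        |> Enum.map(fn n -> Task.async(fn -> fake_rows(n) end) end)
        |> Enum.map(&Task.await/1)

      # Only this small map is copied back to the originator.
      %{
        user_count: length(users),
        latest_post_ids: posts |> Enum.take(-3) |> Enum.map(& &1.id)
      }
    end)
  end
end

summary = PageData.load_summary() |> Task.await()
```

The full rows are still copied from the inner tasks into the worker, so this only saves on what reaches the originator. If that matters, the worker could just as well run the queries sequentially.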

rvirding · May 3, 2017, 3:53pm

There is another reason for starting a task and doing a blocking wait. As the task is a separate process, anything it does with its process (setting links, trapping exits, using the process dictionary, …) will not affect the calling process, so it is much safer. For that matter, it will not be affected by the calling process’s local settings either. It is also much easier and freer to crash when necessary without affecting the calling process.

This can be useful even if you just want to do a synchronous operation where you sit and wait. The Erlang compiler, and LFE for that matter, does this: the compilation is run in a separate process.
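A small sketch of that isolation in Elixir terms. The `:locale` and `:scratch` keys are made up for the example; the task changes its own process dictionary and trap_exit flag, and the caller sees none of it.

```elixir
# Caller-side "local settings".
Process.put(:locale, "en")
{:trap_exit, trap_before} = Process.info(self(), :trap_exit)

task =
  Task.async(fn ->
    # The task does not see the caller's process dictionary entry...
    seen_on_entry = Process.get(:locale)

    # ...and whatever it changes stays inside its own process.
    Process.put(:locale, "sv")
    Process.put(:scratch, :temp)
    Process.flag(:trap_exit, true)

    {seen_on_entry, Process.get(:locale)}
  end)

# Synchronous, blocking wait for the result.
{seen_on_entry, inside} = Task.await(task)

# Nothing the task did has leaked into the caller.
outside = Process.get(:locale)
scratch = Process.get(:scratch)
{:trap_exit, trap_after} = Process.info(self(), :trap_exit)
```

Note that Task.async links the task to the caller, so for the "free to crash" part you would want something like Task.Supervisor.async_nolink or a monitor rather than a plain Task.async.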