How to emulate upcoming `load_async` in Rails 7?

belaustegui · June 9, 2021, 11:31am

I was reading this blog post and found this part really interesting:

Rails 7 introduces a method load_async to schedule the query to be performed asynchronously from a thread pool. If the result is accessed before a background thread had the opportunity to perform the query, it will be performed in the foreground.

The implementation seems to schedule the operation in a thread pool so it executes in the background without blocking the process. This seems really close to one of the BEAM's strengths.

The example in the Rails PR would be equivalent to something like:

def index(conn, _params) do
  # Goal: run the posts and categories queries in parallel
  posts = Repo.all(posts_query)
  categories = Repo.all(categories_query)

  render(conn, posts: posts, categories: categories)
end

My first thought would be to wrap the Repo calls in Task.async and Task.await. But Task.await is blocking, so to get the categories we would have to wait for the posts, even though they may be used in a different order in the template.
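One detail worth noting about that first idea: awaiting the tasks one after another does not serialize the queries themselves. Both tasks start as soon as `Task.async/1` returns, so the total wait should be roughly that of the slowest task. A minimal sketch, with `Process.sleep/1` standing in for the `Repo.all/1` calls (the `AwaitDemo` module, the delays, and the return values are assumptions for illustration):

```elixir
defmodule AwaitDemo do
  # Hypothetical stand-in for Repo.all/1: sleeps, then returns a label.
  def fake_query(label, delay_ms) do
    Process.sleep(delay_ms)
    label
  end

  def run do
    started = System.monotonic_time(:millisecond)

    # Both tasks start running as soon as Task.async/1 returns.
    posts_task = Task.async(fn -> fake_query(:posts, 200) end)
    categories_task = Task.async(fn -> fake_query(:categories, 100) end)

    # Awaiting posts first blocks the caller, but the categories task keeps
    # running in its own process while we wait.
    posts = Task.await(posts_task)
    categories = Task.await(categories_task)

    elapsed = System.monotonic_time(:millisecond) - started
    {posts, categories, elapsed}
  end
end

IO.inspect(AwaitDemo.run())
```

On an otherwise idle machine the elapsed time should land near 200 ms rather than 300 ms. What this does not give you is the Rails behaviour of blocking only at the point where the template first touches the data.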

So, my question is: does anyone have an idea about how to emulate this in a Phoenix application? Could this be an interesting feature for Ecto? Does any of this make sense?

stefanchrobot · June 9, 2021, 11:44am

Seems like a pretty good fit for Task.await_many/2:

def index(conn, _params) do
  # The posts and categories queries are run in parallel
  [posts, categories] = Task.await_many([
    Task.async(fn -> Repo.all(posts_query) end),
    Task.async(fn -> Repo.all(categories_query) end)
  ])

  render(conn, posts: posts, categories: categories)
end

EDIT: fixed the code; sorry, I didn’t try this out in a real app.
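For anyone who wants to try the idea without a database, here is a self-contained variant that can be pasted into `iex`. It assumes `Process.sleep/1` as a stand-in for the queries; the `AwaitManyDemo` module, its delays, and its return values are hypothetical. `Task.await_many/2` returns results in the order of the task list, not in completion order, and applies a single timeout to the whole batch:

```elixir
defmodule AwaitManyDemo do
  # Hypothetical stand-in for Repo.all/1: sleeps, then returns canned rows.
  def fake_query(result, delay_ms) do
    Process.sleep(delay_ms)
    result
  end

  def load do
    tasks = [
      Task.async(fn -> fake_query([:post_1, :post_2], 150) end),
      Task.async(fn -> fake_query([:category_1], 50) end)
    ]

    # Results come back in task-list order, so this match is safe even
    # though the categories task finishes first.
    [posts, categories] = Task.await_many(tasks, 1_000)

    %{posts: posts, categories: categories}
  end
end

IO.inspect(AwaitManyDemo.load())
```

If any task fails to reply within the timeout, the calling process exits, so in a controller the timeout would need to fit the slowest query you expect.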

LostKobrakai · June 9, 2021, 12:05pm

That’s going to be the tricky part: having any access wait for the actual data and return it. Implementing Enumerable and Access might get you part of the way, but you’ll still lose useful tools like pattern matching.
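To make the trade-off concrete, here is a rough sketch of such a wrapper. The `AsyncResult` struct and its `load/1` function are hypothetical names, not an existing library API. It implements `Enumerable` by awaiting the task on first traversal:

```elixir
defmodule AsyncResult do
  # Hypothetical wrapper around a Task whose result is a list.
  defstruct [:task]

  def load(fun) when is_function(fun, 0) do
    %__MODULE__{task: Task.async(fun)}
  end
end

defimpl Enumerable, for: AsyncResult do
  # Block until the task replies, then delegate to the list implementation.
  def reduce(%AsyncResult{task: task}, acc, fun) do
    Enumerable.reduce(Task.await(task), acc, fun)
  end

  # Returning {:error, module} tells Enum to fall back to reduce/3.
  def count(_async), do: {:error, __MODULE__}
  def member?(_async, _value), do: {:error, __MODULE__}
  def slice(_async), do: {:error, __MODULE__}
end

posts = AsyncResult.load(fn -> [1, 2, 3] end)

# Pattern matching sees the struct, not the list it will eventually hold.
IO.inspect(match?([_ | _], posts))

# Enum functions work because they go through the Enumerable protocol.
IO.inspect(Enum.map(posts, &(&1 * 2)))
```

This sketch is deliberately incomplete: a task's reply can be awaited only once, and only by the process that started it, so enumerating the same `AsyncResult` a second time would time out. A real implementation would have to memoize the result somewhere, which is part of why this gets complicated quickly.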

hauleth · June 9, 2021, 1:00pm

I would say that something like what @stefanchrobot showed makes sense, as it allows running several queries in parallel, which reduces load time. However, in general, lazy fetching as described in the load_async documentation:

If the result is accessed before a background thread had the opportunity to perform the query, it will be performed in the foreground.

IMHO doesn’t make much sense, as in Elixir requests are naturally handled concurrently, which is not always the case in Ruby (due to the GIL).

benwilson512 · June 9, 2021, 2:17pm

As @hauleth notes, while this can improve the performance of a single request a little bit, it actually doesn’t improve the performance of the system as a whole in Elixir, because individual requests are already async WRT one another. In Rails this is not the case, so adding an extra async call in each request can improve the performance of the system.

However, if you want to go forward with it, I think @stefanchrobot has roughly the right solution.

dimitarvp · June 9, 2021, 4:26pm

This Rails feature is aimed at alleviating a weakness in its runtime, a weakness that Elixir doesn’t have. As @hauleth and @benwilson512 are pointing out, the potential wins are negligible, while the complexity will explode and developer ergonomics will suffer.

There’s really no point in trying to emulate this in Elixir. Phoenix requests are already async enough. If you have a DB connection pool of 20, using 4 of them for a single request will make your other requests choke on DB connection starvation, on the off-chance that 1 out of 5 requests is a few milliseconds faster.

By using 1 connection per request we have more parallel bandwidth for users.
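The arithmetic behind that can be written out as a rough upper bound, assuming every in-flight request holds all of its connections at the same moment (a simplification; the `PoolMath` module is a hypothetical helper for illustration):

```elixir
defmodule PoolMath do
  # Upper bound on the number of requests that can hold their database
  # connections at the same time, given a fixed pool size.
  def concurrent_requests(pool_size, connections_per_request) do
    div(pool_size, connections_per_request)
  end
end

# Pool of 20, 4 connections per request
IO.inspect(PoolMath.concurrent_requests(20, 4)) # => 5

# Pool of 20, 1 connection per request
IO.inspect(PoolMath.concurrent_requests(20, 1)) # => 20
```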

I get the idea but IMO any further innovation in that area should come from the databases themselves, e.g. maybe in the future you can tell PostgreSQL “I want all of these 5 queries executed inside a single connection, give me the results as they come in regardless of order”, a la HTTP 2 and 3 in-connection streams.

al2o3cr · June 9, 2021, 6:54pm

FWIW, Repo.preload/3 will already do something similar if passed a list of associations to preload when called outside of a transaction:

https://github.com/elixir-ecto/ecto/blob/78ba8713cf8f505688feac8ce524906cba3eb984/lib/ecto/repo/preloader.ex#L123-L139


defp maybe_pmap(preloaders, repo_name, opts) do
  if match?([_,_|_], preloaders) and not checked_out?(repo_name) and
       Keyword.get(opts, :in_parallel, true) do
    # We pass caller: self() so the ownership pool knows where
    # to fetch the connection from and set the proper timeouts.
    # Note while the ownership pool uses '$callers' from pdict,
    # it does not do so in automatic mode, hence this line is
    # still necessary.
    opts = Keyword.put_new(opts, :caller, self())

    preloaders
    |> Task.async_stream(&(&1.(opts)), timeout: :infinity)
    |> Enum.map(fn {:ok, assoc} -> assoc end)
  else
    Enum.map(preloaders, &(&1.(opts)))
  end
end
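The same pattern can be tried outside Ecto. Below is a minimal sketch, assuming a list of zero-arity functions in place of Ecto's preloaders; the `PmapSketch` module and `run_all/2` are hypothetical names, and only the `:in_parallel` option name is borrowed from the snippet above. `Task.async_stream/3` keeps results in input order by default, so the output lines up with the input list:

```elixir
defmodule PmapSketch do
  # Runs every function in the list, in parallel when there are at least
  # two of them and the caller has not opted out via in_parallel: false.
  def run_all(funs, opts \\ []) do
    if match?([_, _ | _], funs) and Keyword.get(opts, :in_parallel, true) do
      funs
      |> Task.async_stream(fn fun -> fun.() end, timeout: :infinity)
      |> Enum.map(fn {:ok, result} -> result end)
    else
      # A single function, or parallelism disabled: run in the caller.
      Enum.map(funs, fn fun -> fun.() end)
    end
  end
end

IO.inspect(PmapSketch.run_all([fn -> :posts end, fn -> :categories end]))
IO.inspect(PmapSketch.run_all([fn -> :posts end, fn -> :categories end], in_parallel: false))
```

Unlike the Ecto version, this sketch does not check whether the caller is inside a transaction or pass a `:caller` option, since there is no connection ownership to track here.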