Where do you preload associations?

Simple question re. code organization: do you do preloads in the context, or in the controller/LiveView?

Say a basic function in a context looks like this:

def get_class!(id) do
  Repo.get!(Class, id)
  |> Repo.preload(:users)
end

But basically sometimes I need the association, sometimes I don’t.

So would you typically do get_class!(id) |> Repo.preload(:users) in the controller/LiveView? (I have a feeling that I should basically never call Repo… from a controller/LiveView.)

Or would you have separate get_class!/1 and get_class_with_assoc!/1 functions in the context?

Or am I overthinking and it doesn’t really matter? Thank you.

3 Likes

I avoid calling the Repo from controllers. My usual pattern is to do this in the context:

def get_class!(id, preloads \\ []) do
  Repo.get!(Class, id)
  |> Repo.preload(preloads)
end
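
At the call site, the controller or LiveView then just states which associations it needs. A quick sketch, assuming a hypothetical Classes context wrapping the function above (the :profile association is made up):

class = Classes.get_class!(id)                   # no associations loaded
class = Classes.get_class!(id, [:users])         # users preloaded
class = Classes.get_class!(id, users: :profile)  # keyword syntax for a nested preload
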
12 Likes

Oooo, I’ve never thought about passing preloads as a parameter. The only thing I don’t like about it is that it gives LiveViews/controllers some schema knowledge. I usually go the get_posts_with_user route. What’s your mileage been like with the extra argument, @stefanchrobot? I do like it in theory but I’ve only been thinking about it for 5 mins.

I’m fine with the coupling, because the schema knowledge is already there - whenever you access associations via the root schema, you introduce the coupling:

def show(conn, %{"id" => id} = params) do
  {:ok, foo} = Foo.get_foo_with_bar(id)
  # The coupling is in foo.bar
  render(conn, "show.json", foo: foo, bar: foo.bar)
end

In theory one could return {foo, bar} from the context, but I don’t think it’s worth it. Plus in reality, most of the time, the bar is linked to that specific foo anyway and is not just some free-floating entity.

In my mind, coupling when reading data structures and coupling when querying were different things, but actually, not really :thinking: That’s cool, thanks for the answer! I’m gonna try it out.

Another way to do it is to not use preload at all. Just create views that include the association at the DB level and treat them as separate schemas in Ecto. So you will have the following schemas:

  • class, which maps to the table
  • class_with_users, which maps to the view that joins class and users

Then you just have different context functions as usual.

I would say that preloading in the controller is not bad and you should not worry about doing it.
If my controller needs to preload two associations because they are going to be displayed in a template, this is a presentation concern. If the template changes, I may not need these preloads, or I may need different ones.

There is always the option of passing a preload list to the context, as was already mentioned in this thread. I agree with your concern that this would couple the controller with the schema and data model.
I think that at that point we are building an abstraction over Ecto just to avoid using the Repo module in the controller. We may as well call Repo.preload directly and remove an indirection layer.

In my opinion we should focus more on extracting business logic into the contexts than on blindly following dogmas such as “never use the Repo from the controller”.

My preferred way of doing this is to have different functions for building the query in the context (those functions should contain some real business logic that makes their extraction worthwhile).
Then, the controller can use those functions from the context and its own functions (for presentation concerns) to build the desired query and trigger it using the Repo.
For example the controller may want to select a subset of the fields since they are the only ones required in the template, or preload some associations that are required for presentation. As long as the real business logic is extracted properly, this should be fine.
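
A minimal sketch of that split (module, field, and function names are all made up): the context exposes a query that carries the business rule, and the controller layers the presentation concerns (here a preload and an ordering) on top before triggering it:

defmodule MyApp.Courses do
  import Ecto.Query
  alias MyApp.Courses.Class

  # The business rule ("published classes", a hypothetical flag) lives in the context.
  def published_classes_query do
    from c in Class, where: c.published == true
  end
end

defmodule MyAppWeb.ClassController do
  use MyAppWeb, :controller
  import Ecto.Query
  alias MyApp.{Courses, Repo}

  def index(conn, _params) do
    # Presentation-only concerns: which associations and ordering the template needs.
    classes =
      Courses.published_classes_query()
      |> preload(:users)
      |> order_by(asc: :title)
      |> Repo.all()

    render(conn, "index.html", classes: classes)
  end
end
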

If you follow this pattern you may see that some of the queries you build in controllers have common points, such as preloads. This is not bad per se, since tomorrow you can modify the controller or the template and know that you are not breaking any other part of the application. At that point, you will have enough information to decide whether extraction is worthwhile or not.

Locality of Behaviour is very valuable, and we usually don’t think about it much.

6 Likes

Could you please elaborate on this? Maybe a short example in (pseudo)code? I’m not sure I fully understand.

A view is a virtual table; see the PostgreSQL docs here.

If you define a view that flattens the foreign key association, you can just treat it as a normal table in ecto (you cannot insert into it though)
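
A rough sketch of that idea (table, column, and module names are assumptions, including users having a class_id foreign key): create the view with raw SQL via execute/2 in a migration, then map a read-only schema onto it:

defmodule MyApp.Repo.Migrations.CreateClassesWithUsersView do
  use Ecto.Migration

  def change do
    # execute/2 takes the up and down SQL, so the migration stays reversible.
    execute(
      """
      CREATE VIEW classes_with_users AS
      SELECT c.id AS class_id, c.name AS class_name,
             u.id AS user_id, u.name AS user_name
      FROM classes c
      JOIN users u ON u.class_id = c.id
      """,
      "DROP VIEW classes_with_users"
    )
  end
end

defmodule MyApp.Classes.ClassWithUser do
  use Ecto.Schema

  # Read-only: the view has no primary key and cannot be inserted into.
  @primary_key false
  schema "classes_with_users" do
    field :class_id, :integer
    field :class_name, :string
    field :user_id, :integer
    field :user_name, :string
  end
end

Queries against MyApp.Classes.ClassWithUser then work like any other schema, just without inserts/updates.
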

Interesting. And you need to create the view using a migration, with something like create view(:classes_with_users)? Is that supported in Ecto? Or do I need to create it manually, directly in Postgres?

5 Likes

This is a generally hard problem IMHO. One of my great open source regrets is not having enough time to make the Dataloader pattern more accessible outside of GraphQL because it was designed to address the following tension:

  • On the one hand, presentation/API layers tend to want/need a lot of flexibility in terms of asking for information.
  • On the other hand, access control, filtering rules, and other data-fetching code is often fraught with business logic, and this lives in the contexts.

If you allow the controllers to just pass in preload parameters, you skip the business logic. It’s a tough problem. Two ways come to mind for how to solve this:

The dataloader pattern

Basically you create some sort of mediating entity that your business logic can hook into to enforce rules, and your “client” code can use to compose requests to fetch stuff. This works really well in Absinthe (GraphQL) because there is an actual query document to wire Dataloader into. It’s less ergonomic in traditional controllers and ends up being super callback heavy (at least today).
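
To make that a bit more concrete, here’s a rough sketch of driving Dataloader by hand outside Absinthe (the module names and the scoping rule are made up); the query/2 callback on the source is where the business rules get enforced:

defmodule MyApp.Classes do
  import Ecto.Query

  # Every load through this source passes query/2, so scoping rules
  # (soft deletes, tenancy, authorization) are applied centrally.
  def data do
    Dataloader.Ecto.new(MyApp.Repo, query: &query/2)
  end

  defp query(MyApp.Classes.User, _params) do
    from u in MyApp.Classes.User, where: is_nil(u.deleted_at)
  end

  defp query(queryable, _params), do: queryable
end

# "Client" code (controller, LiveView, ...) composes the requests it needs;
# `class` here is a previously loaded %Class{} struct.
loader =
  Dataloader.new()
  |> Dataloader.add_source(:classes, MyApp.Classes.data())
  |> Dataloader.load(:classes, :users, class)
  |> Dataloader.run()

users = Dataloader.get(loader, :classes, :users, class)

As noted above, this is noticeably more callback-heavy in a plain controller than it is inside a GraphQL resolver.
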

CQRS style Read Models

You create dedicated database views / tables where you can just query them in a simple way and the data has already been written such that it’s impossible to query data you shouldn’t see. This can quickly add a lot of boilerplate.

7 Likes

It depends - the need for preloads “sometimes” could mean you have two concepts and therefore should have two functions.

I have a with_assoc(schema, assoc) function in each context that just passes the struct and association straight through to Repo.preload/2.

  def with_assoc(book, assoc), do: Repo.preload(book, assoc)

I use it directly in the controller/LiveView:

    book = Books.get_book_from_slug!(book_slug) |> Books.with_assoc([:author, :draft])

When I want to use the same logic somewhere else, I create a function called get_book_from_slug_with_author_draft(book) and continue.

Gives flexibility without sacrificing readability.

8 Likes

I just came across an interesting variation on how to incorporate preloads into the parameters (and combine them with various other criteria):

def list_classes(criteria \\ []) when is_list(criteria) do
  query = from(c in Class, order_by: c.title)

  Enum.reduce(criteria, query, fn
    {:paginate, %{page: page, per_page: per_page}}, query ->
      from q in query,
        offset: ^((page - 1) * per_page),
        limit: ^per_page

    {:sort, %{sort_by: sort_by, sort_order: sort_order}}, query ->
      from q in query, order_by: [{^sort_order, ^sort_by}]

    {:preload, associations}, query ->
      from q in query, preload: ^associations
  end)
  |> Repo.all()
end
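
The call site then reads pretty declaratively (values are just illustrative):

classes =
  list_classes(
    paginate: %{page: 2, per_page: 20},
    sort: %{sort_by: :title, sort_order: :asc},
    preload: [:users]
  )
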
3 Likes

Interesting approach!

I would maybe extract the code inside the reduce into its own function and use pattern matching.

For example:

def list_classes(criteria \\ []) do
  query = from(c in Class, order_by: c.title)

  criteria
  |> Enum.reduce(query, &combine_criteria/2)
  |> Repo.all()
end

defp combine_criteria({:paginate, %{page: page, per_page: per_page}}, query) do
  from q in query,
    offset: ^((page - 1) * per_page),
    limit: ^per_page
end

defp combine_criteria({:sort, %{sort_by: sort_by, sort_order: sort_order}}, query) do
  from q in query, order_by: [{^sort_order, ^sort_by}]
end
2 Likes

I use something like that. I forget where I got most of the code from,
or at least the scaffolding of the idea.

Here’s a rough dump of it that I might as well share. I can’t vouch for
its quality, use at your own risk. It’s not perfect, it’s probably not
even good really, it’s a crutch. It’s kind of like a poor man’s GraphQL mixed
with a less expressive Ecto.Query. The worst of all worlds.

Usage looks roughly like:

Posts.list_posts(
  by_category: "unhinged-rants",
  posted_between: {last_year, now},
  assocs: [:author, :comments]
)

It lets you define broad ranges of queries in a pretty flexible manner.
It doesn’t cover everything; you sometimes need to write
custom one-shots. You can also use the Post.Queries module directly and
pipe that into Repo.exists?() etc.

What I do like about it is, since the API is basically defined as functions,
you get pretty good code suggestions, as well as “no matching function” errors,
including the list of possible choices, when you muck up. You can also make
your API as confusing as you like, with assocs: [:authors, {:comments, magic_number_8}, "desc"] etc, since you’re just pattern matching on something.
(More practically, I have a few fields that can be true | false | [true, false] or date | {date_start, date_end} and the interface feels pretty
fluid).

Managing nested associations can be awkward; it’s best to set a hard limit.
Maybe getting comment authors via a post is OK, but getting comment author
likes is a bit “too” nested. As I said, it is also kind of a poor man’s
everything and you will definitely hit queries that are painful to run, or
unexpectedly limit join results because of a where clause.

I think running something like GraphQL as your internal API might be the best
bet in the end. I find controllers/views often have pretty
fluid-but-also-duplicated query requirements and you just end up needing
something that supports that. I am not sure if Absinthe really supports being
used as an internal API though. The docs mention one way with
absinthe_phoenix, but it’s pretty slim and absinthe_phoenix's docs seem to
be more about exposing a GraphQL API via Phoenix? Truthfully I just haven’t had
the time to run through it though.

Anyway, here’s the code, may contain gore, viewer discretion is advised.

The macro, which you use to define your “queryable” terms.

defmodule YourApp.Queryable do
  @moduledoc """
  Helper macros for defining a "Queries" interface.

  Call Queryable.setup() then

  ```
  Queryable.has_filter(:filter_name, fn query, filter_value ->
    query
    |> where ...
  end)
  ```

  ```
  Queryable.has_assoc(:parts, fn query ->
    query
    |> preload(:parts)
  end)
  ```

  which produces

  ```
  Queries.filter_name(query, val)
  Queries.with_parts(query)
  ```

  and

  ```
  list_x(filter_name: 1, assocs: [:parts])
  ```
  """

  @doc """
  Required to set up the option-chaining functions.

  Accepts an optional "preflight" function for altering options, defaults to

  ```
  fn opts -> opts end
  ```
  """
  defmacro setup() do
    f =
      quote do
        fn opts -> opts end
      end

    quote_setup(f)
  end

  defmacro setup(func) do
    quote_setup(func)
  end

  defp quote_setup(preflight_opts) do
    quote do
      def apply_opts(query, opts) do
        opts = unquote(preflight_opts).(opts)

        # force assocs to be done first so we can rely on "has_named_binding"
        # in queries if we want to do checks only if the data was requested.
        opts =
          case Keyword.has_key?(opts, :assocs) do
            true ->
              {assocs, opts} = Keyword.pop(opts, :assocs)
              [{:assocs, assocs} | opts]

            false ->
              opts
          end

        do_filter(query, opts)
      end

      defp do_filter(query, []) do
        query
      end

      defp do_filter(query, [{:assocs, assocs} | rest]) do
        query
        |> do_assoc(assocs)
        |> do_filter(rest)
      end

      defp do_assoc(query, []) do
        query
      end
    end
  end

  # func = {:fn [line: 4] [...]}
  defmacro has_filter(name, func) when is_atom(name) and is_tuple(func) do
    # has_filter(:by_id, fn q, id -> where(q, id: ^id))
    quote do
      # def by_id(query, val)
      def unquote(name)(query, val) do
        unquote(func).(query, val)
      end

      # def do_filter(query, [{:by_id, val} | rest])
      defp do_filter(query, [{unquote(name), val} | rest]) do
        query
        |> unquote(name)(val)
        |> do_filter(rest)
      end
    end

    # |> tap(fn m ->
    #   m
    #   |> Macro.to_string()
    #   |> IO.puts()
    # end)
  end

  # {:fn [line: 4] [...]}
  defmacro has_assoc(name, func) when is_atom(name) and is_tuple(func) do
    # has_assoc(:seller, fn q, id -> ... from(..., preload: ...)
    quote do
      # the actual joiner
      defp with_join(query, unquote(name)) do
        unquote(func).(query)
      end

      # def with_seller(query)
      def unquote(:"with_#{name}")(query) do
        with_join(query, unquote(name))
      end

      # def do_assoc(query, [:seller | other])
      defp do_assoc(query, [unquote(name) | rest]) do
        query
        |> with_join(unquote(name))
        |> do_assoc(rest)
      end
    end

    # |> tap(fn m ->
    #   m
    #   |> Macro.to_string()
    #   |> IO.puts()
    # end)
  end
end

Then define the “Queries” for a model:

defmodule YourApp.Posts.Queries do
  import Ecto.Query, warn: false
  import YourApp.Ecto.Query
  alias YourApp.Posts.Post
  require YourApp.Queryable, as: Q

  @type query_options :: [
          by_id: integer() | String.t() | nil,
          by_slug: String.t() | nil,
          assocs: [:comments | :author] | nil,
          by_search: String.t() | nil
        ]

  def posts() do
    base()
  end

  Q.setup()

  Q.has_filter(:by_id, fn query, id when is_integer(id) or is_binary(id) ->
    query
    |> where([post: post], post.id == ^id)
  end)

  Q.has_filter(:by_slug, fn query, slug when is_binary(slug) ->
    query
    |> where([post: post], post.slug == ^slug)
  end)

  Q.has_filter(:by_search, fn
    query, search ->
      search =
        search
        |> to_tsquery_format()

      query
      |> where(
        [post: p],
        tsvector_search(
          [
            tsvector(p.title, weight: "A"),
            tsvector(p.body, weight: "B")
          ],
          tsquery(^search)
        )
      )
      |> Ecto.Query.order_by(
        [post: p],
        {:desc,
         tsvector_rank(
           [
             tsvector(p.title, weight: "A"),
             tsvector(p.body, weight: "B")
           ],
           tsquery(^search)
         )}
      )
  end)

  Q.has_assoc(:comments, fn query ->
    if has_named_binding?(query, :comments) do
      query
    else
      from([post: post] in query,
        left_join: comments in assoc(post, :comments),
        as: :comments,
        preload: [comments: comments]
      )
    end
  end)

  Q.has_assoc(:author, fn query ->
    if has_named_binding?(query, :author) do
      query
    else
      from([post: post] in query,
        left_join: author in assoc(post, :author),
        as: :author,
        preload: [author: author]
      )
    end
  end)

  defp base() do
    from(_ in Post, as: :post)
  end
end

And define an API inlet:

defmodule YourApp.Posts do

  import Ecto.Query, warn: false
  alias YourApp.Repo

  alias YourApp.Posts.Post
  alias YourApp.Posts.Queries

  @spec list_posts(Queries.query_options()) :: [Post.t()]
  def list_posts(opts \\ []) do
    Queries.posts()
    |> Queries.apply_opts(opts)
    |> Repo.all()
  end

  @spec get_post(integer() | String.t(), Queries.query_options()) ::
          {:ok, Post.t()} | {:error, atom()}
  def get_post(id, opts \\ []) when is_integer(id) or is_binary(id) do
    Queries.posts()
    |> Queries.by_id(id)
    |> Queries.apply_opts(opts)
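    # Repo.fetch_one/1 is assumed to be a custom helper (not a built-in Ecto.Repo
    # callback) that returns {:ok, struct} or {:error, :not_found}.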
    |> Repo.fetch_one()
  end
end

I have also tried the forms:

Posts.posts()
|> Posts.by_category("x")
|> Posts.with_comments()
|> Posts.list()

and

Posts.list_posts(fn query ->
  query
  |> Post.by_category("x")
  |> Post.with_comments()
end)

but have found the macro to be nicest to use in the end. I actually don’t
really love any of them though.

So to answer the original question: I write my preloaders in their own Queries module (generally one per model) and then write my “queries” directly in the view/controller, but they are abstracted behind the data context.

I kind of wrote that macro out one day, then just got on with using it. I have been meaning to return and polish it up, as I think it could probably be reasonably transportable between projects. It can run into namespace issues with stuff like sorting though; the developer needs to make decisions and be consistent about what “names” they give to query fields.

1 Like

Just for completeness, the code comes from Pragmatic Studio and their wonderful LiveView Pro course. I just added the associations bit there. It’s great to rewatch the course as I noticed a couple of details that were beyond my skills the first time around. Pragmatic Studio has the best courses in the history of humankind :slightly_smiling_face:

2 Likes

Am I the only one who has a problem with Repo.preload? It does an extra query when, more than 90% of the time, you could use a join instead.
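
For comparison, a join-based preload runs as a single query; a sketch against the Class/users example from the top of the thread:

import Ecto.Query

# One SELECT with a LEFT JOIN; users are preloaded from the joined rows.
query =
  from c in Class,
    left_join: u in assoc(c, :users),
    preload: [users: u]

classes = Repo.all(query)

# Whereas Repo.all(Class) |> Repo.preload(:users) issues a second query for the users.
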

1 Like

You’re not the only one :slight_smile: I wonder if that’s always the case though? Isn’t Ecto smart enough to combine Repo.all() |> Repo.preload(...) into one query, at least in some situations?