Dynamic sub-querying based on field in top-level query: looking for a better approach

coen.bakker · February 20, 2023, 6:41pm

I have a function get_by_id/1 that returns a database entry and does so correctly (see below). My approach certainly is naive and resulted in verbose code and probably also relatively bad performance.

  def get_by_id(id) do
    draft = Repo.get(Draft, id)
    case draft.type do
      "regular" ->
        query =
          Draft
          |> join(:inner, [d], t in subquery(RegularDraft), on: t.draft_id == d.id)
          |> select_merge([d, t], %{story: t.story})
          |> preload([:authors, :publications])
        Repo.get(query, id)
      "story_starter" ->
        query =
          Draft
          |> join(:inner, [d], t in subquery(StoryStarterDraft), on: t.draft_id == d.id)
          |> select_merge([d, t], %{story: t.story, starter: t.starter})
          |> preload([:authors, :publications])
        Repo.get(query, id)
    end
  end

As you can see, the function first gets draft from the database and then uses a case ... do to dynamically perform a query that is returned from the function. It’s that dynamic subquery that throws me off. Is this where you would use Dynamic queries — Ecto v3.9.4? I have trouble translating the documentation to my use case. Below is my naive attempt at it (that does not work and throws an error).

  def get_by_id(id) do
    query =
      Draft
      |> join(:inner, [d], t in ^type_query(d.type), on: t.draft_id == d.id)
      |> select_merge([d, t], %{story: t.story})
      |> preload([:authors, :publications])

    Repo.get(query, id)
  end

  def type_query("regular") do
    dynamic([t], subquery(RegularDraft))
  end

  def type_query("story_starter") do
    dynamic([t], subquery(StoryStarterDraft))
  end

JohnnyCurran · February 20, 2023, 6:56pm

It seems like you are switching the module / schema you query based on the draft type. In that case, something like the following could work

module = case draft.type do
  "regular" -> RegularDraft
  "story_starter" -> StoryStarterDraft
end

from(d in Draft,
join: t in subquery(module),
on: t.draft_id == d.id,
where: d.id == ^id,
select: %{story: t.story, starter: t.start}
)
|> Repo.one()
|> Repo.preload([:authors, :publications])

Un-tested, but should be close

coen.bakker · February 20, 2023, 7:15pm

That’s indeed much less verbose. But I reckon the values of fields of a binding in a query are never accessable in that query? So I must, in any case, make a round trip to the database first to get draft.type?

That would make sense to me but I thought that maybe Postgres/Ecto is smarter than I think and can schedule to get the value of a field in some smart way before running the rest of the query.

JohnnyCurran · February 20, 2023, 7:53pm

Oh, duh…I missed that on the first pass. Let me think on it a touch more

JohnnyCurran · February 20, 2023, 8:13pm

One option could be to use a UNION ALL. It’s slightly duplicative but it is a single query. The problem is I don’t see how to use the draft.type binding on the subquery before having retrieved it from the outer query. And we can’t know to query Regular or StoryStarter before we know the draft type.

This makes the assumption that a draft will only ever be one of a regular or story_starter. The Inner join will return zero rows on one of the queries, leaving you with the desired result:

def get_by_id(id)
starter_query =
  from(d in Draft,
    join: t in StoryStarterDraft,
    on: t.draft_id == d.id,
    where: d.id == ^id,
    where: d.type == "story_starter",
    select: %{
      story: t.story,
      starter: t.starter
    }
  )

query = from(d in Draft,
  join: t in RegularDraft,
  on: t.draft_id == d.id,
  where: d.id == ^id,
  where: d.type == "regular",
  select: %{
    story: t.story,
    starter: nil # UNION ALL needs the result sets to be the same width
  },
  union_all: ^starter_query
)

Repo.one(query)
end

coen.bakker · February 20, 2023, 11:04pm

I hadn’t thought of an approach using union_all at all. I’m going to look at it more closely tomorrow.

coen.bakker · February 20, 2023, 11:10pm

I kinda like this approach: allowing passing the type to the function get_by_id/2. Sometimes I might already have the type, so then I can pass it to the function. If not, I can use the get_by_id/1.

  def get_by_id(id) do
    draft = Repo.get(Draft, id)
    get_by_id(id, draft.type)
  end

  def get_by_id(id, type) do
    type_module =
      case type do
        "regular" -> RegularDraft
        "story_starter" -> StoryStarterDraft
      end

    type_fields =
      case type do
        "regular" -> [:story]
        "story_starter" -> [:story, :starter]
      end

    query =
      from d in Draft,
        join: t in subquery(type_module),
        on: t.draft_id == d.id,
        select_merge: map(t, ^type_fields),
        preload: [:authors, :publications]

    Repo.get(query, id)
  end

JohnnyCurran · February 20, 2023, 11:24pm

coen.bakker:

type_module =
      case type do
        "regular" -> RegularDraft
        "story_starter" -> StoryStarterDraft
      end

    type_fields =
      case type do
        "regular" -> [:story]
        "story_starter" -> [:story, :starter]
      end

You could combine these, too:

{type_module, type_fields} =
  case type do
    "regular" -> {RegularDraft, [:story]}
    "story_starter" -> {StoryStarterDraft, [:story, :starter]}
  end

# query

If you sometimes will know the type/id then making an extra (small) query isn’t the end of the world.