How to tackle long query with laterals

I have a rather long reporting query that spans over few tables doing at least 4 lateral joins with subqueries and changes due to the various conditions in, let’s call it “master table”. There is a legacy app written in Go that simply manipulates strings, raw SQL strings. Does anyone have a strategy how to tackle it with ecto?

I tried to divide and conquer, but to make one lateral join work, I need to use fragment anyway like this

defp averages_join(query) do
    join(query, :left_lateral, [r, k], avg in fragment(
    "SELECT
      p.keyword_id,
      ROUND(AVG(COALESCE(position, 100)), 1) AS avg
      FROM positions p
      WHERE ...
      AND ...
      GROUP BY ...
      LIMIT 1", 
    ...))
  end

And there be more of those. that will be used based on some external to the query factors. So I’m thinking, maybe just glueing together raw SQL strings is not so bad after all?

Or maybe I just didn’t found out how to make a lateral join correctly? I could do subqueries instead of laterals, but laterals really simplify whole query.

In our big rails app we often go down to raw SQL queries, because they are simply undoable in ActiveRecord.

How would YOU tackle this problem? Glues strings together, or try somehow to translate it to ecto queries?

1 Like

Hah, sounds similar to the horrors I make. ^.^;

Here is the pattern I use currently (and I’m still unhappy with it, all this could be easily solved if Ecto had named joins…), a small version of it but I have some that go up to 16 tables joined (not my choice!!!).

First

Figure out what needs to be joined on, so for example:

  def query_profile(selected \\ :processed, refine) do
    selected =
      case selected do
        nil -> :processed
        :all -> :all
        :processed -> :processed
        single_field when is_atom(single_field) -> single_field
        list when is_list(list) -> list
      end

    query =
      from s in DB.Banner.SPRIDEN,
      where: is_nil(s.spriden_change_ind)

    # Query sections

    {employee, refine} = Keyword.pop(refine, :employee, false)
    {teacher, refine} = Keyword.pop(refine, :teacher, false)

(I’m really tempted to just take screenshots, the color coding on discourse really makes the code so much more unreadable…)

But first for the dynamically built queries that used String manipulation in the other systems (and if I were to write out all variants would take an exponential amount of code, which I do NOT want to do) I have them take 2 options, a setup of what format to return the data (selected), and a set of options for refining the query (refine).

I case out the selected to only get sensible data out of it (or throw a match exception here, which is great to get me to look at it to add more cases).

I then start the basic ‘empty’ query that this is built from (the central ‘profile’ table in the system I’m having to work with) along with checking a certain field is nil (non-nil here means it is ‘old’ and replaced by a new row that has a nil for that field, no it is not a composite primary key, but thankfully this is one of the few fields that has an index).

Next I need to figure out joins. I pre-think of an order I want them to be (essentially I just add more to the ‘end’ of this list as I need more joins, never earlier, you’ll see why shortly) and pop the options for each join, here I’m just showing employee’s and teacher join tables, but there are more, much much more… >.>

Join the tables

So now I know which tables I need to join, they are in the employee/teacher bindings, if nil then don’t, otherwise do (in some others there are different ways to join too, like sometimes I need inner joins, sometimes left, etc…, but simple here), here is employee:

    # Employee handling

    employee_type =
      case List.wrap(employee) do
        [true] -> ["F", "O", "P"]
        types ->
          Enum.map(types, fn
            :fulltime -> "F"
            :parttime -> "P"
            :student -> "O" # workstudy old term not used here anymore, now `student employee`
          end)
      end

    query =
      if employee do
        join(query, :inner, [s],
          e in DB.Banner.PEBEMPL,
          s.spriden_pidm == e.pebempl_pidm and
          e.pebempl_empl_status == "A" and
          e.pebempl_internal_ft_pt_ind in ^employee_type
        )
      else
        query
      end

Again, ignore the bad colorizing on these discourse forums, it looks a LOT better in my editor, what do you expect from javascript-based highlighting. ^.^;

But first I parse out the acceptable options from employee, it only accepts really just true or a set of atoms and returns their mappings to employee_type. I then build the query, as this is the first join it is really simple, just join as normal, not much special here.

Multi-joins

So now I need to join teachers, except sometimes employee will be joined and sometimes it will not (and the speed difference between them definitely encourages me to minimize joins on this ancient database), so I could do something like test if employee and teacher, then test if not employee and teacher, however you can see how that would quickly get crazy large, especially with 16 joins, the number of possible combinations explodes drastically thus incurring not just a mental overhead but also an OhGodTheAmountOfCode overhead just for a single join. However we can take advantage of a few things, the major being that you only need to specify the positional joins ‘up to’ the last-most one you are using, you can ignore others (so if you have 7 joined and you are doing a where clause on the first and third ones, then you only need to specify the first 3, however you need to know if they are joined still, including the second, since Ecto seems to be positional only). So considering that I built a macro, used like this to join the teacher table:


    # Teacher handling

    query =
      if_with_joinlist [s: true, e: employee, t: teacher, sect: true] do
        %{s: s, t: t, sect: sect} ->
          on = dynamic(&1, s.spriden_pidm == t.sirasgn_pidm) # and t.sirasgn_primary_ind == "Y")
          on =
            case List.wrap(teacher)[:term] do
              nil -> dynamic(&1, ^on and t.sirasgn_term_code == ^to_term_format())
              :all -> on
              %{year: _, month: _, day: _} = date -> dynamic(&0, ^on and t.sirasgn_term_code == ^to_term_format(date))
            end
          name_query =
            from crse in DB.Banner.SCBCRSE,
              select: %{
                _rank: fragment("rank() OVER (PARTITION BY ?, ? ORDER BY ? DESC)",
                  crse.scbcrse_crse_numb, crse.scbcrse_subj_code, crse.scbcrse_eff_term),
                numb: crse.scbcrse_crse_numb,
                code: crse.scbcrse_subj_code,
                # scbcrse_eff_term: crse.scbcrse_eff_term,
                scbcrse_title: crse.scbcrse_title,
              }
          query
          |> join(:inner, &2, t in DB.Banner.SIRASGN, ^on)
          |> join(:inner, &1, sect in DB.Banner.SSBSECT,
            t.sirasgn_crn == sect.ssbsect_crn and
            sect.ssbsect_ssts_code == "A"
          )
          |> join(:left, &0, crse in subquery(name_query),
            # crse.scbcrse_csta_code == "A" and # Not active or not, unsure what this is for...
            crse._rank == 1 and
            crse.numb == sect.ssbsect_crse_numb and
            crse.code == sect.ssbsect_subj_code
          )
      else
        query
      end

It grew to more than what it is now and it’s name is a holdover from its original days, but in general if_with_joinlist/2 takes two arguments, the first is a keyword list, in order, the key is a ‘name’, the value is either ‘nil’ or non-nil. The second argument is the body of the function, a do/else pair (the else is required, I did not want to accidentally forget it, and it is important not to forget it). The do branch takes a case-like set of matchers where the matcher is a literal map. The key’s of this map should match the keys in the original keyword list, and the value is the joinlist mapping to a useable name. Ones not specified here are not calculated in the mappings (only if they exist or not). It also replaces all &0 with a joinlist, and ones like &1 is the joinlist except the last one, and &2 is except the last 2, and so forth.

You can see in this teacher join I build up a dynamic query and I’m actually binding 2 tables, the second one sect is set to true because if teacher exists, it exists (I could put the teacher variable on it too but it doubles the branching count of the generated code, it is good to keep this minimal as Elixir starts compiling really slow with a lot of branches… much much slower than erlang does for the same amount of code), so I use a literal ‘true’ for it as well to have it assume it always exists in the branches, which is fine since it is only used when teacher is used too. I then join to 3 tables (I needed a way to access the sect from the third one, hence why it was in the keyword list).

Final refining

I then do some refining on the primary table (potentially some of these are used in prior joins as well to make the joins faster):



    query =
      case refine do
        [pidm: pidm] when is_integer(pidm) -> where(query, [s], s.spriden_pidm == ^pidm)
        [pidm: pidms] when is_list(pidms) ->  where(query, [s], s.spriden_pidm in ^pidms)
        [cnum: cnum] when is_binary(cnum) ->  where(query, [s], s.spriden_id == ^cnum)
        [cnum: cnums] when is_list(cnums) ->  where(query, [s], s.spriden_id in ^cnums)
        [id: id] when is_binary(id) ->        where(query, [s], s.spriden_id == ^id)
        [id: ids] when is_list(ids) ->        where(query, [s], s.spriden_id in ^ids)
        [] -> query
      end

But otherwise any join specific ‘where’ clauses I combine into the on clause when possible. otherwise I now perform those, using the same if_with_joinlist as before a few more times to do where’s between tables as necessary.

Final selection

The new select_merge in the latest ecto substantially shortened this part, but it is still not ‘short’ by any stretch (still needing the if_with_joinlist at times as you see here):



    query =
      case selected do
        :all -> query
        :processed ->
          group? = if teacher, do: true, else: false
          selected =
            if group? do
              group_by(query, [s], [
                s.spriden_pidm,
                s.spriden_id,
                s.spriden_last_name,
                s.spriden_first_name,
                s.spriden_mi,
                ])
            else
              query
            end
            |> select([s], %{
              pidm: s.spriden_pidm,
              id: s.spriden_id,
              last_name: s.spriden_last_name,
              first_name: s.spriden_first_name,
              mi: s.spriden_mi,
            })
          selected =
            if employee do
              if group? do
                group_by(selected, [s, e], [
                  e.pebempl_coas_code_home,
                  e.pebempl_coas_code_dist,
                  e.pebempl_orgn_code_home,
                  e.pebempl_orgn_code_dist,
                  e.pebempl_ecls_code,
                  e.pebempl_lcat_code,
                  e.pebempl_bcat_code,
                  ])
              else
                selected
              end
              |> select_merge([s, e], %{
                coas_home: e.pebempl_coas_code_home,
                coas_dist: e.pebempl_coas_code_dist,
                organization_home: e.pebempl_orgn_code_home,
                organization_dist: e.pebempl_orgn_code_dist,
                ecls_code: e.pebempl_ecls_code,
                lcat_code: e.pebempl_lcat_code,
                bcat_code: e.pebempl_bcat_code,
              })
            else
              selected
            end
          selected =
            if_with_joinlist [s: true, e: employee, t: teacher, sect: true, crse: true] do
                %{s: s, t: t, sect: sect, crse: crse} ->
                  select_merge(selected, &0, %{
                    courses: fragment("ARRAY_AGG(DISTINCT ARRAY[?,?,?,?])",
                      t.sirasgn_crn,
                      sect.ssbsect_subj_code,
                      sect.ssbsect_crse_numb,
                      fragment("COALESCE(?,?)", sect.ssbsect_crse_title, crse.scbcrse_title)
                    ),
                  })
            else
              selected
            end
          selected
        single_field when is_atom(single_field) -> select(query, [s], field(s, ^single_field))
        list when is_list(list) -> select(query, [s], map(s, ^list))
      end

Should be fairly readable if long. I am so happy for select_merge, makes it a lot more obvious what is going on instead of needing to build up repeated and complex selects, this is SO much shorter than before, I’m so happy for select_merge! ^.^

Now, I just need named joins so I can do something like this instead:

    query =
      from [s: s] in DB.Banner.SPRIDEN, # Give this table the recallable name `s`
      where: is_nil(s.spriden_change_ind)
...
if teacher do
  ...
  |> join(:inner, %{s: s}, [t: t] in DB.Banner.SIRASGN, ^on) # Recall back to `s` here, regardless of its join position, name `t`
  > join(:inner, %{s: s, t: t}, sect in DB.Banner.SSBSECT, ## Recall back `s` and `t` here, and name `sect`, this works even if `e` positionally is between `s` and `t`  :-)
  ...
else
  query
end

So yes, all of this would be SOOOOO much more simple with named joins in ecto, that combined with select_merge would make these horrors a bliss to work with. ^.^

Oh, and here is my if_with_joinlist/2 macro, it is ugly as hell but it is a munged build up of a few different things that I don’t want to touch for fear of breaking my monolithically huge queries… >.>

I will be SOOOO glad to get rid of it should ecto add named joins…

  defp if_with_joinlist_do_body(bound_list, {bindings, body}) do
    bound_list = Enum.map(bound_list, fn name ->
      case bindings[name] do
        nil -> Macro.var(String.to_atom("_#{name}"), __MODULE__)
        binding -> binding
      end
    end)
    Macro.postwalk(body, fn
      {:&, _, [less]} when is_integer(less) ->
        Enum.take(bound_list, length(bound_list) - less)
      ast -> ast
    end)
  end

  defp if_with_joinlist_do(_, _, [], ebody), do: ebody
  defp if_with_joinlist_do(bound_list, [], [body], _ebody) do
    if_with_joinlist_do_body(bound_list, body)
  end
  defp if_with_joinlist_do(prior_bound_list, [{bound, condition} | bounds], body, ebody) do
    bound_list = prior_bound_list ++ [bound]
    if condition === true do
      if_with_joinlist_do(bound_list, bounds, body, ebody)
    else
      {has, both_has0} = Enum.split_with(body, &(Keyword.get(elem(&1, 0), bound, nil) != false))
      {not_has, both_has1} = Enum.split_with(body, &(Keyword.get(elem(&1, 0), bound, nil) === nil))
      both_has = Enum.filter(both_has0, &Enum.member?(both_has1, &1))
      has = has ++ both_has
      not_has = not_has ++ both_has
      has = Enum.filter(body, &Enum.member?(has, &1))
      not_has = Enum.filter(body, &Enum.member?(not_has, &1))
      has = if_with_joinlist_do(bound_list, bounds, has, ebody)
      not_has = if_with_joinlist_do(prior_bound_list, bounds, not_has, ebody)
      quote do
        if unquote(condition) do
          unquote(has)
        else
          unquote(not_has)
        end
      end |> Macro.expand(__ENV__)
    end
  end

  defmacrop if_with_joinlist([_ | _] = bound, [do: body, else: ebody]) do
    body =
      Enum.map(body, fn
        {:->, _, [[{:%{}, _, args}], ast]} -> {args, ast}
        unk -> throw {:UNKNOWN, unk}
      end)
    if_with_joinlist_do([], bound, body, ebody)
    # |> Macro.to_string() |> IO.puts() |> throw
  end

I tried optimizing it with a multi-branched case and more and it just ended up running slower so this is what it ended up with. Again, I’d LOVE to be rid of it!

Wow, my use case is much, much simpler. I think what hurts me most, beside named_joins (god that would help a lot) is the need of writing lateral joins as fragments. And the more I think about it the more I think that if you have to write reporting utility that uses mix of ecto dsl and raw SQL, you might as well just go down the “raw SQL glueing” road :frowning: It’s easier to test, because you just have to test if the produced strings for given output match those expected.

What you did there are little monsters :joy: But I don’t see any other way :frowning:

Lol, they are my own lovecraftian horrors to work around ecto’s limitations that I’ve been asking repeatedly for other ideas on how to do them but no one has given any yet. ^.^;

And yes, I so so so badly want named joins… >.>

Yup, in Go I use the text/template package to do this and I bet EEx would be even nicer.