Ecto Query Patterns - Named joins

OvermindDL1 · September 8, 2017, 7:05pm

I build a lot of very large ecto queries to query a very large database that I have no control over, and consequently I get a lot of very large queries generated where joins can be all sorts of random depending on incoming data, and since Ecto ‘labels’ joins via position only (like via select(query, [a, b, c, d, e, f, g], %{blah: a.blah, ...})) then this becomes really really hairy when most of those joins may or may not exist, thus crashing.

I originally got around that by storing a boolean in a binding for whether a join existed or not, which ends up making for very large case statements like:

case {join_blah, join_bloop, join_bleep} do
  {true, true, true} -> select(query, [blah, bloop, bleep], ...)
  {true, true, false} -> select(query, [blah, bloop], ...)
  {true, false, true} -> select(query, [blah, bleep], ...)
  {false, true, true} -> select(query, [bloop, bleep], ...)
  {true, false, false} -> select(query, [blah], ...)
  {false, true, false} -> select(query, [bloop], ...)
  {false, false, true} -> select(query, [bleep], ...)
end

Even the new select_merge feature does not help since you need to know which things are joined in order to name them.

Now you could work around this by building up a schema-less query, but then that means having to type a lot of things (at least in my cases I have to), which is noisy as heck, and you still have to do a lot of conditional testing in order to build up all the other parts of queries like where's and group_by's and so forth.

You can also work around this by just joining everything and only using and selecting what you need, works fantastically with everything from select_merge's to where's and all, unfortunately (at least in my case) it causes havoc in the SQL planner and even the simple no-join queries take ~15s to run (FDW cannot optimize all calls perfectly…).

Something that would be perfect would be being able to name a join, perhaps via syntax like join(query, [s, e: e] in DB.Employee, on: s.id == e.id) and could be used like:

  query =
    case get_named_joins(query) do
      %{s: _, f: _, e: _} -> join(query, [s: s, f: f, e: e], [g: g] in DB.Globals, on: g.blah == e.blah and g.date >= f.bloop_date)
      %{s: _, e: _} -> join(query, [s: s, e: e], [g: g] in DB.Globals, on: g.blah == e.blah)
      _ -> query
    end

Well ecto does not have this ability at the time (always future hope!), currently I work around it by doing this horror (as a very simplified example):

    query =
      if_with_joinlist [s: true, e: employee, t: teacher] do
        %{s: s, t: t} ->
          on = dynamic(&0, s.spriden_pidm == t.sirasgn_pid and t.sirasgn_primary_ind == "Y")
          on =
            case List.wrap(teacher)[:term] do
              nil -> dynamic(&0, ^on and t.sirasgn_term_code == ^to_term_format())
              :all -> on
              %{} = date -> dynamic(&0, ^on and t.sirasgn_term_code == ^to_term_format(date))
            end
          join(query, :inner, &1, t in DB.SIRASGN, ^on)
      else
        query
      end

Where if_with_joinlist/2 first takes a list of keywords, the first element of each property is the ‘name’ you are giving the join, the second element of the property is a conditional of whether it is joined or not. Higher above in the functions I set something like employee = test_whether_to_join_employee(from_whatever) and so forth, in addition to testing it if I should join and do that (for example at this point I’m not joined to SIRASGN yet, but I still test for it as a simple way to wrap it up). Which this does is take the body, test the map you are testing to and extracting what it needs to determine how you are binding to things and it creates a tree of conditional styles. It replaces any &N where N is an integer from 0 or higher, if you put &0 it will put the entire positional join-list in it’s position, something like &1 will take 1 less than the positional join list (convenient in my case to do testing joins like I am in the above example). The above example gets compiled to this:

if(employee) do
  if(teacher) do
    on = dynamic([s, _e, t], s.spriden_pidm() == t.sirasgn_pid() and t.sirasgn_primary_ind() == "Y")
    on = case(List.wrap(teacher)[:term]) do
      nil ->
        dynamic([s, _e, t], ^on and t.sirasgn_term_code() == ^to_term_format())
      :all ->
        on
      %{} = date ->
        dynamic([s, _e, t], ^on and t.sirasgn_term_code() == ^to_term_format(date))
    end
    join(query, :inner, [s, _e], t in DB.SIRASGN, ^on)
  else
    query
  end
else
  if(teacher) do
    on = dynamic([s, t], s.spriden_pidm() == t.sirasgn_pid() and t.sirasgn_primary_ind() == "Y")
    on = case(List.wrap(teacher)[:term]) do
      nil ->
        dynamic([s, t], ^on and t.sirasgn_term_code() == ^to_term_format())
      :all ->
        on
      %{} = date ->
        dynamic([s, t], ^on and t.sirasgn_term_code() == ^to_term_format(date))
    end
    join(query, :inner, [s], t in DB.SIRASGN, ^on)
  else
    query
  end
end

It optimizes out conditionals that are statically known as true as well (since I tend to have 2 or 3 joins that always exist with like 4 to 12 conditional joins). And yes, it can make huge conditional trees, but I’ve not yet seen a way for ecto to make that any easier via it’s public API.

Does anyone have any ideas to simplify this utter ginormous branching tree? I originally had it generating a single case with a list of N size where N is the number of non-static join conditionals and it ended up being slower than the nested cases (minor bit, but eh), but still this tree can get huge (so did the flat case, it had to make a lot of case heads to handle each combination of positional joins as well so it ended up being about the same resultant code count both ways). I would really love to be able to name a join so I could just pull it straight, it would reduce the conditional case from a combinatorial to a simple linear N count of code generated, but as I currently cannot, anyone have better ideas?

OvermindDL1 · September 12, 2017, 7:57pm

So, no patterns for how to do this better by anyone? Anyone?

archan937 · December 5, 2018, 12:52pm

Just use SqlDust: https://github.com/bettyblocks/sql_dust