I have a rather long reporting query that spans over few tables doing at least 4 lateral joins with subqueries and changes due to the various conditions in, let’s call it “master table”. There is a legacy app written in Go that simply manipulates strings, raw SQL strings. Does anyone have a strategy how to tackle it with ecto?
I tried to divide and conquer, but to make one lateral join work, I need to use fragment anyway like this
defp averages_join(query) do
join(query, :left_lateral, [r, k], avg in fragment(
"SELECT
p.keyword_id,
ROUND(AVG(COALESCE(position, 100)), 1) AS avg
FROM positions p
WHERE ...
AND ...
GROUP BY ...
LIMIT 1",
...))
end
And there be more of those. that will be used based on some external to the query factors. So I’m thinking, maybe just glueing together raw SQL strings is not so bad after all?
Or maybe I just didn’t found out how to make a lateral join correctly? I could do subqueries instead of laterals, but laterals really simplify whole query.
In our big rails app we often go down to raw SQL queries, because they are simply undoable in ActiveRecord.
How would YOU tackle this problem? Glues strings together, or try somehow to translate it to ecto queries?
Here is the pattern I use currently (and I’m still unhappy with it, all this could be easily solved if Ecto had named joins…), a small version of it but I have some that go up to 16 tables joined (not my choice!!!).
First
Figure out what needs to be joined on, so for example:
def query_profile(selected \\ :processed, refine) do
selected =
case selected do
nil -> :processed
:all -> :all
:processed -> :processed
single_field when is_atom(single_field) -> single_field
list when is_list(list) -> list
end
query =
from s in DB.Banner.SPRIDEN,
where: is_nil(s.spriden_change_ind)
# Query sections
{employee, refine} = Keyword.pop(refine, :employee, false)
{teacher, refine} = Keyword.pop(refine, :teacher, false)
(I’m really tempted to just take screenshots, the color coding on discourse really makes the code so much more unreadable…)
But first for the dynamically built queries that used String manipulation in the other systems (and if I were to write out all variants would take an exponential amount of code, which I do NOT want to do) I have them take 2 options, a setup of what format to return the data (selected), and a set of options for refining the query (refine).
I case out the selected to only get sensible data out of it (or throw a match exception here, which is great to get me to look at it to add more cases).
I then start the basic ‘empty’ query that this is built from (the central ‘profile’ table in the system I’m having to work with) along with checking a certain field is nil (non-nil here means it is ‘old’ and replaced by a new row that has a nil for that field, no it is not a composite primary key, but thankfully this is one of the few fields that has an index).
Next I need to figure out joins. I pre-think of an order I want them to be (essentially I just add more to the ‘end’ of this list as I need more joins, never earlier, you’ll see why shortly) and pop the options for each join, here I’m just showing employee’s and teacher join tables, but there are more, much much more… >.>
Join the tables
So now I know which tables I need to join, they are in the employee/teacher bindings, if nil then don’t, otherwise do (in some others there are different ways to join too, like sometimes I need inner joins, sometimes left, etc…, but simple here), here is employee:
# Employee handling
employee_type =
case List.wrap(employee) do
[true] -> ["F", "O", "P"]
types ->
Enum.map(types, fn
:fulltime -> "F"
:parttime -> "P"
:student -> "O" # workstudy old term not used here anymore, now `student employee`
end)
end
query =
if employee do
join(query, :inner, [s],
e in DB.Banner.PEBEMPL,
s.spriden_pidm == e.pebempl_pidm and
e.pebempl_empl_status == "A" and
e.pebempl_internal_ft_pt_ind in ^employee_type
)
else
query
end
Again, ignore the bad colorizing on these discourse forums, it looks a LOT better in my editor, what do you expect from javascript-based highlighting. ^.^;
But first I parse out the acceptable options from employee, it only accepts really just true or a set of atoms and returns their mappings to employee_type. I then build the query, as this is the first join it is really simple, just join as normal, not much special here.
Multi-joins
So now I need to join teachers, except sometimes employee will be joined and sometimes it will not (and the speed difference between them definitely encourages me to minimize joins on this ancient database), so I could do something like test if employee and teacher, then test if not employee and teacher, however you can see how that would quickly get crazy large, especially with 16 joins, the number of possible combinations explodes drastically thus incurring not just a mental overhead but also an OhGodTheAmountOfCode overhead just for a single join. However we can take advantage of a few things, the major being that you only need to specify the positional joins ‘up to’ the last-most one you are using, you can ignore others (so if you have 7 joined and you are doing a where clause on the first and third ones, then you only need to specify the first 3, however you need to know if they are joined still, including the second, since Ecto seems to be positional only). So considering that I built a macro, used like this to join the teacher table:
# Teacher handling
query =
if_with_joinlist [s: true, e: employee, t: teacher, sect: true] do
%{s: s, t: t, sect: sect} ->
on = dynamic(&1, s.spriden_pidm == t.sirasgn_pidm) # and t.sirasgn_primary_ind == "Y")
on =
case List.wrap(teacher)[:term] do
nil -> dynamic(&1, ^on and t.sirasgn_term_code == ^to_term_format())
:all -> on
%{year: _, month: _, day: _} = date -> dynamic(&0, ^on and t.sirasgn_term_code == ^to_term_format(date))
end
name_query =
from crse in DB.Banner.SCBCRSE,
select: %{
_rank: fragment("rank() OVER (PARTITION BY ?, ? ORDER BY ? DESC)",
crse.scbcrse_crse_numb, crse.scbcrse_subj_code, crse.scbcrse_eff_term),
numb: crse.scbcrse_crse_numb,
code: crse.scbcrse_subj_code,
# scbcrse_eff_term: crse.scbcrse_eff_term,
scbcrse_title: crse.scbcrse_title,
}
query
|> join(:inner, &2, t in DB.Banner.SIRASGN, ^on)
|> join(:inner, &1, sect in DB.Banner.SSBSECT,
t.sirasgn_crn == sect.ssbsect_crn and
sect.ssbsect_ssts_code == "A"
)
|> join(:left, &0, crse in subquery(name_query),
# crse.scbcrse_csta_code == "A" and # Not active or not, unsure what this is for...
crse._rank == 1 and
crse.numb == sect.ssbsect_crse_numb and
crse.code == sect.ssbsect_subj_code
)
else
query
end
It grew to more than what it is now and it’s name is a holdover from its original days, but in general if_with_joinlist/2 takes two arguments, the first is a keyword list, in order, the key is a ‘name’, the value is either ‘nil’ or non-nil. The second argument is the body of the function, a do/else pair (the else is required, I did not want to accidentally forget it, and it is important not to forget it). The do branch takes a case-like set of matchers where the matcher is a literal map. The key’s of this map should match the keys in the original keyword list, and the value is the joinlist mapping to a useable name. Ones not specified here are not calculated in the mappings (only if they exist or not). It also replaces all &0 with a joinlist, and ones like &1 is the joinlist except the last one, and &2 is except the last 2, and so forth.
You can see in this teacher join I build up a dynamic query and I’m actually binding 2 tables, the second one sect is set to true because if teacher exists, it exists (I could put the teacher variable on it too but it doubles the branching count of the generated code, it is good to keep this minimal as Elixir starts compiling really slow with a lot of branches… much much slower than erlang does for the same amount of code), so I use a literal ‘true’ for it as well to have it assume it always exists in the branches, which is fine since it is only used when teacher is used too. I then join to 3 tables (I needed a way to access the sect from the third one, hence why it was in the keyword list).
Final refining
I then do some refining on the primary table (potentially some of these are used in prior joins as well to make the joins faster):
query =
case refine do
[pidm: pidm] when is_integer(pidm) -> where(query, [s], s.spriden_pidm == ^pidm)
[pidm: pidms] when is_list(pidms) -> where(query, [s], s.spriden_pidm in ^pidms)
[cnum: cnum] when is_binary(cnum) -> where(query, [s], s.spriden_id == ^cnum)
[cnum: cnums] when is_list(cnums) -> where(query, [s], s.spriden_id in ^cnums)
[id: id] when is_binary(id) -> where(query, [s], s.spriden_id == ^id)
[id: ids] when is_list(ids) -> where(query, [s], s.spriden_id in ^ids)
[] -> query
end
But otherwise any join specific ‘where’ clauses I combine into the on clause when possible. otherwise I now perform those, using the same if_with_joinlist as before a few more times to do where’s between tables as necessary.
Final selection
The new select_merge in the latest ecto substantially shortened this part, but it is still not ‘short’ by any stretch (still needing the if_with_joinlist at times as you see here):
query =
case selected do
:all -> query
:processed ->
group? = if teacher, do: true, else: false
selected =
if group? do
group_by(query, [s], [
s.spriden_pidm,
s.spriden_id,
s.spriden_last_name,
s.spriden_first_name,
s.spriden_mi,
])
else
query
end
|> select([s], %{
pidm: s.spriden_pidm,
id: s.spriden_id,
last_name: s.spriden_last_name,
first_name: s.spriden_first_name,
mi: s.spriden_mi,
})
selected =
if employee do
if group? do
group_by(selected, [s, e], [
e.pebempl_coas_code_home,
e.pebempl_coas_code_dist,
e.pebempl_orgn_code_home,
e.pebempl_orgn_code_dist,
e.pebempl_ecls_code,
e.pebempl_lcat_code,
e.pebempl_bcat_code,
])
else
selected
end
|> select_merge([s, e], %{
coas_home: e.pebempl_coas_code_home,
coas_dist: e.pebempl_coas_code_dist,
organization_home: e.pebempl_orgn_code_home,
organization_dist: e.pebempl_orgn_code_dist,
ecls_code: e.pebempl_ecls_code,
lcat_code: e.pebempl_lcat_code,
bcat_code: e.pebempl_bcat_code,
})
else
selected
end
selected =
if_with_joinlist [s: true, e: employee, t: teacher, sect: true, crse: true] do
%{s: s, t: t, sect: sect, crse: crse} ->
select_merge(selected, &0, %{
courses: fragment("ARRAY_AGG(DISTINCT ARRAY[?,?,?,?])",
t.sirasgn_crn,
sect.ssbsect_subj_code,
sect.ssbsect_crse_numb,
fragment("COALESCE(?,?)", sect.ssbsect_crse_title, crse.scbcrse_title)
),
})
else
selected
end
selected
single_field when is_atom(single_field) -> select(query, [s], field(s, ^single_field))
list when is_list(list) -> select(query, [s], map(s, ^list))
end
Should be fairly readable if long. I am so happy for select_merge, makes it a lot more obvious what is going on instead of needing to build up repeated and complex selects, this is SO much shorter than before, I’m so happy for select_merge! ^.^
Now, I just need named joins so I can do something like this instead:
query =
from [s: s] in DB.Banner.SPRIDEN, # Give this table the recallable name `s`
where: is_nil(s.spriden_change_ind)
...
if teacher do
...
|> join(:inner, %{s: s}, [t: t] in DB.Banner.SIRASGN, ^on) # Recall back to `s` here, regardless of its join position, name `t`
> join(:inner, %{s: s, t: t}, sect in DB.Banner.SSBSECT, ## Recall back `s` and `t` here, and name `sect`, this works even if `e` positionally is between `s` and `t` :-)
...
else
query
end
So yes, all of this would be SOOOOO much more simple with named joins in ecto, that combined with select_merge would make these horrors a bliss to work with. ^.^
Oh, and here is my if_with_joinlist/2 macro, it is ugly as hell but it is a munged build up of a few different things that I don’t want to touch for fear of breaking my monolithically huge queries… >.>
I will be SOOOO glad to get rid of it should ecto add named joins…
defp if_with_joinlist_do_body(bound_list, {bindings, body}) do
bound_list = Enum.map(bound_list, fn name ->
case bindings[name] do
nil -> Macro.var(String.to_atom("_#{name}"), __MODULE__)
binding -> binding
end
end)
Macro.postwalk(body, fn
{:&, _, [less]} when is_integer(less) ->
Enum.take(bound_list, length(bound_list) - less)
ast -> ast
end)
end
defp if_with_joinlist_do(_, _, [], ebody), do: ebody
defp if_with_joinlist_do(bound_list, [], [body], _ebody) do
if_with_joinlist_do_body(bound_list, body)
end
defp if_with_joinlist_do(prior_bound_list, [{bound, condition} | bounds], body, ebody) do
bound_list = prior_bound_list ++ [bound]
if condition === true do
if_with_joinlist_do(bound_list, bounds, body, ebody)
else
{has, both_has0} = Enum.split_with(body, &(Keyword.get(elem(&1, 0), bound, nil) != false))
{not_has, both_has1} = Enum.split_with(body, &(Keyword.get(elem(&1, 0), bound, nil) === nil))
both_has = Enum.filter(both_has0, &Enum.member?(both_has1, &1))
has = has ++ both_has
not_has = not_has ++ both_has
has = Enum.filter(body, &Enum.member?(has, &1))
not_has = Enum.filter(body, &Enum.member?(not_has, &1))
has = if_with_joinlist_do(bound_list, bounds, has, ebody)
not_has = if_with_joinlist_do(prior_bound_list, bounds, not_has, ebody)
quote do
if unquote(condition) do
unquote(has)
else
unquote(not_has)
end
end |> Macro.expand(__ENV__)
end
end
defmacrop if_with_joinlist([_ | _] = bound, [do: body, else: ebody]) do
body =
Enum.map(body, fn
{:->, _, [[{:%{}, _, args}], ast]} -> {args, ast}
unk -> throw {:UNKNOWN, unk}
end)
if_with_joinlist_do([], bound, body, ebody)
# |> Macro.to_string() |> IO.puts() |> throw
end
I tried optimizing it with a multi-branched case and more and it just ended up running slower so this is what it ended up with. Again, I’d LOVE to be rid of it!
Wow, my use case is much, much simpler. I think what hurts me most, beside named_joins (god that would help a lot) is the need of writing lateral joins as fragments. And the more I think about it the more I think that if you have to write reporting utility that uses mix of ecto dsl and raw SQL, you might as well just go down the “raw SQL glueing” road It’s easier to test, because you just have to test if the produced strings for given output match those expected.
What you did there are little monsters But I don’t see any other way
Lol, they are my own lovecraftian horrors to work around ecto’s limitations that I’ve been asking repeatedly for other ideas on how to do them but no one has given any yet. ^.^;