Grouping and sorting - column must appear in the GROUP BY issues

Hello, first post here. Sorry I’m not good at explaining things.

I have a table of Buses, each of which is associated with one or more Stops and Routes.
I’m trying to make a searchable list of all stops, but I don’t want to re-write my searching and filtering functions see (list_filtered_buses).

So, in sort_bus_list, I’m trying to return a sorted list of all stops.
My issue with the current implementation is that I get back duplicates when I render the list in my template since each bus is being returned with all its stops.

Here is relevant code in sort_bus_list in buses.ex
#V1 - doesn’t return duplicates, but isn’t returns error:
# ERROR 42803 (grouping_error): column “b1.stop_id” must appear in the GROUP BY clause or be used in an aggregate function
list
|> order_by([b, s, r], {^attrs.sort_dir, field(s, ^attrs.sort_attr)})
|> group_by([b, s, r], b.id)

  #V2 - sortable, but returns duplcate

list
|> order_by([b, s, r], {^attrs.sort_dir, field(s, ^attrs.sort_attr)})
|> group_by([b, s, r], [s.id, b.id])

Here’s sudo code of my template:
for bus in Buses
for stop in bus
display stop
end
end

More Buses.ex code for context:
def list_filtered_buses(params) do
search_term = params[“filter”][“query”]

    from(t in bus, join: a in assoc(t, :stops), join: c in assoc(t, :routes))
    |> search(search_term)
    |> list_buses_by_active(params)
    |> list_buses_by_type(params)
    |> sort_bus_list(params)
    |> preload([:stops, :routes])
  end

  def sort_bus_list(list, params) do
    attrs = process_sorting_metrics(params)

    case params["filter"]["sort_attr"] do
      a when a == "stop_name" or a == "stop_id" ->

      #V2 - sortable, but returns duplicate
      list
        |> order_by([b, s, r], {^attrs.sort_dir, field(s, ^attrs.sort_attr)})
        |> group_by([b, s, r], [s.id, b.id])
 
      "route_name" ->
        list
        |> order_by([b, s, r], {^attrs.sort_dir, field(r, ^attrs.sort_attr)})
        |> group_by([b, s, r], [b.id, r.id])

      _ ->
        list
        |> order_by([b, s, r], {^attrs.sort_dir, field(b, ^attrs.sort_attr)})
        |> group_by([b, s, r], b.id)
    end
  end
1 Like

The error is basically saying that every column in your select clause must also be either:

  1. An aggregate or
  2. Also included in the group by.

The reason is this: you are grouping values. if you have:

select bus_id, route_name group by bus_id

then somehow the database has to work out what to do with the route_name column. Return the first one? Last one? Average? you get the picture…

So you might (as an example only, not at all suggesting valid in your case):

select bus_id, min(route_name) group by bus_id
1 Like

Thanks for the response. I’m new to elixir, so I may be misinterpreting your explanation, but unfortunately I understand why I got the error, I’m just not sure how to avoid it without a significant re-write of the code, since my searching functions require this structure:
from(t in bus, join: a in assoc(t, :stops), join: c in assoc(t, :routes))

Ulitimately that error is neither an Ecto nor an Elixir error but a SQL error.

When learning SQL you don’t start with aggregate queries, it’s considered a more advanced skill and before using them in Ecto it does help considerably to have a good handle on matters in the SQL space.

SELECT
 customer_id,
 SUM (amount)
FROM
 payment
GROUP BY
 customer_id;

The general pattern is that there is an aggregate function (here SUM) and that any column used in the GROUP BY has to appear in the result field list outside of the aggregate function.

So the typical design process is to discover the various (different) SQL queries that are necessary to get you the full range of search options that you require.

Then you decide how to use Ecto to compose the query under the various possible circumstances to arrive at the necessary query for the situation that matches the particular instance of search parameters (an example discussion about composing queries is here).

2 Likes

Unfortunately the rules of SQL aren’t going to conform to your code :slight_smile: Can you put a gist on GitHub with the full query module?

What you are looking to do is ensure that for each column in your select clause is either aggregated or in the group_by clause. You don’t include the select in your snipped above so its hard to advise what you might consider.