SQL vs Ecto optimization/translation help

So I have a gnarly query in my open source Phoenix app I was trying to make dramatically more efficient… in its current form it brings in a lot of unnecessary data over the wire, so I managed to get a raw SQL form that only gets the data I need, and tried translating it into Ecto, and got stuck with some errors related to group_by vs select I think… the purpose is to get the data we need, which is stocks that have a “path” to any of the credit_types in the database, grouped by which credit type(s) they have a path to (most only have one path, but some have 2, and in that case we want the stock duplicated and to appear in a column for both credit types)… in it’s current form (in Ecto, before attempted refactor) it works but it pulls every food in the database also, because they all relate to one or more credit types, but we only care about “stocked” foods. Current form goes from credit_type “toward” stocks, the refactor works by going from stocks “toward” credit type, and grouping stocks by credit type…

Here is the raw SQL version:

    select stock.id, food.long_desc, food.manufacturer_name, food_group.foodgroup_desc,
      credit_type.name
      from facilities facility
      inner join stocks stock on stock.facility_id = facility.id
      inner join foods food on food.id = stock.food_id
      inner join credit_type_memberships ctm on ctm.food_group_id = food.food_group_id
      inner join credit_types credit_type on credit_type.id = ctm.credit_type_id
    where facility.id = 1
    group by stock.id, food.long_desc, food.manufacturer_name, food_group.foodgroup_desc,
      credit_type.name

And rather than paste the non-working ecto-translation of it here, I’ll link in context to my best/closest attempt I think I got to, in the branch I was playing with it on… the play function here was accidentally committed ages ago while trying to solve the same problem, before I had a working SQL version of it, just to make it easy to test the query in iex… the stock_by_type function above it is the actual query from the app as currently used. You can probably answer any question from the code in the link there, (and original form of that query before the WIP/play commit above is actually changed here for reference) but also happy to provide context if anything is unclear/harder to find an answer to than you’d like, but you’re otherwise interested in helping with the refactor… Thanks all!

So I see a bunch of people have looked at this but no responses… anyone confused over what I’m asking? Possibly the answer is to just use one or more fragments to translate the pieces, which is a valid answer I’ve already considered, but wouldn’t mind advice on the best pattern/seams to tackle that, vs just making it one gigantic fragment, but welcome any thoughts or questions at all… not urgent, just been bugging me and want to find a way to resolve the question so I can stop scratching at the itch of what to do about it :slight_smile:

More rather it is a non-small SQL statement and it is not formatted in proper markdown code tags so it is very hard to read. ^.^

I’ll fix your post to use proper markdown so it looks better. :slight_smile:

EDIT: And edited, I do not have time to mentally parse and convert the size of that SQL just yet, but if no one else does soonish then poke me with @OvermindDL1 and I’ll take a look when I can. :slight_smile:

Thank you! I thought it didn’t look right, but it looked ok and was a code block in the preview box, but then looked rather… flat and un-codeish on the page, appreciate the pointer though, suspected there might be a better way!

Its interesting that you’re using group_by when you aren’t aggregating anything. Which means the query is basically the same as:

    SELECT DISTINCT stock.id, food.long_desc, food.manufacturer_name, food_group.foodgroup_desc,
      credit_type.name
      FROM facilities facility
      INNER JOIN stocks stock on stock.facility_id = facility.id
      INNER JOIN foods food on food.id = stock.food_id
      INNER JOIN credit_type_memberships ctm on ctm.food_group_id = food.food_group_id
      INNER JOIN credit_types credit_type on credit_type.id = ctm.credit_type_id
    WHERE facility.id = 1;

Which might simplify your Ecto work. Personally, for SQL that has this much complexity, I would probably create a view in the database. That way I can separate DB level optimisations from application processing.

Might not be the solution that you are looking for, but I find that complex queries are sometimes best represented as a function. Assuming that’s Postgres, here is an example of how I wrapped a search results function in Phoenix a while back.

http://www.brightball.com/articles/postgresql-functions-with-elixir-ecto

It’s not updated for more recent versions, but same idea might be a good fit for the “don’t overthink it” approach. I think something similar would translate well with the schema/context approach.

this presentation especially from ~14 minutes mark, will empower your ecto skills a lot (schemaless queries - whole presentation is great!) - also gives you a way to compose these big queries from smaller ones…

1 Like

I am not super experienced at SQL so you may be right, I got some help crafting this SQL query, but the reason for the group_by is that I want to display all the stocks that connect to a particular credit_type together… the current query starts from credit type and credit_type has_many stocks through foods, which is joined through food_groups and food_group_memberships… I guess though I wouldn’t think in terms of aggregations per se, I am kind of aggragating which stocks are reachable from which credit types, and duplicate stocks in each credit type are possible and need to be treated as seperate instances for purposes of this query, if that explains anything… the surrounding context of the Elixir code may make this clearer, but the idea is this is a pure refactor of the function as it exists on master or in the second git sha link in the before version, so that no other code needs to change… Does that help explain/understand?

Will keep that in mind @brightball, though I have some reluctance/resistance to splitting actual application logic across code and database stored functions… I know it can be efficient, and I like Ecto’s philosophy of trusting the database for constraints, which arguably verges into application logic in some cases, but constraints are pretty simple compared to stored procedures… not ruling it out, but I don’t think the increase in complexity is justified at this point.

I understand conceptually (but the domain isn’t immediately familiar to me). My intention was to say that the SQL you wrote and the SQL I wrote should produce identical results. When there are no aggregations on a select with a group by then the behaviour is to select distinct rows. Since you mentioned you think it was the group by that was giving you challenges in Ecto, this alternative form might help you get over the bump…

1 Like

Sounds more like you want to ORDER BY instead of grouping. Grouping is primarily for aggregates only and it does not sound like it is what you want. :slight_smile:

Thanks @outlog! I’ve read the What’s new in Ecto book and recall some stuff on Schemaless queries, but will watch the video (and/or review the book) and see if anything in there helps me understand a different approach.

1 Like

Possibly? I’m unclear how or what ordering would really matter for here, I think of order_by for sorting and though I sort at the end by credit_type name, that part is trivial because there’s generally only 3 credit types…

What I really want, ideally, is to get Ecto to return the CreditType structs with stocks preloaded, which it does now, but in the process it pulls all foods from the DB which is inefficient… it just seems like there’s no good way to preload stocks without also preloading all the joining schemas/models along the way… I do also need the foods from those stocks, but not foods that are “dead ends” and do not lead to any stocks for the particular facility id provided.

Not going to disagree too much. Mainly just wanted to toss it out as an option.

In terms of the app logic separation though, just keep in mind that there are some types of app logic that have been trumpeted by frameworks with the goal of database portability that actually negatively impact your application. The database is the only thing that can actually guarantee database integrity since otherwise you’re prone to race conditions.

Functions are really simple and powerful in Postgres though and the type of structure outlined in that post is a couple of lines of code around a query…that roughly translates to the equivalent of a database view with arguments.

The only thing I’d provide as some parting wisdom is this; at the point that you’re performance tuning your data storage or retrieval, all options should be on the table. I’ve seen cases where people were so committed to the idea of avoiding leveraging their database that they were more willing to add 3rd party tools as a workaround (like Redis/Elasticsearch) than using the tooling already in the database.

I don’t think it’s necessary in your case right now, just always be aware that your database (especially PG) is a really powerful thing. Take advantage. Don’t keep a Tesla Roadster in the garage to get groceries. :slight_smile:

2 Likes

@bglusman I think @kip is right on this one. If you are not doing aggregates (like counting the number of stocked items for a particular type) group by is not needed.

It is hard to say what is wrong without seeing the error messages you get from your query.

A few things to help your troubleshooting:

  • Does the raw SQL work in psql?
  • If it works, try it with and without the group by clause and then you can confirm they do the same thing
  • Does the ecto query work without the group by clause?
  • Strip down the query until it works and then add in more elements to get to where you want to be. For example start with just joining one table, make sure the result make sense and then continue with the next one.

What I can see in the link provided it cannot work as your group by clause only contains column whereas your select contains many. Any non-aggregated field in the select clause must be part of the group by clause.

As a side-note. Other’s say this is complicated SQL but this is as straight-forward as it gets. A simple select with a few joins and a where clause.

1 Like

Entirely agree. Complex SQL is what I have to deal with on this Oracle horror at work where the average SQL query is 2 printed-pages in length, and the longest are well over 10 printed pages… >.>

I do fight Ecto a lot with these sadly, and have to toss out a lot of Ecto when I have to start doing cross-schema joins that Ecto entirely does not support… :frowning:

Agreed. Ecto is getting there but like most “ORMs” they optimise for the trivial case which doesn’t exist in real life :slight_smile:

Unfortunately I’ve spent way too much time trying to translate SQL queries into ecto (or other libraries). I still do it because they do add value, but for lots of queries I just use raw SQL.

1 Like

Yeah, I’ve not had a project yet where I did not have to break out of Ecto some-how… ^.^;

I especially like it as a Static Typing and Translation layer on top of SQL. It catches stupid little typing bugs that I make, which consequently is also the main reason why I love statically typed languages. :slight_smile:

2 Likes

Great suggestion @cmkarlsson, simplify! I know I tried some version of simplifying, but probably not enough. The raw sql query does work, and I just confirmed the group by has no effect other than ordering… will report back when I have more to report, thanks to you and @kip and @OvermindDL1 for thoughts!

1 Like

Parting thought, as I start to try this simplifying… I suspect I can make the query work, but I also beleive the shape of the data will be exactly the opposite what I want, and it may make for some very ugly code manually constructing structs or something to rearrange data how I want… I hoped the group_by clause in Ecto would flip things around to hand me back the CreditType structs I want, with the stocks and foods for them loaded up, but I’m pretty sure that was a delusion on my part because I didn’t understand group by correctly or something… is there anyway, given a bunch of sql data, to control how ecto instantiates the structs out of it? I start this query with from stocks in Stock, and so I’m pretty confident what I get out will be a list of Stock structs, but thats not what I want… I want a list of CreditType, containing the stocks as a relation… I sort of lost focus on that while thinking about how to reproduce this SQL query as Ecto, but without that, I think I have to, in memory in elixir, do all kinds of weird acrobatics with the stocks that come back, shoving them into a map or struct based on which credit type’s they have present… which I can do, but just confirmed there’s not a better way.