Ecto preloading with subquery for pagination with limit

I’m trying to replicate the following PostgreSQL query with ecto:

SELECT *
FROM (SELECT * FROM event LIMIT 1000) as e
LEFT JOIN event_runner as er ON e.id = er.event_id

The idea is to limit only the FROM part of the query, returning the correct amount of events stopping the results from the JOINs interfering with the limit count, as it would occur if I used a LIMIT clause in the main query, as there is multiple event_runners for one event, defined as a has_many in the Event schema.

That said I trying to use the current ecto query:

event_query = from e in Event, limit: ^max_rows

from f in subquery(event_query),
  left_join: er in assoc(f, :event_runners),
  preload: [event_runners: er]

However when I run the above query I get the following error:

** (Ecto.QueryError) can only preload sources with a schema (fragments, binary and subqueries are not supported) in query:

     from e0 in subquery(from f0 in MyApp.Event,
       limit: 100,
       select: f0),
       join: e1 in MyApp.EventRunner,
       on: e1.event_id == e0.id,
       select: e0,
       preload: [event_runners: er]

The error is clear, I can’t do a preload with subquery as a source schema, but the use of preload from the join part would be ideal, as we are receiving a lot of request and for each entry returned a separated query is executed if we use the Repo.preload function.

Is there any way to get around this limitation? Or, is it possible to rewrite this piece of code to make the limit only affect the FROM clause?

Postgres is smart enough not to join the entire tables before taking the limit so you can get rid of the subquery and just put the limit on the outside. At least it works on my version of postgres. Try out these 2 queries in a safe environment and compare how long they take to run

SELECT * from event e LEFT JOIN event_runner er on e.id = er.event_id LIMIT 1

SELECT * from event e LEFT JOIN event_runner er on e.id = er.event_id

If limit is happening after the entire tables are joined then they should take the same time. In my testing the first is happening instantaneously while the second takes a long time.

Hey, thanks for the response, which version of postgres are you using?

I am using Postgres 13.

But now I realized what I said only works if there is at most on event runner per event. Otherwise you’ll get 1000 results but not 1000 events if one event has more than one runner. What is the cardinality of the relationship ?

Should have stated the cardinality in the post, the Event schema has a has_many relationship to the EventRunner. One event can have 0…n event_runners.

Ah then you might need to use the separate preload query. I know it’s not what you want but it’s supposed to be more efficient anyways for one to many relationships. Otherwise your main table will have duplicated rows sent from the DB to Elixir.

Or if you’re deadset against a separate preload query you would have to do some kind of homegrown solution like this

from e in subquery(....), left_join: er in EventRunner, on e.id == er.event_id, select {e, er}

Then perform the aggregation yourself on the results

Correct me if I’m wrong, but using the Repo.preload(events, :event_runner) in a list of 1000 Events, wouldn’t the preload run 1000 queries to retrieve all EventRunners data? Looking at the application logs with level: :debug, it seems that this is the case.

It would only run one query, something like select * from event_runner where event_id in (....).

That is true, I was inspecting the wrong logs, thanks for the help!