How to address the issue with limits on preloaded associations using Ecto?

Breaking it down:

artists_limit = 2
albums_limit = 2
latest_artists_query = from(ar in Artist, [
  order_by: [desc: ar.inserted_at],
  limit: ^artists_limit
])

This query simply selects the %MusicDB.Artist{} of the most recently inserted artists. The intent is to use it for a subquery in

latest_albums_query = from(q in subquery(latest_artists_query), [
  left_lateral_join: lal in fragment("SELECT id FROM albums AS al WHERE al.artist_id = ? ORDER BY al.inserted_at DESC NULLS LAST LIMIT ?", q.id, ^albums_limit), on: true,
  join: al in Album, on: lal.id == al.id,
  select: al
])

This part of the query contains the lateral join to identify the two most recently added albums to the artists specified by the subquery. Because of the fragment this information can’t be correlated to the schemas known to Ecto.

  join: al in Album, on: lal.id == al.id,
  select: al

This join gets around that issue by joining against the albums table again - this time in a fashion that lets Ecto use %MusicDB.Album{} and because we want to use this query as a subquery we have to use an Ecto struct (list, maps, tuple aren’t supported for subqueries) which is accomplished with select: al (without it, only the %MusicDB.Artist{} would be returned). This means that in this pass we lose %MusicDB.Artist{} but as %MusicDB.Album{} has artist_id (and more importantly belongs_to(:artist, Artist)) we can get that back later.

query = from(p in subquery(latest_albums_query), [
  preload: [:tracks, :artist],
  order_by: [asc: :artist_id, asc: :inserted_at]
])

Now given that we have all the albums we want, we also want the child tracks and the parent artist. This query does this via Ecto’s Ecto.Query.preload/3 to load them as associations (strictly an Ecto feature, not SQL) with preload: [:tracks, :artist]. order_by: [asc: :artist_id, asc: :inserted_at] keeps all the %MusicDB.Album{} for the same artists “together” while ordering them in ascending order. This ordering will be reversed (i.e. descending) during “post load processing” which is the final ordering we are looking for.

Using the query now, Ecto will generate 3 separate SQL queries:

  1. To obtain the desired %MusicDB.Album{} values
  2. To obtain the necessary %MusicDB.Artist{} preload values
  3. To obtain the necessary %MusicDB.Track{} preload values

However the resulting shape of the data is like this:

[
  %MusicDB.Album{
    artist_id: 2,
    id: 3,
    artist: %MusicDB.Artist{id: 2, ...},
    tracks: [%MusicDB.Track{id: 11, ...}, ... %MusicDB.Track{id: 20, ...}]
    ...
  },
  %MusicDB.Album{
    artist_id: 2,
    id: 4,
    artist: %MusicDB.Artist{id: 2, ...},
    tracks: [%MusicDB.Track{id: 21, ...}, ... %MusicDB.Track{id: 29, ...}]
    ...
  },
  %MusicDB.Album{
    artist_id: 3,
    id: 5,
    artist: %MusicDB.Artist{id: 3, ...},
    tracks: [%MusicDB.Track{id: 30, ...}, ... %MusicDB.Track{id: 33, ...}]
    ...
  }
]

but the shape we are looking for is

[
  %MusicDB.Artist{
    id: 3,
    albums: [
      %MusicDB.Album{
        id: 5,
        artist_id: 3,
        tracks: [%MusicDB.Track{id: 30, ...}, ... %MusicDB.Track{id: 33, ...}],
        ...
      }
    ],
    ...
  },
  %MusicDB.Artist{
    id: 2,
    albums: [
      %MusicDB.Album{
        id: 4,
        artist_id: 2,
        tracks: [%MusicDB.Track{id: 21, ...}, ... %MusicDB.Track{id: 29, ...}],
        ...
      },
      %MusicDB.Album{
        id: 3,
        artist_id: 2,
        tracks: [%MusicDB.Track{id: 11, ...}, ... %MusicDB.Track{id: 20, ...}],
        ...
      }
    ],
    ...
  }
]

So some data transformation is still required

forget_assoc = fn ecto_struct, field, cardinality ->
  not_loaded = %Ecto.Association.NotLoaded{
    __field__: field,
    __owner__: ecto_struct.__struct__,
    __cardinality__: cardinality
  }
  Map.put(ecto_struct, field, not_loaded)
end
forget_artist = &(forget_assoc.(&1, :artist, :one))

forget_artist/1 is used to “unload” the artist association from %MusicDB.Album{} when we are done with it.

make_result = fn rows ->
  rows
  |> List.foldl({nil, nil, []}, row_to_acc)
  |> compact_to_rest.() # NOTE: could reverse artists after this
end

make_result/1 processes the loaded rows to produce the desired data shape. The accumulating data structure is {artist, [album], [artist_with_albums]} where artist is the %MusicDB.Artist{} currently being worked on, [album] is the list of %MusicDB.Albums{} for the current artist, and [artist_with_albums] is the list of already processed %MusicDB.Artist{} structs. The final compact_to_rest/1 is necessary to move the last artist and its [album] into [artist_with_albums].

compact_to_rest = fn
  {nil, nil, rest} ->
    rest
  {artist, albums, rest} ->
    [Map.put(artist, :albums, albums)|rest] # NOTE: could reverse albums here
end
row_to_acc = fn
  (%MusicDB.Album{artist: %MusicDB.Artist{id: id}} = a, {%MusicDB.Artist{id: id} = artist, albums, rest}) ->
    {artist, [forget_artist.(a)|albums], rest}
  (album, acc) ->
    {album.artist, [forget_artist.(album)], compact_to_rest.(acc)}
end

Support functions for make_result/1:

compact_to_rest1:

  • simply moves artist and its [album] into [artist_with_albums] and returns the new resulting [artist_with_albums]. OR
  • returns the current [artist_with_albums] if there is no artist (and [album]).

row_to_acc/2:

  • Adds the provided MusicDB.Album{} to [album] if its artist matches the current artist (after “unloading” the struct’s artist association). OR
  • “Compacts” the current artist and [album] into [album_with_artists], before setting the new “current” artist/[album] to album.artist/[forget_artist.(album)].

Note: this approach will lose artists without any albums. Those could be recovered with:

no_albums_artists_query = from(q in latest_artists_query, [
  left_join: al in Album, on: q.id == al.artist_id,
  where: is_nil(al.id)
])
no_albums_artists = Repo.all(no_albums_artists_query)

PS: Conceptually I’m still not entirely sold on whether it actually makes any sense to put anything but the complete set of associations inside an Ecto association given that it is an Ecto concept rather than something that emerges naturally from SQL.

While there may be practical reasons for wanting only some constrained subset of the complete set of associations:

  • associations seem to be part of Ecto’s mapping domain expressing the idea that this data is unconditionally associated with all this other data (the foreign key link either exists or it doesn’t - there is no gray area).
  • creating constrained subsets of anything is entirely within SQL’s domain.

So whenever something like LIMIT comes into play it may make more sense to avoid Ecto associations altogether and simply stick to Ecto(.Query)'s plain query functionality and then massage the loaded rows into the required shape via data transformations.

16 Likes