Limit the number of preloaded items

roperzh · July 26, 2017, 5:43pm

Hello,

Having a relationship like this:

defmodule Cmr.Shelf.Book do
  # ...
  schema "shelf_books" do
    # ...
    has_many :keywords, Keyword
  end
end

defmodule Cmr.Shelf.Keyword do
  # ...
  schema "shelf_keywords" do
    # ...
    field :occurrences, :integer
    belongs_to :book, Book
  end
end

I want to build a query that brings a list of Shelf.Book's. Since a book can have hundreds of keywords, I want to optimize the amount of data that I send to the client. The idea is to improve my query to only bring the n (let’s say 15) keywords with more occurrences.

So far I tried to do something in the lines of:

    # Define a query for keywords, limit the number to 15
    k_query = from k in Keyword,
      limit: 15,
      order_by: [desc: k.occurrences]

    # Query books, but using k_query to preload associations
    Repo.all from b in Book,
      limit: 10,
      preload: [keywords: ^k_query]

But this doesn’t work properly, and I suspect that it’s because the limit is being applied to the total number of keywords.

For example, this is the output of the query in JSON format, the first item in the list is coming with 15 keywords, while all the following ones have an empty array. I can confirm that all the books have associated keywords.

{
  "data": [{
    "title": "Pride and Prejudice",
    "keywords": [
      // 15 items here
    ],
    "id": 1,
    "gutenberg_id": 1342,
    "author": "Austen, Jane"
  }, {
    "title": "Alice's Adventures in Wonderland",
    "keywords": [
      // No items here
    ],
    "id": 2,
    "gutenberg_id": 11,
    "author": "Carroll, Lewis"
  }]
}

Any hint/pointer will be appreciated! Thanks!

Ankhers · July 26, 2017, 6:38pm

Unfortunately this is not something you can currently do. Take a look at the documentation for Ecto.Query.preload/3. Specifically:

Note: keep in mind operations like limit and offset in the preload query will affect the whole result set and not each association. For example, the query below:

comments_query = from c in Comment, order_by: c.popularity, limit: 5
Repo.all from p in Post, preload: [comments: ^comments_query]

> won’t bring the top of comments per post. Rather, it will only bring the 5 top comments across all posts.

There was also an [issue](https://github.com/elixir-ecto/ecto/issues/1956) created for this.

roperzh · July 26, 2017, 6:44pm

Woot! thanks so much @Ankhers :). I will try to find a workaround for the moment then.

axelson · July 26, 2017, 8:27pm

Although if your requirements were to only return the top keyword for each book that is accomplishable via preload by utilizing distinct.

Details are in this post: Ecto and preloading most recent record from association