How can I limit the number of results returned by Dataloader?

Francesco · August 25, 2020, 1:16pm

Hello!

I’m using Absinthe + Dataloader and I have a particular field that returns a lot of results (1000s). I would like to cap it to 10 results, but I’m not sure how since if I did something like the following then it would limit it to 10 results total, not 10 results per parent field.

  def query(View = view, args) do
    view
    |> limit(10)
    |> tablet_query(args)
  end

One way I can think of is to use a custom resolver with dataloader.load and do the filtering in elixir and not with an ecto query. However this seems redundant, is there any way to go about implementing this limiting logic in an Absinthe query?

Thanks in advance for the help!

benwilson512 · August 25, 2020, 2:29pm

Thanks to heroic efforts by @mbuhot https://github.com/absinthe-graphql/dataloader/pull/93 limit in dataloader actually behaves the way you want! Just make sure to update to 1.0.8 which I just released.

Francesco · August 25, 2020, 2:49pm

Amazing, that’s great to hear! As far as a pattern for implementing this goes, would you do something like the following:

    field :views, list_of(:view) do
      arg(:limit, type: :integer)
      arg(:order, type: :sort_order, default_value: :desc)

      resolve(dataloader(Scribe.Analytics))
    end

then in Scribe.Analytics (the context module) have:

  def query(View, args) do
    view_query(args)
  end

  defp view_query(args) do
    Enum.reduce(args, View, fn
      {:order, order}, query ->
        query |> order_by({^order, :inserted_at})
      {:limit, limit_to}, query ->
         query |> limit(limit_to)
    end)
  end

benwilson512 · August 25, 2020, 2:51pm

I think that should work fine!

Francesco · August 26, 2020, 1:00am

Awesome, thanks

Francesco · August 26, 2020, 1:16am

@benwilson512 Is it possible to leverage the same work to get distinct results per parent and not globally? From a quick test it seems like it is but I’m really not confident when it comes to this. The ideal would be if I could get a distinct set of results, order by the inserted_at date and then limit them. Does that seem doable?

If it is the case then what are the limits for this? It seems to be pretty powerful.

Thanks again for the help, I really appreciate it

benwilson512 · August 26, 2020, 2:11am

Well it’s based on lateral joins, so if you can do it with lateral joins, then yes, otherwise no. Unfortunately that’s about all I can contribute without digging in further. @mbuhot might be able to comment further.

mbuhot · August 26, 2020, 4:35am

The implementation is currently checking for the presence of a limit in the query.

When a limit is applied, the query used to load the association is run in a lateral subquery, which would have the effect of applying the distinct separately to each parent .

If you’d like to send a PR that ensures distinct queries are always applied per-parent, the code that needs updating is: here and here

Francesco · August 28, 2020, 1:22pm

Sorry for the slow response! @mbuhot I’ve just modified the code to ensure the query used to load the association always runs in a lateral join but I don’t really know enough to know if it is correct.

Why did you only apply the lateral join subquery when we detected the presence of limit? Is it safe to run it for every query type?

mbuhot · August 29, 2020, 1:27am

My main concerns were that some users might be using MySQL 5, which doesn’t support lateral sub queries, and that when a limit isn’t in use, it’s more efficient to simply call Repo.preload.

One limitation we’ve found is that if your query contains preloads from a join association, it can’t be executed as a subquery.

Otherwise the semantics should be the same.