I have a GraphQL query that looks like this (simplified)
I am using Dataloader for the associations and for the viewerHasLiked field. Unfortunately, when this query is run, Dataloader runs the query for viewerHasLiked twice: once for the post and its comments, and once for the comments’ replies. I have been trying to figure out a way to merge this all into one query, which led me to Absinthe.Middleware.Dataloader. If I understand correctly, before the field is resolved, the plugin calls Dataloader.run/1 on the loader in the context. Then, when the middleware is called, it checks whether batches are still pending (if Dataloader has finished running?). If none are pending, it returns the result from Dataloader. If batches are still pending, it sets the field’s state to :suspended and adds a middleware to that field so that it is called again, and that is when it returns the result.
Is it even possible to tell Absinthe to keep suspending the field until all of the children are resolved, so that it can run the query once? Also, it looks like Dataloader.run/1 runs its work in parallel but returns the actual result and not a task. Does this mean that resolution of everything else stops and waits for Dataloader?
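The suspend-and-retry flow described above can be sketched as a toy model. This is Python rather than Elixir, and it is not Absinthe's or Dataloader's code: `ToyLoader`, `resolve_field`, and the even/odd `fetch_many` stub are hypothetical names that only mirror the steps in the description (queue a key, suspend while a batch is pending, run the batch once, then read the result). The blocking `run` is an assumption of the model.

```python
class ToyLoader:
    """Hypothetical stand-in for a loader: queue keys, then fetch them in one batch."""

    def __init__(self, fetch_many):
        self.fetch_many = fetch_many  # callable: set of keys -> dict of key -> value
        self.pending = set()
        self.results = {}
        self.batches_run = 0

    def load(self, key):
        # Queue the key unless a previous batch already fetched it.
        if key not in self.results:
            self.pending.add(key)

    def pending_batches(self):
        return bool(self.pending)

    def run(self):
        # Synchronous in this model: the caller waits until the batch is done.
        if self.pending:
            self.results.update(self.fetch_many(self.pending))
            self.pending = set()
            self.batches_run += 1

    def get(self, key):
        return self.results[key]


def resolve_field(loader, key, state="unresolved"):
    """Mimic the middleware call: return a (state, value) pair."""
    if state == "unresolved":
        loader.load(key)
        if loader.pending_batches():
            return ("suspended", None)  # revisit after the next run()
    return ("resolved", loader.get(key))


post_ids = (1, 2, 3)
loader = ToyLoader(lambda keys: {k: k % 2 == 0 for k in keys})

# Pass 1: every field queues its key and suspends.
states = [resolve_field(loader, post_id) for post_id in post_ids]

# Between passes: the "before resolution" step runs the pending batch once.
loader.run()

# Pass 2: the suspended fields are called again and now return values.
values = [resolve_field(loader, post_id, state)[1]
          for post_id, (state, _) in zip(post_ids, states)]
```

In this model three fields suspend, one batch runs, and the second pass only reads cached results. Whether a real resolution pass behaves the same way depends on the resolvers involved.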
As I mentioned in the linked thread, the two hard parts are:
Resolvers are opaque to Absinthe. They’re a stack of functions that contain and execute arbitrary code, so Absinthe cannot easily know that partial resolution of suspended fields will be better than complete resolution of suspended fields. Right now Absinthe has an execution flow. What you would need to do is write a custom Resolution phase module that could use Dataloader-specific information to adjust how resolution works.
Global query optimizers are very hard.
Fundamentally, you’d have to write a custom resolution phase that executed in a way that knew the underlying semantics of your Dataloader sources. I’m going to contend that this is going to be challenging to do in a way that performs better than what Dataloader / Resolution does across queries generically.
More than that, though, to be accepted into the actual core, you’d need a mechanism that allowed generic query optimization regardless of what underlying resolution mechanism was chosen. I mean, don’t get me wrong, this would be awesome, but doing so is an absolutely massive endeavor.
Fortunately, Absinthe is extremely extensible, so if you want to go down this path you should at least not have any particular difficulty getting the code you want into the execution flow.
Thank you so much for the info. Just a clarifying question: it is possible to suspend a field while its sibling’s children are resolved, right? (i.e., Absinthe doesn’t require that all fields at one nesting level are resolved before moving on to children?)
The children of a suspended field cannot be resolved. To resolve a field, the parent must have a value. The first argument to every resolver is the value of its parent, and a suspended field does not yet have a value, so it can’t supply one to its children’s resolvers. Absinthe does a depth-first resolution pass. If any field suspends, it moves on.
So in your specific example, you could do all of the viewerHasLiked fields IF comments resolved eagerly, and replies also resolved eagerly. However, if on pass number 1 viewerHasLiked uses Dataloader AND comments suspends, then Absinthe can’t go any further. It also has no way of knowing that resolving comments first and THEN replies would result in an overall more efficient path.
This is what I mean about global optimizers being hard. In this specific query document that is the right pass, but in other documents that may not be the right option.
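To make the pass-by-pass behaviour concrete, here is a toy simulation in Python. It is a sketch under stated assumptions, not Absinthe's resolution phase: the `CHILDREN` tree, the node names, and `liked_batches` are hypothetical, every node is assumed to have a viewerHasLiked field that always goes through the loader, and the loader is assumed to run once between passes.

```python
# Hypothetical document shape: one post, two comments, three replies.
CHILDREN = {"post": ["c1", "c2"], "c1": ["r1", "r2"], "c2": ["r3"]}


def liked_batches(children, children_are_lazy):
    """Return the viewerHasLiked batches, one list of node ids per pass."""
    batches = []
    frontier = ["post"]  # nodes whose value is available in the current pass
    while frontier:
        liked_keys = []     # viewerHasLiked keys queued during this pass
        next_frontier = []  # nodes that only get a value in a later pass
        stack = list(reversed(frontier))
        while stack:  # depth-first walk over nodes that already have values
            node = stack.pop()
            liked_keys.append(node)  # viewerHasLiked queues a key and suspends
            kids = children.get(node, [])
            if children_are_lazy:
                # The list field suspends, so its children wait for the next pass.
                next_frontier.extend(kids)
            else:
                # The list field has a value now, so descend immediately.
                stack.extend(reversed(kids))
        batches.append(liked_keys)  # the loader runs once between passes
        frontier = next_frontier
    return batches


eager = liked_batches(CHILDREN, children_are_lazy=False)
lazy = liked_batches(CHILDREN, children_are_lazy=True)
```

With eager comments and replies, the model queues all six viewerHasLiked keys in a single pass. With suspended ones, each nesting level gets its own batch. The exact number of batches in a real schema depends on how each field is resolved, so the counts here illustrate the shape of the problem rather than any particular query.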
Thanks for the reply; you’ve been extremely helpful. I have a quick follow-up question. Why does Dataloader’s Ecto source handle batches differently based on the association and what the association is on? If you don’t need a custom run_batch, wouldn’t it work better to batch by just the id, to avoid fetching the same record multiple times?
The problem is, associations are not simple ID lookups. They can be, but they can also be a lot more complex than that, depending on options like where:. Now, possibly, we could have Dataloader try to normalize the batch keys by constructing a “base query” such that simple assocs and manual ID lookups would result in the same base query.
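The effect of the batch key can be illustrated with a small Python sketch. It does not reproduce Dataloader.Ecto's real batch keys: the request dictionaries, the `via` and `where` fields, and both key functions are hypothetical, chosen only to show how grouping by association differs from grouping by a normalized "base query".

```python
from collections import defaultdict


def group_into_batches(requests, batch_key):
    """Group requested ids by whatever key the loader batches on."""
    batches = defaultdict(set)
    for req in requests:
        batches[batch_key(req)].add(req["id"])
    return dict(batches)


# Four hypothetical requests that all end up asking for user 7.
requests = [
    {"kind": "assoc", "via": "Post.author", "table": "users", "where": None, "id": 7},
    {"kind": "assoc", "via": "Comment.author", "table": "users", "where": None, "id": 7},
    {"kind": "lookup", "via": None, "table": "users", "where": None, "id": 7},
    {"kind": "assoc", "via": "Post.active_author", "table": "users",
     "where": "active = true", "id": 7},
]


def key_by_association(req):
    # One batch per association (or per manual lookup).
    return (req["kind"], req["via"])


def key_by_base_query(req):
    # Normalized key: same table and same extra conditions share a batch.
    return (req["table"], req["where"])


per_assoc = group_into_batches(requests, key_by_association)
normalized = group_into_batches(requests, key_by_base_query)
```

Keyed by association, the model produces four batches that each fetch user 7. Keyed by base query, the three plain requests collapse into one batch, while the association with an extra condition still needs its own, which is the complication described above.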