Subscription catchup / state delta

So this is a problem that’s come up in both large projects I’ve used Absinthe/GraphQL with, and it’s a little bit surprising to me that I can’t find more people who strike it. But perhaps I’m just bad at googling or I’m missing some obvious angle.

The issue comes down to using GraphQL subscriptions for state updates. Take a simple example: user presence on a simple IM server. When a client connects, they want:

a) The initial state (all of their friends and whether they’re online or offline), and
b) Notification of all changes to that state (when a friend comes online or goes offline).

Obviously a) can be achieved with a simple query.

The obvious way (indeed, the only real-time GraphQL way) to do b) is with a subscription. Simple enough - fire an event when a user’s state changes.

So what’s the problem? It’s the intersection of the two. If the client issues the query first, then the subscription, they might miss an update during that gap and their view of the state will be wrong. Conversely, if they subscribe first and then query, they might get an update that was sent before their query ran. With the multiple processes involved in Absinthe’s subscriptions, there’s no way to guarantee that such an update will arrive before the query result (which would allow the client to trivially discard it), even if the authoritative data all comes from a single process. And that’s assuming the query is even sent over the same websocket as the subscription.

One way to solve this would be to add some kind of ordinal field (e.g. a timestamp) to the query and subscription data so that the client could discard subscription updates with an older ordinal than the one attached to the query, and apply newer ones. And indeed that works fine and is what I’ve done to this point. But I really don’t like it - it shifts the burden of data consistency to the client when the server has all the data it needs to do it itself (self-evidently, since it’s the one generating the ordinal values).
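For illustration, here is a minimal sketch of that client-side ordinal check, written in Elixir for consistency with the rest of the thread. The module name, the `ordinal`, `user` and `online` field names, and the payload shapes are all made up for the example; it also assumes an update whose ordinal equals the query's ordinal is already reflected in the query result.

```elixir
defmodule PresenceView do
  # Hypothetical client-side view: the ordinal attached to the query result,
  # plus a map of friend name => online?.
  defstruct ordinal: 0, friends: %{}

  # Build the view from the initial query result.
  def from_query(%{ordinal: ordinal, friends: friends}) do
    %__MODULE__{ordinal: ordinal, friends: friends}
  end

  # Discard any subscription update that is not newer than the query result.
  def apply_update(%__MODULE__{ordinal: seen} = view, %{ordinal: ordinal})
      when ordinal <= seen do
    view
  end

  # Apply newer updates on top of the queried state.
  def apply_update(%__MODULE__{} = view, %{user: user, online: online}) do
    %{view | friends: Map.put(view.friends, user, online)}
  end
end
```

This is exactly the burden I'd rather the client not carry: every client implementation has to repeat this comparison.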

The problem is, I’ve struggled for a long time to come up with a better system within the constraints of Absinthe and I haven’t been able to. Some of the approaches I’ve looked at:

  • Add a catchup function to subscriptions (see over in https://elixirforum.com/t/adding-a-subscription-catchup-function-to-absinthe/16363). In spite of being implemented, that never really went anywhere and in retrospect that’s probably a good thing because it doesn’t actually solve the problem above - it only removes the need for an extra query without avoiding the race condition.

  • Add some callback configuration to the subscription definitions which generate and compare ordinals from within the subscription execution. That ends up not working either because the subscription resolution is batched and as a result there’s no good place to shove the last ordinal value for each connection to that subscription which is accessible when it needs to be.

  • Do something similar with middleware and/or plugins - that ends up having the same issue as above: middleware and plugins aren’t executed for each connection.

As such I’ve kind of come to the realisation that the current Absinthe architecture probably can’t support what I want and at a bare minimum I’m going to have to start digging into absinthe_phoenix’s pubsub stuff if I want any chance of making this work.

So I guess my questions are: Am I missing something? Is this whole thing a fool’s errand and I should just let the client deal with it and get on with my life? Does anyone have any really clever insights I’ve missed?

Thanks!

My solution is probably costly; here it goes: Do not embed the state in the notification at all.

For example, if a user comes online or goes offline, the same notification would be sent indicating that something has happened with respect to that user. The subscribers of the notification have to issue a follow-up query to find out the state and compare the result to their own copy of the state.

That would be an option, certainly, but it seems like the worst of all worlds. It’s expensive (as you note) both computationally and in terms of network traffic; it still requires the client to have smarts that it ideally shouldn’t need (albeit slightly different ones), and it potentially produces redundant traffic when a few updates happen in quick succession (in which case the client’s first query may get all the updates at once, but it would still have to issue another query for each update notification to be certain).

For your example user presence it seems fine to me if you just

  1. Subscribe for changes
  2. Query for the initial state
  3. Handle changes (including the ones before the query)

Worst case, the user of the IM client sees a quick blip in a friend’s online indicator or gets a useless notification in the UI.
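One way to read those three steps is "buffer whatever arrives on the subscription until the query result is in, then replay it". A rough sketch of that, with a made-up module name and update shape (`{user, online?}` tuples), assuming updates carry the full new state for a user and arrive in the order they were published:

```elixir
defmodule BufferedPresence do
  # State is {:waiting, buffered_updates} until the query result arrives,
  # then {:ready, friends} where friends maps name => online?.
  def new, do: {:waiting, []}

  # Before the query result arrives, just remember the update (newest first).
  def handle_update({:waiting, buffer}, update), do: {:waiting, [update | buffer]}

  # Afterwards, apply updates directly.
  def handle_update({:ready, friends}, {user, online}) do
    {:ready, Map.put(friends, user, online)}
  end

  # When the query result arrives, replay the buffer on top of it, oldest first.
  def handle_query_result({:waiting, buffer}, friends) do
    replayed =
      buffer
      |> Enum.reverse()
      |> Enum.reduce(friends, fn {user, online}, acc -> Map.put(acc, user, online) end)

    {:ready, replayed}
  end
end
```

Because each update overwrites rather than toggles, replaying one that the query already reflects does no harm to the final state.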

But I haven’t used GraphQL, so I’m not sure if there’s something I’m missing.
I have a similar use case, but I’m using PostgreSQL logical replication, which has an initial snapshot.

:thinking: I am still a bit new to GraphQL, but one approach to model this might be to construct a subscription for the union of the ‘InitialState’ and ‘DeltaOfChanges’ types.

Because you then only make a single request (to establish the connection for the subscription), the issue of the race condition would be solved.
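As a rough sketch of what such a union could look like in an Absinthe schema (assuming the `absinthe` dependency is available; the module, type and field names are invented for the example, and the placeholder `ping` query is only there because Absinthe expects a root query type):

```elixir
defmodule PresenceSchema do
  use Absinthe.Schema

  object :friend do
    field :name, non_null(:string)
    field :online, non_null(:boolean)
  end

  # Full snapshot, intended as the first payload on a new subscription.
  object :initial_state do
    field :friends, non_null(list_of(non_null(:friend)))
  end

  # A single change to that snapshot.
  object :delta_of_changes do
    field :friend, non_null(:friend)
  end

  union :presence_event do
    types [:initial_state, :delta_of_changes]

    # Pick the concrete type from the shape of the published value.
    resolve_type fn
      %{friends: _}, _ -> :initial_state
      %{friend: _}, _ -> :delta_of_changes
    end
  end

  query do
    field :ping, :string do
      resolve fn _, _, _ -> {:ok, "pong"} end
    end
  end

  subscription do
    field :presence, :presence_event do
      config fn _args, _info -> {:ok, topic: "presence"} end
    end
  end
end
```

Note that this only defines the types. Something still has to deliver the `initial_state` payload to the one client that has just subscribed, and the schema alone doesn't do that.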

You’re quite right that that would in fact work for the example I gave. In retrospect, I should have given something slightly more complex, so replace it with a chat channel where you need:

a) The initial state (all messages up to this point), and
b) Notification of all changes to that state (when a new chat message is sent).

In this case you have to have the client de-duplicating chat messages that might have arrived on the subscription and also be in the state, and the Absinthe architecture doesn’t really make it easy to fix that. Again, in principle it’s easy enough to do on the client, but it feels like it shouldn’t be required.
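To make the client-side work concrete, here is a small sketch of that de-duplication. It assumes every message carries a unique, increasing integer `id` assigned by the server; the module and field names are invented for the example.

```elixir
defmodule ChatLog do
  # Merge the queried history with messages that arrived on the subscription,
  # dropping duplicates by id and restoring id order.
  def merge(history, incoming) do
    (history ++ incoming)
    |> Enum.uniq_by(& &1.id)
    |> Enum.sort_by(& &1.id)
  end
end
```

For example, merging a history of messages 1 and 2 with subscription messages 2 and 3 yields messages 1, 2 and 3, each exactly once.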

That’s a neat idea (and I’ll probably use it at some point) but it doesn’t address the issue, which is that in Absinthe the subscription updates are handled asynchronously, so you can’t guarantee ordering/consistency between them and the initial state.

It would also require the un-merged “catchup” feature I linked earlier, but that, at least, is more or less a solved problem.

Thanks for your reply; you’re right, subscriptions are handled fully asynchronously, which makes this more difficult. Essentially, we’d want to execute code whenever a new subscription is created, to send some data to only that subscription (rather than ‘everyone subscribed to this topic’ as is usually the case with Absinthe’s subscriptions).
Doing this kind of thing is very simple when e.g. using Phoenix Channels directly, but I’m not sure whether it is currently possible to access these internals from outside of Absinthe’s wrapper of them.

I wonder if @benwilson512 has some ideas on how to approach this situation.

If you really want to embed state deltas in the notifications, then you should carefully design the delta format so that the deltas are idempotent.
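A small sketch of what an idempotent delta could look like: each delta carries the absolute new state plus a per-user version, rather than a relative change such as "toggle". The module name and the `user`, `online` and `version` fields are assumptions made for the example.

```elixir
defmodule IdempotentDelta do
  # friends maps user => %{online: boolean, version: integer}.
  # A delta is applied only if it is newer than what is already stored,
  # so applying it twice, or applying a stale one, changes nothing.
  def apply_delta(friends, %{user: user, online: online, version: version}) do
    case friends do
      %{^user => %{version: current}} when current >= version ->
        friends

      _ ->
        Map.put(friends, user, %{online: online, version: version})
    end
  end
end
```

With this shape, a delta that is delivered twice, or that arrives after the initial state already includes it, is simply a no-op.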

You are describing a presentational issue here imo. If you include timestamps just let the client figure out the ordering and what to do?

This sort of thing is ideally implemented within Absinthe itself so that it isn’t directly dependent on the channels impl, but it’s a bit non-trivial to sort out when to trigger the update. The challenge is that when you run a subscription, what is returned is a topic. The client then needs to subscribe to the topic, and this act of subscribing is just a call into the pubsub behavior; it isn’t something that pings any Absinthe code. That is to say, Absinthe has no way of knowing today when the end client actually subscribes.

Any proposed solution here would then basically require three things:

  1. An Absinthe.Subscription.prime/n function that takes a client topic (as distinct from the general shared topic returned from config) that you want to prime.

  2. Some way of figuring out what the root_value is. This could probably be a function returned from config on the subscription, maybe called prime:. E.g. prime: fn -> Repo.get(Shipment, args.shipment_id) end.

  3. Code in Absinthe.Phoenix to call prime.

A PR containing these ideas would be welcome.

Arguably, yes, but as I said it’s an issue I’d like to solve on the server since there’s no good reason to offload that work to the (possibly multiple different) clients when there’s no fundamental reason that the server can’t do it.

Brilliant, thanks Ben. I came to a roughly similar conclusion myself, I think, and I’m in the process of putting together some code for it. I’ll definitely look to this as guidance though.

Attempted solution: absinthe-graphql/absinthe pull request #1168 (“Subscription primeing/ordinals”) and absinthe-graphql/absinthe_phoenix pull request #93 (“Subscription prime”), both by bernardd.