Advice: Use GenStage (but consume events in order)

I have an app that receives so many web-hook requests that I’m considering some sort of buffer layer, because the database is going to become a bottleneck. Each web-hook request contains a unique user id. The catch is that the web-hook requests need to be processed in order for a given user. My concern is that if I were to set up a producer with many consumers, there would be no guarantee that events are consumed/processed in order.

As an example: 3 web-hook requests arrive:

  1. request for user A
  2. request for user B
  3. request for user A

I want to consume these events, but events #1 and #3 must be executed in the order they arrived (1, then 3).

Question: Is my use case a good one for GenStage?

If it helps, an example of a request (JSON content) would be:

{
   id: .. # user's id
   timestamp: ...,
   data: ...,
}

My initial thought was to have some sort of guarantee that a given “id” (user id) would always map to the same consumer, but that seems like maybe not what GenStage is for.

If you have suggestions other than GenStage, I’d be happy to hear them. I just wanted to start with an Elixir-based solution before venturing elsewhere (like RabbitMQ or Kafka or ???).

Partitioning maybe? It’s definitely there in Broadway, which is built on GenStage.

https://hexdocs.pm/broadway/Broadway.html#module-ordering-and-partitioning

https://hexdocs.pm/gen_stage/GenStage.PartitionDispatcher.html

The docs of GenStage are quite clear that if you need to enforce ordering, you need to build that in on your own. There are things in GenStage which can help keep things in order, but in the face of errors none of them guarantee that a later event is not processed before the error is dealt with.

Thanks.

The only clear reference to “order” I could find was:

Having multiple consumers is often the easiest and simplest way to leverage concurrency in a GenStage pipeline, especially if events can be processed out of order.

Is there something I’m missing in the documentation?

Would you recommend something other than GenStage then?

PartitionDispatcher seems like a good place to start. Thanks for the tip.

No. The partition stuff implicitly does sequential processing, as afaik it’ll force events with the same key to go to the same consumer, but that can still process things out of order if there are errors while processing.
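
To make the partitioning idea concrete, here is a minimal sketch of a producer that uses GenStage.PartitionDispatcher to route by user id. It is only an illustration under assumptions: the module names (WebhookProducer, WebhookConsumer), the enqueue/1 and partition_for/1 helpers, the choice of 4 partitions, and events being maps with an atom :id key are all made up for the example. It also says nothing about failures: if process/1 raises, the consumer crashes and ordering is no longer guaranteed.

```elixir
# Assumption: run as a script with network access so Mix can fetch gen_stage.
Mix.install([{:gen_stage, "~> 1.0"}])

defmodule WebhookProducer do
  use GenStage

  # Hypothetical number of partitions (one consumer per partition).
  @partitions 4

  def start_link(_opts), do: GenStage.start_link(__MODULE__, :ok, name: __MODULE__)

  # Hypothetical entry point: the web layer calls this for each web-hook.
  def enqueue(event), do: GenStage.cast(__MODULE__, {:enqueue, event})

  # The same user id always hashes to the same partition (0..3 here).
  def partition_for(user_id), do: :erlang.phash2(user_id, @partitions)

  @impl true
  def init(:ok) do
    # The dispatcher expects the hash function to return {event, partition}.
    hash = fn event -> {event, partition_for(event.id)} end

    {:producer, :ok,
     dispatcher: {GenStage.PartitionDispatcher, partitions: @partitions, hash: hash}}
  end

  # Events emitted without pending demand land in GenStage's internal buffer.
  @impl true
  def handle_cast({:enqueue, event}, state), do: {:noreply, [event], state}

  @impl true
  def handle_demand(_demand, state), do: {:noreply, [], state}
end

defmodule WebhookConsumer do
  use GenStage

  def start_link(partition), do: GenStage.start_link(__MODULE__, partition)

  @impl true
  def init(partition) do
    # Each consumer subscribes to exactly one partition of the producer.
    {:consumer, partition, subscribe_to: [{WebhookProducer, partition: partition}]}
  end

  @impl true
  def handle_events(events, _from, partition) do
    # Events of one partition arrive in dispatch order and are handled one by one.
    Enum.each(events, &process(&1, partition))
    {:noreply, [], partition}
  end

  # Placeholder for the real work (e.g. the database write).
  defp process(event, partition), do: IO.inspect(event, label: "partition #{partition}")
end
```

Starting the producer and one WebhookConsumer per partition (0 through 3) and then calling WebhookProducer.enqueue(%{id: "A", data: 1}) would send every event for user "A" to the same consumer process.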

I don’t think that’s a very useful question. GenStage is fine at what it does, and you can build additional constraints around it to prevent things from being processed out of order. You likely want to decouple maintaining ordering from the backpressure question (which GenStage handles well) anyway. In particular, what should happen when a given input fails is something you’d need to answer before looking for solutions. Should it block further processing? Should it retry? Should it drop the bad input and continue with the next?
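
Those three options can be compared with a small sketch that is independent of GenStage. Everything here is hypothetical: the OrderedQueue module, the policy names :block, :retry and :drop, and the convention that the handler returns :ok or {:error, reason}. It processes one user's events in arrival order and returns what was processed and what is still waiting.

```elixir
defmodule OrderedQueue do
  @moduledoc "Hypothetical helper: handles one user's events in arrival order."

  # Returns {processed, remaining}. `policy` is :block, :retry or :drop.
  def run(events, handler, policy, max_retries \\ 3)

  def run([], _handler, _policy, _max), do: {[], []}

  def run([event | rest] = queue, handler, policy, max) do
    case attempt(event, handler, policy, max) do
      :ok ->
        {done, left} = run(rest, handler, policy, max)
        {[event | done], left}

      :error when policy == :drop ->
        # Skip the bad input and continue with the next event.
        run(rest, handler, policy, max)

      :error ->
        # :block, or :retry with all attempts used up: stop here so that
        # later events for this user cannot overtake the failed one.
        {[], queue}
    end
  end

  # :retry calls the handler up to `max` times before giving up.
  defp attempt(event, handler, :retry, max) do
    Enum.reduce_while(1..max//1, :error, fn _try, _acc ->
      case handler.(event) do
        :ok -> {:halt, :ok}
        {:error, _reason} -> {:cont, :error}
      end
    end)
  end

  # :block and :drop call the handler exactly once.
  defp attempt(event, handler, _policy, _max) do
    case handler.(event) do
      :ok -> :ok
      {:error, _reason} -> :error
    end
  end
end
```

With a handler that fails only for :bad, running [:a, :bad, :b] under :block returns {[:a], [:bad, :b]}, while :drop returns {[:a, :b], []}. Which of these is acceptable depends on the application, which is why the question has to be answered first.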
