Guarantees provided by GenStage?

stavro · April 20, 2017, 9:21pm

In general, the GenStage docs seem to say very little about how to deal with errors, what to expect in the presence of errors, at-most-once vs. at-least-once delivery of events, etc. I’m having a hard time understanding the guarantees that GenStage provides when a stage fails.

For example, a textbook implementation of GenStage.BroadcastDispatcher (seen here: https://gist.github.com/stavro/15308d54b41c6feab371ee09de198b55) seems to have pretty lackluster behavior in the face of errors during a consumer stage. In a simple test, if I enqueue 10 events, and a single consumer has an error on the first one, the next 9 appear to never even be attempted.

Since the consumer is supervised independently, and only received a single event to process (as seen in the debug logs), why are the other nine events not handled by the consumer after its supervisor restarts it?

josevalim · April 20, 2017, 9:57pm

That’s because it does not give any guarantees. If there is an error, you lose everything that was in the message queue and that you didn’t persist elsewhere yourself, as with any other Elixir process or abstraction, such as a GenServer.
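
To make the GenServer comparison concrete, here is a minimal sketch using only the standard library, not GenStage itself. The module name `MailboxLossDemo` and the `{:event, n}` message shape are made up for illustration. Ten events are cast to a process that raises on the first one; the remaining nine sit in its mailbox and disappear when it dies.

```elixir
defmodule MailboxLossDemo do
  use GenServer

  @impl true
  def init(reply_to), do: {:ok, reply_to}

  # The first event crashes the process; its mailbox dies with it.
  @impl true
  def handle_cast({:event, 1}, _reply_to), do: raise("boom on the first event")

  def handle_cast({:event, n}, reply_to) do
    send(reply_to, {:handled, n})
    {:noreply, reply_to}
  end

  # Casts events 1..10 and returns the list of events that were handled.
  def run do
    # Started without a link so the crash does not take the caller down.
    {:ok, pid} = GenServer.start(__MODULE__, self())
    ref = Process.monitor(pid)
    Enum.each(1..10, &GenServer.cast(pid, {:event, &1}))

    # Wait until the worker has actually terminated.
    receive do
      {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
    end

    collect_handled([])
  end

  defp collect_handled(acc) do
    receive do
      {:handled, n} -> collect_handled([n | acc])
    after
      100 -> Enum.reverse(acc)
    end
  end
end

MailboxLossDemo.run()
# => []
```

The crash is logged as a GenServer termination error. A supervisor restarting the worker would start a fresh process with an empty mailbox, so events 2 to 10 would still be gone unless something outside the process kept a copy.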

stavro · April 20, 2017, 10:29pm

Got it - I misunderstood and thought that the message queue was kept outside of the consumer process.

Hmm. Ok. Thanks!

mbuhot · April 21, 2017, 10:19am

Are there any recommendations for an at-least once delivery of messages?

Eg if you had a pipeline with:

RabbitMQ -> GenStage -> GenStage -> GenStage -> Database

And you want to ensure that every message from RabbitMQ is processed by the GenStage pipeline and written to the database.

Seems like you would need to keep track of the original message ID through the pipeline and only send the ACK to RabbitMQ after writing to the database.

pma · April 21, 2017, 10:39am

The solution I’m using is to carry the delivery_tag as part of the event and acknowledge to RabbitMQ only after the last stage: https://github.com/pma/wabbit/blob/master/examples/direct_exchange.ex#L98
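
As a rough illustration of that idea (not the code in the linked example), the sketch below models the pipeline with plain functions. `AckAfterWrite`, `process/4`, and the `write` and `ack` callbacks are hypothetical names; in a real pipeline this logic would live in the final consumer's `handle_events/3` callback and `ack` would call the RabbitMQ client.

```elixir
defmodule AckAfterWrite do
  @doc """
  Runs each event's payload through `stages`, hands the result to `write`,
  and calls `ack` with the delivery tag only when the write returned `:ok`.
  Returns the list of acknowledged delivery tags.
  """
  def process(events, stages, write, ack) do
    Enum.flat_map(events, fn %{delivery_tag: tag, payload: payload} ->
      # The tag travels next to the payload; only the payload is transformed.
      result = Enum.reduce(stages, payload, fn stage, acc -> stage.(acc) end)

      case write.(result) do
        :ok ->
          ack.(tag)
          [tag]

        {:error, _reason} ->
          # No ack: the message stays unacknowledged on the broker side.
          []
      end
    end)
  end
end

events = [
  %{delivery_tag: 1, payload: "a"},
  %{delivery_tag: 2, payload: "bad"},
  %{delivery_tag: 3, payload: "c"}
]

stages = [&String.upcase/1]

# Stand-in for the database write; rejects one payload to show the failure path.
write = fn
  "BAD" -> {:error, :rejected}
  _other -> :ok
end

# Stand-in for the broker acknowledgement.
ack = fn tag -> send(self(), {:acked, tag}) end

AckAfterWrite.process(events, stages, write, ack)
# => [1, 3]
```

Because the tag for event 2 is never acknowledged, the broker still considers that message outstanding and can redeliver it, which is what gives at-least-once behavior. The cost is that downstream writes need to tolerate duplicates.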

madshargreave · August 1, 2017, 8:24am

Did you come up with a solution for this? I am in the same situation, although with Kafka.

mbuhot · August 1, 2017, 12:35pm

Haven’t had to connect a multi-step GenStage pipeline to Kafka or Rabbit yet. But I would approach it the same way, updating the consumer offsets in the last stage of the pipeline.
Which Kafka client are you using?

madshargreave · December 21, 2017, 12:44pm

@mbuhot Sorry for the late reply. I am using the Brod Erlang library.