GenStage - producer discarding events

I have a GenStage-based application where the producer queries Solr and passes the documents on to the consumer stages. I have just started seeing warnings such as: `warn GenStage producer #PID<0.433.0> has discarded 50000 events from buffer`

What does this mean exactly? Does it mean that the producer cannot keep up with demand, so it is choosing to ignore some requests? Or could it actually be dropping data that should be processed by the consumer stages?

The first case is not really a problem, since the same data will be passed down the pipeline anyway, but if there is any chance of data loss, that would be quite worrying.

This is documented in the section on the buffer: https://hexdocs.pm/gen_stage/Experimental.GenStage.html#module-buffering

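Either GenStage buffers everything, in which case you get unbounded memory growth, or it buffers up to a preconfigured limit (the producer's `:buffer_size` option, 10_000 events by default) and then starts load shedding, i.e. discarding events that no consumer has asked for yet. Discarded events are never delivered to the consumers, so this is genuine data loss rather than ignored requests.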
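
To make the load-shedding behaviour concrete, here is a small Python model of a producer-side buffer capped at a fixed size. It is not GenStage code: `BoundedBuffer`, `capacity` and `keep` are hypothetical names that loosely mirror GenStage's `:buffer_size` and `:buffer_keep` producer options, where keeping `:last` (the default) means the oldest buffered events are the ones thrown away. With a 10,000-event buffer, pushing 60,000 unrequested events produces a discard count like the one in the warning.

```python
from collections import deque


class BoundedBuffer:
    """Simplified model of a producer buffer that sheds load when full.

    Hypothetical names: `capacity` plays the role of GenStage's :buffer_size,
    and `keep` ("last" or "first") the role of :buffer_keep.
    """

    def __init__(self, capacity, keep="last"):
        if keep not in ("first", "last"):
            raise ValueError("keep must be 'first' or 'last'")
        self.capacity = capacity
        self.keep = keep
        self.items = deque()
        self.discarded = 0

    def push(self, events):
        """Store events that no consumer has asked for yet."""
        for event in events:
            if len(self.items) < self.capacity:
                self.items.append(event)
            elif self.keep == "last":
                # Drop the oldest buffered event to make room for the new one.
                self.items.popleft()
                self.items.append(event)
                self.discarded += 1
            else:
                # keep == "first": the incoming event itself is dropped.
                self.discarded += 1

    def take(self, demand):
        """Hand out up to `demand` buffered events to a consumer."""
        count = min(demand, len(self.items))
        return [self.items.popleft() for _ in range(count)]


buf = BoundedBuffer(capacity=10_000)
buf.push(range(60_000))  # 60k events arrive before anyone has asked for them
print(buf.discarded)     # 50000
print(buf.take(3))       # [50000, 50001, 50002]: events 0..49999 are gone
```

Whichever end is kept, the overflow is lost; the setting only decides which events survive.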

GenStage is built around demand-driven data processing. In your case, it sounds like you want some kind of dedicated queue. Then you could have one or more GenStage producers that pull from the queue when they receive demand from the workers.
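
A rough Python sketch of that idea, using the standard library's `queue.Queue` as a stand-in for a dedicated (ideally durable) queue; `QueueProducer` is a hypothetical name, not a GenStage API. Work waits in the queue, and the producer removes only as many items as were demanded, so nothing piles up in the producer and nothing has to be discarded.

```python
import queue


class QueueProducer:
    """Hypothetical demand-driven producer that pulls from a backing queue.

    Work that has not been demanded yet stays in the queue instead of
    piling up in the producer, so there is nothing to load-shed.
    """

    def __init__(self, source):
        self.source = source

    def handle_demand(self, demand):
        # Pull at most `demand` items; stop early if the queue runs dry.
        events = []
        while len(events) < demand:
            try:
                events.append(self.source.get_nowait())
            except queue.Empty:
                break
        return events


jobs = queue.Queue()
for doc_id in range(100):
    jobs.put(doc_id)

producer = QueueProducer(jobs)
print(producer.handle_demand(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(jobs.qsize())                # 90 items still waiting, none dropped
```

When the queue is empty, a real producer would also need to remember the unmet demand and emit later, once new work arrives.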

To add to Ben’s response, it seems your producer is querying Solr directly without checking whether there is a consumer ready to process the events. Make sure you only emit events in response to demand received in handle_demand, and no more than was demanded. If you can only get results from Solr in large batches, you can increase the buffer size, but be careful not to cause unbounded memory growth by queueing Solr results forever.
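
One way to follow this advice when the source only returns large batches, sketched in Python rather than GenStage: keep a running count of unmet demand, fetch only while the local backlog cannot cover it, and hand out no more than was asked for. `BatchingProducer`, `batch_size` and `fake_solr` are hypothetical; `fake_solr` stands in for a paged Solr query. Because nothing is fetched until the backlog falls below the outstanding demand, the leftover backlog after each call stays smaller than one batch.

```python
class BatchingProducer:
    """Hypothetical producer for a source that only returns fixed-size batches.

    It fetches only while there is unmet demand, so the local backlog of
    fetched-but-unsent events stays smaller than one batch.
    """

    def __init__(self, fetch_batch, batch_size):
        self.fetch_batch = fetch_batch  # callable(offset, size) -> list
        self.batch_size = batch_size
        self.offset = 0
        self.backlog = []
        self.pending = 0  # demand received but not yet satisfied
        self.exhausted = False

    def handle_demand(self, demand):
        self.pending += demand
        # Fetch more only while the backlog cannot cover outstanding demand.
        while len(self.backlog) < self.pending and not self.exhausted:
            batch = self.fetch_batch(self.offset, self.batch_size)
            self.offset += len(batch)
            self.backlog.extend(batch)
            if len(batch) < self.batch_size:
                self.exhausted = True
        # Emit no more than the outstanding demand.
        count = min(self.pending, len(self.backlog))
        events, self.backlog = self.backlog[:count], self.backlog[count:]
        self.pending -= count
        return events


def fake_solr(offset, size, total=250):
    """Stand-in for a paged search: returns ids offset..offset+size-1."""
    return list(range(offset, min(offset + size, total)))


producer = BatchingProducer(fake_solr, batch_size=100)
print(len(producer.handle_demand(10)))  # 10
print(len(producer.backlog))            # 90 left over, less than one batch
```

Unmet demand is kept in `pending` so it can be served later, which is the same bookkeeping a GenStage producer typically keeps in its state.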

Hi, thanks for the help and the documentation reference. It turns out my mistake was hard-coding the rows parameter rather than letting it be set by the consumer demand. I guess this meant the producer was returning more documents than the consumer had requested, so the excess was buffered and eventually dropped.
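
For illustration, a hypothetical Python helper (not the poster's actual code) that sizes Solr's `rows` parameter from the current demand instead of a constant, and pages with `start`. `solr_query_params` is an invented name; `q`, `start`, `rows` and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def solr_query_params(query, start, demand):
    """Build Solr paging parameters sized to the consumer's demand.

    With a fixed `rows`, each query can return far more documents than
    were requested; here `rows` follows `demand` instead.
    """
    return {"q": query, "start": start, "rows": demand, "wt": "json"}


start = 0
for demand in (500, 500, 200):
    params = solr_query_params("*:*", start, demand)
    print(urlencode(params))
    # Assume Solr returned a full page; a real producer would advance
    # `start` by the number of documents actually received.
    start += params["rows"]
```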