Question:
Multiple producers with a single consumer: is there a way to ensure that the demand is spread across all producers?
To be more detailed, we have the following pipeline of stages:
[A] - [B] - [C]
Stage A … a variable number of instances, processing incoming data
Stage B … a single instance, think of a collector, shall distribute demand across producers, caching demands
Stage C … a variable number of instances which do the actual processing
I would like to implement stage [A] as a producer, but didn’t find any hint in the GenStage source that a consumer may spread its demand across multiple producers.
or
Another way would be to implement stage [A] simply as GenServer instances and manage spreading the demand across them in stage [B], which would then be just a “producer”.
You can’t control the demand from the producer, but you can control how often you ask from the consumers (B and C) by implementing the handle_subscribe/handle_cancel callbacks and setting the demand mode to :manual.
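To make this concrete, here is a minimal sketch of a consumer in manual demand mode. The module name `RoundRobinConsumer` and the initial demand of 10 are my own choices, not anything from the thread; it assumes the GenStage library is a dependency:

```elixir
defmodule RoundRobinConsumer do
  # Hypothetical stage B: a consumer that tracks its producer
  # subscriptions and asks each producer explicitly, instead of
  # letting GenStage send demand automatically.
  use GenStage

  def start_link(opts \\ []) do
    GenStage.start_link(__MODULE__, :ok, opts)
  end

  def init(:ok) do
    {:consumer, %{producers: []}}
  end

  # Called on the consumer side for every new subscription.
  # Returning {:manual, state} disables automatic demand, so we
  # decide when (and from whom) to ask.
  def handle_subscribe(:producer, _opts, from, state) do
    GenStage.ask(from, 10)
    {:manual, %{state | producers: [from | state.producers]}}
  end

  def handle_cancel(_reason, from, state) do
    {:noreply, [], %{state | producers: List.delete(state.producers, from)}}
  end

  def handle_events(events, from, state) do
    # After processing a batch, ask the same producer again; a
    # round-robin variant could rotate state.producers here instead.
    GenStage.ask(from, length(events))
    {:noreply, [], state}
  end
end
```

Because demand is only ever issued from `handle_subscribe` and `handle_events`, the consumer fully controls how much it requests from each producer and when.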
@josevalim thx for the fast (very fast ;-)) reply. It wasn’t very clear to me at first, but now it is: I had overlooked that the handle_subscribe callback is called on both sides, consumer and producer.
But in the case where stage B is set to :manual, I see no way for stage B to determine whether there is demand from stage C. When stage C is busy and there is no demand, stage B should stop asking stage A (there could be more than one, which should be asked round-robin) for events, to prevent buffer overflow. When there is demand from stage C, stage B should ask for events …
I’m wondering if it would be better to let the pipeline start at stage B (B would be a :producer), implement the original stage A as simple GenServer(s) that register with stage B (the new start of the pipeline), and have stage B request events from them in the handle_demand callback.
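That alternative could be sketched roughly as follows. Everything here is an assumption for illustration: the module name `CollectorProducer`, the `register/2` API, and the idea that each stage-A GenServer answers a synchronous `{:take, n}` call are all hypothetical, and a real version would also need to store unfulfilled demand:

```elixir
defmodule CollectorProducer do
  # Hypothetical stage B as the head of the pipeline: plain
  # GenServer-based stage-A instances register here, and demand
  # from stage C is forwarded to them round-robin in handle_demand.
  use GenStage

  def start_link(opts \\ []) do
    GenStage.start_link(__MODULE__, :ok, opts)
  end

  # Called by each stage-A GenServer after it starts (hypothetical API).
  def register(collector, source_pid) do
    GenStage.cast(collector, {:register, source_pid})
  end

  def init(:ok) do
    {:producer, %{sources: []}}
  end

  def handle_cast({:register, pid}, state) do
    {:noreply, [], %{state | sources: state.sources ++ [pid]}}
  end

  def handle_demand(demand, %{sources: []} = state) when demand > 0 do
    # No sources registered yet. A fuller implementation would store
    # the pending demand and satisfy it once a source registers.
    {:noreply, [], state}
  end

  def handle_demand(demand, %{sources: [next | rest]} = state) when demand > 0 do
    # Pull up to `demand` events from the next source (assumed to
    # implement a synchronous :take call), then rotate the list so
    # the next handle_demand asks a different source.
    events = GenServer.call(next, {:take, demand})
    {:noreply, events, %{state | sources: rest ++ [next]}}
  end
end
```

The appeal of this shape is that demand flows only in one direction: stage C’s demand reaches handle_demand, and stage B decides which source to drain, so no source is ever asked for more than downstream can absorb.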
I have a project where a producer connects to an external websocket API and that API streams more data than a single producer can handle.
I’m manually creating separate producers for different endpoints of the API, but it’s a bit cumbersome, and I’m looking for a cleaner way to distribute the incoming load, ideally in such a way that I don’t need to manually measure how busy each endpoint is.
That one appears to be about ordering events produced by different producers.
My question is about making sure the producer can handle the volume of data streaming in from the external API (ideally by running concurrent copies of the same producer, if that is feasible).