Help with tuning data flow in Broadway: c:prepare_messages/2 question

Hey all! Hoping I can get some help/advice on tuning Broadway pipelines. Right now I'm struggling to accumulate enough messages in prepare_messages/2. I'm using a RabbitMQ producer. It seems like no matter what I do, even with min_demand: 5 and max_demand: 10, prepare_messages/2 never sees more than 1 message at a time. That causes way too many database queries, when I'd like to be grabbing 10 rows at a time.
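For context, here's roughly what my pipeline looks like (module name, queue name, and concurrency values are placeholders, not my real config):

```elixir
defmodule MyPipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        module: {BroadwayRabbitMQ.Producer, queue: "my_queue"},
        concurrency: 1
      ],
      processors: [
        default: [concurrency: 2, min_demand: 5, max_demand: 10]
      ]
    )
  end

  @impl true
  def prepare_messages(messages, _context) do
    # I expected up to 10 messages here, but length(messages) is always 1,
    # so the per-batch database lookup degenerates into one query per message.
    IO.inspect(length(messages), label: "prepare_messages batch size")
    messages
  end

  @impl true
  def handle_message(_processor, message, _context) do
    message
  end
end
```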

I guess what it comes down to is: I’m not sure what levers I can pull or even where to begin tuning this. I mistakenly thought that prepare_messages would respect min_demand and max_demand, but I think I’m misreading the docs here:

The length of the list of messages received by this callback is based on the min_demand/max_demand configuration in the processor.

I don’t think the docs are particularly helpful here. Yes, min_demand/max_demand configure how many events processors request from producers. That doesn’t mean, however, that the producer will emit its events/messages in batches related to those settings. By my understanding, the lists prepare_messages/2 receives map to the lists emitted by individual GenStage callback invocations. If those callbacks only ever emit single-element lists, then prepare_messages/2 will only ever receive single-element lists.
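To make that concrete, here's a toy push-style producer (a sketch of the pattern, not broadway_rabbitmq's actual code): each incoming delivery is turned into its own one-element event list, so downstream callbacks only ever see lists of length 1, regardless of the demand settings.

```elixir
defmodule SingleEventProducer do
  use GenStage

  def start_link(opts), do: GenStage.start_link(__MODULE__, opts)

  def init(_opts), do: {:producer, :no_state}

  # Demand is noted but nothing is emitted here; events are pushed
  # as deliveries arrive instead.
  def handle_demand(_demand, state), do: {:noreply, [], state}

  # One delivery in, one event out: downstream callbacks (and therefore
  # prepare_messages/2) receive a single-element list per emission.
  def handle_info({:basic_deliver, payload, _meta}, state) do
    {:noreply, [payload], state}
  end
end
```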

If you want to batch those up you’d likely need a batcher (e.g. a GenStage producer_consumer) between your current producer and Broadway, which aggregates individual events into batches, as in the sketch below. Or see if your producer can be configured to emit events in batches.
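A conceptual sketch of that aggregation idea (batch size, flushing strategy, and how you'd wire it into Broadway's topology are all left open; a real version would also need a timer to flush partial batches):

```elixir
defmodule BatchingStage do
  use GenStage

  def start_link(opts), do: GenStage.start_link(__MODULE__, opts)

  def init(opts) do
    state = %{batch_size: Keyword.get(opts, :batch_size, 10), buffer: []}
    {:producer_consumer, state, subscribe_to: Keyword.fetch!(opts, :subscribe_to)}
  end

  def handle_events(events, _from, state) do
    buffered = state.buffer ++ events

    if length(buffered) >= state.batch_size do
      # Emit everything accumulated so far in a single list, so the next
      # stage's callback is invoked with a multi-element list.
      {:noreply, buffered, %{state | buffer: []}}
    else
      # Not enough yet: emit nothing and keep buffering.
      # (Missing here: a timeout to flush partial batches.)
      {:noreply, [], %{state | buffer: buffered}}
    end
  end
end
```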

Edit:

There it is for RabbitMQ:


Ahhhh thank you! I don’t feel so much like I’m going crazy now. In my mental model it’s surprising that prepare_messages/2 isn’t aggregating, given that it receives a list. But hey, there it is.

It makes sense due to how GenStage works, but it is indeed strange given how this is marketed to Broadway users.