Help with tuning data flow in Broadway: c:prepare_messages/2 question

Hey all! Hoping I can get some help/advice on tuning Broadway pipelines. Right now I'm struggling to accumulate enough messages in prepare_messages/2. I'm using a RabbitMQ producer. It seems like no matter what I do, even with min_demand: 5 and max_demand: 10, prepare_messages/2 never sees more than 1 message at a time. That causes way too many database queries, when I'd like to be grabbing 10 rows at a time.
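For context, here's roughly what my pipeline looks like (module name, queue name, and concurrency values are placeholders, not my real config):

```elixir
defmodule MyPipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [
        module: {BroadwayRabbitMQ.Producer, queue: "my_queue"},
        concurrency: 1
      ],
      processors: [
        default: [concurrency: 2, min_demand: 5, max_demand: 10]
      ]
    )
  end

  @impl true
  def prepare_messages(messages, _context) do
    # I expected up to 10 messages here, but length(messages) is always 1,
    # so the per-batch database lookup degenerates into one query per message.
    IO.inspect(length(messages), label: "prepare_messages batch size")
    messages
  end

  @impl true
  def handle_message(_processor, message, _context) do
    message
  end
end
```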

I guess what it comes down to is: I’m not sure what levers I can pull or even where to begin tuning this. I mistakenly thought that prepare_messages would respect min_demand and max_demand, but I think I’m misreading the docs here:

The length of the list of messages received by this callback is based on the min_demand/max_demand configuration in the processor.

I don’t think the docs are particularly helpful here. Yes, min_demand/max_demand configure how many events processors request from producers. That doesn’t mean, however, that the producer will emit its events/messages in batches related to those settings. By my understanding, the lists prepare_messages/2 receives map to the lists emitted by individual GenStage callback invocations. If those callbacks only ever emit single-element lists, then prepare_messages/2 will only ever receive single-element lists.
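To make that concrete, here's a toy push-style producer (a sketch of the pattern, not broadway_rabbitmq's actual code): each incoming delivery is turned into its own one-element event list, so downstream callbacks only ever see lists of length 1, regardless of the demand settings.

```elixir
defmodule SingleEventProducer do
  use GenStage

  def start_link(opts), do: GenStage.start_link(__MODULE__, opts)

  def init(_opts), do: {:producer, :no_state}

  # Demand is noted but nothing is emitted here; events are pushed
  # as deliveries arrive instead.
  def handle_demand(_demand, state), do: {:noreply, [], state}

  # One delivery in, one event out: downstream callbacks (and therefore
  # prepare_messages/2) receive a single-element list per emission.
  def handle_info({:basic_deliver, payload, _meta}, state) do
    {:noreply, [payload], state}
  end
end
```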

If you want to batch those up you’d likely need a batcher (e.g. a GenStage producer_consumer) between your current producer and Broadway, which aggregates individual events into batches, as in the sketch below. Or see if your producer can be configured to emit events in batches.
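A conceptual sketch of that aggregation idea (batch size, flushing strategy, and how you'd wire it into Broadway's topology are all left open; a real version would also need a timer to flush partial batches):

```elixir
defmodule BatchingStage do
  use GenStage

  def start_link(opts), do: GenStage.start_link(__MODULE__, opts)

  def init(opts) do
    state = %{batch_size: Keyword.get(opts, :batch_size, 10), buffer: []}
    {:producer_consumer, state, subscribe_to: Keyword.fetch!(opts, :subscribe_to)}
  end

  def handle_events(events, _from, state) do
    buffered = state.buffer ++ events

    if length(buffered) >= state.batch_size do
      # Emit everything accumulated so far in a single list, so the next
      # stage's callback is invoked with a multi-element list.
      {:noreply, buffered, %{state | buffer: []}}
    else
      # Not enough yet: emit nothing and keep buffering.
      # (Missing here: a timeout to flush partial batches.)
      {:noreply, [], %{state | buffer: buffered}}
    end
  end
end
```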

Edit:

There it is for RabbitMQ:


Ahhhh thank you! I don’t feel so much like I’m going crazy now. In my mental model it’s surprising that prepare_messages/2 isn’t aggregating, given that it receives a list. But hey, there it is.

It makes sense due to how GenStage works, but it is indeed strange given how this is marketed to Broadway users.