Broadway Pipeline Design

smanza · December 1, 2021, 1:14pm

Hello.

I’m designing an application where multiple nodes have to communicate, but I’m not using Erlang distribution, only raw TCP messaging. I want to apply backpressure to incoming messages so the system is not overwhelmed, and I want to be able to scale across multiple processors.

I ended up with something like this using Broadway:

A listener that spawns a process for each connection
Each connection spawns a Broadway pipeline, with a producer that queues the TCP messages and a given number of processors that handle the requests and send data back

This is working fine, but I’m wondering whether it is good design to have n Broadway pipelines, or whether it would be better to have a single pipeline for the entire app (listener) and give it more processors to scale.

Thank you

cmo · December 1, 2021, 9:55pm

Depends on the scale. If you’re expecting a million connections, then you’re going to have a million times your per-pipeline processor concurrency in running processors. As far as I understand it, you define the level of concurrency in your pipeline that is ideal for your system. If the number of pipelines changes with the number of connections, then you’re applying backpressure to each connection individually but not to the system as a whole. Can you pass an id or the pid of the connection process as part of the message and respond in ack or something?

smanza · December 2, 2021, 7:21am

I think you are right about the global concurrency limit and control for the upstream messages. For now I have 5 processors per connection, but if the system gets 1_000 connections, this will end up with 5_000 processors instead of maybe 100 global processors.
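
To put numbers on that comparison, here is the arithmetic as a tiny script. The variable names are made up for illustration, and 100 is just the guess from above, not a measured value.

```elixir
connections = 1_000
processors_per_pipeline = 5

# One pipeline per connection: the processor count grows with the connections.
per_connection_total = connections * processors_per_pipeline

# One shared pipeline: the processor count is fixed by configuration.
shared_total = 100

IO.puts("per-connection pipelines: #{per_connection_total} processors")
IO.puts("single shared pipeline: #{shared_total} processors")
```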

In fact, I think I also did it this way so that each pipeline would have its transport and socket in its context, but I guess I can pass these in the message from the producer (which is a forwarder of incoming upstream messages).
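
A rough sketch of what that single shared pipeline could look like, not a tested design. Assumptions: Broadway 1.0 or later (for `Broadway.NoopAcknowledger.init/0`), a transport module that exposes `send/2` the way `:gen_tcp` does, and the hypothetical names `MyApp.TcpProducer`, `MyApp.Pipeline`, `push/3` and `handle_request/1`, which exist only for this example. The producer's queue is unbounded here, so the connection processes would still need to limit how fast they read from their sockets to get backpressure all the way to the peer.

```elixir
# Script-style setup so the sketch runs on its own (assumes Elixir 1.12+).
Mix.install([{:broadway, "~> 1.0"}])

defmodule MyApp.TcpProducer do
  # Hypothetical producer: buffers pushed messages until processors ask for them.
  use GenStage

  alias Broadway.Message

  @impl true
  def init(_opts), do: {:producer, %{queue: :queue.new(), demand: 0}}

  # A connection process sends {:push, transport, socket, payload} for each frame.
  @impl true
  def handle_info({:push, transport, socket, payload}, state) do
    message = %Message{
      data: payload,
      # The socket travels with the message instead of living in a per-pipeline context.
      metadata: %{transport: transport, socket: socket},
      acknowledger: Broadway.NoopAcknowledger.init()
    }

    dispatch(%{state | queue: :queue.in(message, state.queue)})
  end

  @impl true
  def handle_demand(incoming, state) do
    dispatch(%{state | demand: state.demand + incoming})
  end

  # Emit as many queued messages as the current demand allows.
  defp dispatch(%{queue: queue, demand: demand} = state) do
    count = min(demand, :queue.len(queue))
    {ready, rest} = :queue.split(count, queue)
    {:noreply, :queue.to_list(ready), %{state | queue: rest, demand: demand - count}}
  end
end

defmodule MyApp.Pipeline do
  use Broadway

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      producer: [module: {MyApp.TcpProducer, []}, concurrency: 1],
      # One global limit for the whole node, whatever the number of connections.
      processors: [default: [concurrency: 100]]
    )
  end

  # Called from a connection process for each decoded TCP message.
  def push(transport, socket, payload) do
    [producer] = Broadway.producer_names(__MODULE__)
    send(producer, {:push, transport, socket, payload})
    :ok
  end

  @impl true
  def handle_message(_processor, message, _context) do
    %{transport: transport, socket: socket} = message.metadata
    transport.send(socket, handle_request(message.data))
    message
  end

  # Placeholder for the real request handling: echoes the payload back.
  defp handle_request(payload), do: payload
end
```

Each connection process would then call something like `MyApp.Pipeline.push(:gen_tcp, socket, data)` whenever it receives a frame, and the number of processors stays at the configured value no matter how many connections are open.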