Context
I’m currently working on a real-time flow monitoring project but I have several questions about my architecture and the use of GenStage
.
Architecture
Currently, I need 4 “processing steps”. My simplified workflow looks like this:
--> [CX] -+-> [DX]
[A] -> [B] -+-> [CY] -+-> [DY]
--> [CZ] -+-> [DZ]
Here, A
, B
, C
and D
represent a step in the processing of my data.
X
, Y
, and Z
represent a type of processing step.
In order to clarify what I call “type” for C
and D
, these two processing steps are divided into sub-steps that can be processed independently of each other.
In my current architecture A
, B
, CX..Y
and DX..Y
are GenStage
. B
and CX..Y
are each used in BroadcastDispatcher
in order to send the events to all their consumers.
To summarize: A
gets the data and sends it to B
, B
sends it to each type of C
and each type of C
sends the data to each type of D
.
Problem
I am currently asking myself several questions and I need an outside opinion because I am stuck on certain points.
How to replicate C and D?
Indeed, I’m starting to realize that the processing time of some types can sometimes reach several minutes which is blocking my flow of events during this time.
My main problem here is that B
and C
broadcast their events to ALL subscribers. Let’s imagine that I have 2 types of processing for C
that we’ll call X
, Y
.
So that gives us the following scheme:
[A] -> [B] -+-> [CX] -+-> [D]
--> [CY] -+
Now, let’s say I duplicate processing type X
because I’ve noticed that sometimes this type takes too long to perform. So I’m going to end up with this:
--> [CX1] -+
[A] -> [B] -+-> [CX2] -+-> [D]
--> [CY] -+
Here the problem is that my type X
will receive the data twice and I find myself with the same problem as at the beginning.
What I’m looking for is something where for each type, a consumer who is free will process the data (here, CX1
is busy):
[CX1] -+
[A] -> [B] -+-> [CX2] -+-> [D]
--> [CY] -+
I think I probably have the wrong way to use GenStage
so I’d like to get your opinion.
Thanks in advance.