I’d like to ask for some advice about how to organize my data pipeline with Broadway.
There is a stream of user-generated events (user actions in an application). I need to send notification emails about those actions to subscribed users.
First, I need to group events into batches based on some event data: merging similar events, canceling out opposite events, and so on. This step should reduce the number of events and make them more meaningful. Then I need to group the merged events again to send them out in batches (one email may contain several notifications).
This leads to the idea of using Broadway, which seems like a very good fit at first glance. The catch is that I would need to batch twice: once to merge similar events, and again to group notifications before sending. The batching criteria (read: batch keys) will be different for the two stages.
What would you suggest? As far as I understand, one instance of Broadway can only batch once.
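To make the single-batching point concrete, here is roughly the pipeline shape I have now (module names, batch sizes, and the merge key are just illustrative):

```elixir
defmodule EventPipeline do
  use Broadway

  alias Broadway.Message

  def start_link(_opts) do
    Broadway.start_link(__MODULE__,
      name: __MODULE__,
      # MyEventProducer stands in for the real event source
      producer: [module: {MyEventProducer, []}],
      processors: [default: [concurrency: 10]],
      batchers: [
        # the one and only batching stage
        merge: [batch_size: 100, batch_timeout: 5_000]
      ]
    )
  end

  @impl true
  def handle_message(_processor, message, _context) do
    message
    |> Message.put_batcher(:merge)
    # group by whatever makes events "similar", e.g. {user_id, action}
    |> Message.put_batch_key(merge_key(message.data))
  end

  @impl true
  def handle_batch(:merge, messages, _batch_info, _context) do
    # merge/cancel similar events here; but this is the terminal stage,
    # so there is no hook for a second round of batching afterwards
    messages
  end

  defp merge_key(event), do: {event.user_id, event.action}
end
```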
Is it feasible to set up a second Broadway pipeline to do the second round of batching? There would be an issue with “acking”, though: the first pipeline would ack messages back to the initial producer without waiting for the second pipeline to finish.
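Roughly what I have in mind for wiring the two pipelines together, and where I think the acking problem sits (`MergedEventProducer` is hypothetical, e.g. a GenStage producer with an internal queue that the second pipeline pulls from):

```elixir
# In the first pipeline, after merging:
@impl true
def handle_batch(:merge, messages, _batch_info, _context) do
  merged = merge_events(messages)

  # Hand the merged events to the second pipeline via its producer.
  MergedEventProducer.enqueue(merged)

  # Returning here acks the original messages to the first producer,
  # regardless of whether the second pipeline ever delivers the emails.
  messages
end
```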
I’d appreciate any thoughts on the subject.