I’m using Broadway to consume Change Data Capture events from Kafka.
As such we need to process the events in the order they arrive. Each entity (let’s say “posts” and “comments”) has Change Data Capture events published to a different Kafka topic. The message key is set to the UUID of the entity.
Because broadway alway ack’s a Kafka message even if it is marked as failed or an exception occurs we have wrapped the handling function in a infinite retry, so it captures any error and recursively calls the handler again with the same message. If it’s an transient failure it will eventually be successful otherwise we need to deploy a fix.
We are batching based on the message topic, mapping the topic directly to a batch, approximately: Message.put_batcher(message, String.to_atom(message.metadata.topic))
My question is around concurrency, we need to set the concurrency for each batcher to 1
to ensure the messages are processed serially, in order.
Would we then be able to set the producer concurrency higher as the batchers can be processed in parallel (since each batcher is a different topic, i.e. relates to a different entity type).
Broadway.start_link(__MODULE__,
name: __MODULE__,
producer: [
module: {producer_module(), producer_opts()},
concurrency: 2 # number of batchers (aka topics)
],
processors: [
default: [
concurrency: 10
]
],
batchers: [
cdc_posts: [
batch_size: 100,
batch_timeout: 200,
concurrency: 1
],
cdc_comments: [
batch_size: 100,
batch_timeout: 200,
concurrency: 1
]
]
)
By having a concurrency of 1
per batcher, but 2
for the producer, it means that the events in each topic/batch will be processed sequentially, but the batchers will not block each other, for example if a message in one batcher can’t be processed it will retry and not block the other batchers?
I’m not sure how the processors concurrency relates…
Many thanks!