Hi guys,
We want to set up a pipeline like this:
S3/SQS -> [Producer] -> [Processor] -> [Consumer] -> Database
We want to set it up so that when someone places a file in S3, it notifies SQS.
[Producer] reads from SQS, gets the object_key, then reads the actual content from S3.
When we first start the stack, everything runs correctly:
[Consumer] demands from [Processor], [Processor] demands from [Producer], and [Producer] checks SQS, then reads from S3, for example 10k lines. Because max_demand is 1k, it takes a while to exhaust the whole file from S3.
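Roughly, our [Producer] currently looks like the sketch below (heavily simplified; `SQSClient.receive_object_key/0` and `S3Client.read_lines/1` are placeholders for our real AWS calls):

```elixir
defmodule Producer do
  use GenStage

  # Rough sketch only; SQSClient and S3Client are placeholders for our real AWS calls.
  def start_link(_opts), do: GenStage.start_link(__MODULE__, :ok, name: __MODULE__)

  def init(:ok), do: {:producer, %{lines: []}}

  # We only touch SQS when demand arrives; once downstream stops demanding,
  # nothing ever looks at the queue again.
  def handle_demand(demand, %{lines: []} = state) when demand > 0 do
    lines =
      case SQSClient.receive_object_key() do
        {:ok, object_key} -> S3Client.read_lines(object_key)
        :empty -> []
      end

    {events, rest} = Enum.split(lines, demand)
    {:noreply, events, %{state | lines: rest}}
  end

  def handle_demand(demand, %{lines: lines} = state) when demand > 0 do
    {events, rest} = Enum.split(lines, demand)
    {:noreply, events, %{state | lines: rest}}
  end
end
```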
Our problem is that after the first run, [Consumer] doesn't seem to demand any more. We placed a new file into S3 and got a notification in SQS, but since [Consumer] no longer demands, [Producer] never goes back to check SQS, so nothing enters the system.
Are we implementing or understanding GenStage incorrectly?
We have read about buffering events and buffering demand, but cannot figure out how they fit our use case.
Is there a way to keep asking for more data safely?
We don’t want to go over the producer’s buffer_size: if we have 10 files in SQS, we don’t want to read them all in at once, but read one and wait until it is safe to read the next. That way, if the database is down, we aren’t holding the contents of 10 files in memory.
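For context, this is roughly the shape we took away from the docs' buffering-demand example, adapted to re-check SQS periodically, but we cannot tell whether it is the right direction (the `@poll_interval` and `fetch_next_file_lines/0` helper below are placeholders, not our real code):

```elixir
defmodule PollingProducer do
  use GenStage

  # Sketch of the "buffer demand, poll for new work" idea; the interval and
  # fetch_next_file_lines/0 are made-up placeholders.
  @poll_interval 5_000

  def start_link(_opts), do: GenStage.start_link(__MODULE__, :ok, name: __MODULE__)

  def init(:ok) do
    schedule_poll()
    {:producer, %{pending_demand: 0, lines: []}}
  end

  # Remember unmet demand instead of dropping it.
  def handle_demand(demand, state) when demand > 0 do
    dispatch(%{state | pending_demand: state.pending_demand + demand})
  end

  # Periodically re-check SQS so new files are picked up even when
  # no fresh demand is arriving from downstream.
  def handle_info(:poll, %{lines: [], pending_demand: pd} = state) when pd > 0 do
    schedule_poll()
    dispatch(%{state | lines: fetch_next_file_lines()})
  end

  def handle_info(:poll, state) do
    schedule_poll()
    {:noreply, [], state}
  end

  defp dispatch(%{pending_demand: 0} = state), do: {:noreply, [], state}

  defp dispatch(%{pending_demand: demand, lines: lines} = state) do
    {events, rest} = Enum.split(lines, demand)
    {:noreply, events, %{state | pending_demand: demand - length(events), lines: rest}}
  end

  defp schedule_poll, do: Process.send_after(self(), :poll, @poll_interval)

  defp fetch_next_file_lines do
    # placeholder: read one SQS message, fetch that object from S3, return its lines
    []
  end
end
```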
Thanks,
Son.