Odd problem with Elixir worker, RabbitMQ, Ecto and PostgreSQL inserts with triggers

vishal-h · June 20, 2020, 8:11am

Here is an odd behavior we’re seeing. We send thousands of SMS/text messages through our app. Once the deliveries are done, the gateway starts calling a hook with the delivery status of each message. On receiving a status, the hook enqueues it to be updated in the DB. RabbitMQ is the broker.

A worker picks it from the queue and upserts into 2 tables (group and detail). There is a trigger defined in Postgres that updates a couple of views upon upsert. So basically there are 6 DB ops for every message on the hook: a query for the existing row and an insert_or_update on each of the 2 tables, plus 2 view updates.

The funny thing is everything works smoothly until the number of jobs in RabbitMQ drops below 10. It doesn’t matter how many messages there are (tested with > 100, > 1000, > 2000); the glitch always occurs when fewer than 10 jobs remain to be consumed.

This is the error we see:

(DBConnection.ConnectionError) tcp recv: closed (the connection was closed by the pool, possibly due to a timeout or because the pool has been terminated)

I am pretty sure the pool is getting terminated for some reason.
If the worker is stopped and restarted in daemon mode, a couple of jobs are consumed and the above error occurs again!
But if the worker is started in interactive mode (iex -S mix), the jobs are consumed without a glitch.

Not sure what to make of this issue. Any pointers would be appreciated!

Thanks…

al2o3cr · June 20, 2020, 5:52pm

Hard to say without code, but given the “only happens at almost-empty queue” behavior I recommend you check to see if your worker process is doing something that blocks (for instance, fetching a batch of messages from RabbitMQ) while holding open a transaction.
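To make that concrete, here is a minimal sketch of the two shapes, not the actual worker. Everything in it is hypothetical: `StatusWorker`, `fetch_batch` (standing in for whatever pulls messages from RabbitMQ) and `upsert` (standing in for the per-message DB writes). The only assumption about `repo` is that it is an Ecto repo module, so `repo.transaction/1` is available.

```elixir
defmodule StatusWorker do
  # Hypothetical module, for illustration only.
  # `repo`        - an Ecto repo module (assumed)
  # `fetch_batch` - zero-arity fun that may block waiting on RabbitMQ (assumed)
  # `upsert`      - one-arity fun doing the DB writes for one message (assumed)

  # Risky shape: the DB connection stays checked out for the whole
  # function, including the time spent blocked in fetch_batch. If the
  # queue is nearly empty and the fetch waits for more messages, the
  # checkout can outlive the pool's timeout.
  def risky(repo, fetch_batch, upsert) do
    repo.transaction(fn ->
      fetch_batch.() |> Enum.each(upsert)
    end)
  end

  # Safer shape: block on the queue first, then open the transaction
  # only for the DB work.
  def safer(repo, fetch_batch, upsert) do
    batch = fetch_batch.()
    repo.transaction(fn -> Enum.each(batch, upsert) end)
  end
end
```

If the worker looks like `risky/3`, a fetch that waits to fill a batch would only stall once the queue is almost drained, which would match the symptom described.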

vishal-h · June 21, 2020, 2:18am

But the code works in interactive mode!

Let me see if I can share the code.