I am looking for feedback on how we handle two situations we have run into with exq:
Situation 1: On start-up of a node, exq moves any entries in that node’s “backup” queue to the main queue. The problem appears when we deploy a new version of the app. A new instance starts while the old one is still running, and both use the same node id. Exq therefore moves a job from the “backup” queue and the new instance performs it. Meanwhile, the old instance is still running that same job and completes it successfully. As a result, the job is executed twice.
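To make the race concrete, here is a toy in-memory model in Python. exq itself is written in Elixir and keeps its queues in Redis. The class and function names below are hypothetical and only mimic the start-up behaviour described above. The sketch assumes each node id has its own backup list and that start-up restores the list for the starting node’s id.

```python
from collections import deque


class InMemoryBroker:
    """Toy model of a main queue plus per-node backup queues (not exq's real storage)."""

    def __init__(self):
        self.queue = deque()
        self.backups = {}  # node_id -> jobs that node has fetched but not yet acknowledged

    def fetch(self, node_id):
        # Take the next job and record it as in progress for this node id.
        job = self.queue.popleft()
        self.backups.setdefault(node_id, []).append(job)
        return job

    def restore_on_startup(self, node_id):
        # Start-up behaviour: in-progress jobs recorded under this node id go back to the main queue.
        self.queue.extend(self.backups.pop(node_id, []))


def simulate_deploy(old_node_id, new_node_id):
    """Return every (instance, job) execution that happens during one deploy."""
    broker = InMemoryBroker()
    broker.queue.append("job-1")
    runs = [("old", broker.fetch(old_node_id))]  # old instance starts job-1
    broker.restore_on_startup(new_node_id)        # new instance boots and restores "its" backup
    if broker.queue:                              # new instance finds work and runs it
        runs.append(("new", broker.fetch(new_node_id)))
    return runs  # the old instance still finishes its own run of job-1
```

`simulate_deploy("web-1", "web-1")` returns two runs of `job-1`. `simulate_deploy("web-1", "web-1:v2")` returns only the old instance’s run, because the new node’s restore step finds no backup under its own id.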
As suggested in the docs, we can implement a node identifier that is unique per deployment. A newly deployed instance then never touches the backup queue of the previous deployment’s node.
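Below is one possible scheme, sketched in Python. It combines the hostname with a per-release value and falls back to a random suffix when that value is missing. It assumes the deploy pipeline exposes the per-release value in an environment variable; the name `DEPLOY_ID` is hypothetical. In exq, this logic would live in the custom node identifier module that the docs describe.

```python
import os
import socket
import uuid


def make_node_id(env=None, hostname=None):
    """Build a node id that differs between deployments (hypothetical scheme)."""
    env = os.environ if env is None else env
    host = hostname or socket.gethostname()
    # DEPLOY_ID is an assumed variable set by the deploy pipeline.
    # Without it, fall back to a random 8-hex-digit suffix.
    deploy = env.get("DEPLOY_ID") or uuid.uuid4().hex[:8]
    return f"{host}:{deploy}"
```

The random-suffix fallback has a trade-off. It also changes on every restart of the same release, so a crashed instance’s backup queue is never restored automatically.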
That leads us to situation 2.
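Situation 2: Node 1 (the older node) is terminated before it finishes all of its jobs. Because the new node has a different identifier, nothing moves those jobs out of Node 1’s backup queue.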
To handle this, we implement a backup queue cleaner for the previous node. The backup cleaner is a GenServer process that wakes up after a configurable period and finds the previous node_id. It then moves all jobs belonging to that node_id back into the main queue.
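The cleaner’s wake-up step can be sketched in Python as a plain function over an in-memory model. The sketch makes two assumptions. First, backup queues are modelled as a dict keyed by node id. Second, the caller passes in the set of node ids still considered live, because the text above does not specify how the cleaner identifies the previous node_id.

```python
from collections import deque
from typing import Deque, Dict, List, Set


def requeue_orphaned_backups(
    queue: Deque[str],
    backups: Dict[str, List[str]],
    live_node_ids: Set[str],
) -> List[str]:
    """Move every job held in a non-live node's backup queue back onto the main queue.

    queue         -- the main queue of job ids
    backups       -- node_id -> job ids that node had in progress
    live_node_ids -- node ids that must not be touched (assumed to be supplied by the caller)
    Returns the node ids whose backups were requeued.
    """
    cleaned = []
    for node_id in list(backups):  # iterate over a copy of the keys: we pop from the dict below
        if node_id in live_node_ids:
            continue
        queue.extend(backups.pop(node_id))  # keeps the order the node fetched them in
        cleaned.append(node_id)
    return cleaned
```

In the Elixir version, this body would run inside the GenServer’s `handle_info/2` callback. Each run would be rescheduled with `Process.send_after/3` using the configurable interval. Passing in a set of live ids, rather than a single previous node_id, keeps the cleaner away from backups of other instances in the current deployment.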
The question is: is there a better way to make sure we don’t re-queue in-progress jobs during a deployment?