Background job queues: When to use? When not to use? Which one to use?


@koudelka, honeydew looks amazing. I really like the way it just “merges into another table” — that makes reacting asynchronously to events a breeze. But I think other job queues have advantages of their own, such as a built-in web interface (which is often underrated).


We’ve recently started needing durable background jobs and we seem to be taking an approach I haven’t really seen discussed. Rather than relying on a job queuing library we use straight OTP and rely on application state to handle durability.

What I mean by “rely on application state to handle durability” is that rather than having some sort of queuing system, like a “jobs” table or a separate Redis server, we just query our data and, depending on its state, determine whether we need to start a job.

An example of this is we need to register a webhook with a 3rd party API for our application to work. When we register the webhook we receive a webhook id back from the 3rd party API and store that in our application. Suffice it to say it’s really important that these webhooks get registered. We need to be able to handle failure states (retries) and recover from system restarts (durability).

When we run the process to create an account, we start a “worker” GenServer that is responsible for registering the webhook. If all goes well, the webhook is registered, we store the id, and the GenServer stops. If things don’t go to plan, we try again after a period of time using Process.send_after/4.
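A minimal sketch of such a worker. The module name, the retry interval, and the `MyApp.ThirdParty.register_webhook/1` and `MyApp.Accounts.store_webhook_id/2` helpers are all hypothetical stand-ins for the API client and persistence layer, not anything from the actual codebase:

```elixir
defmodule MyApp.WebhookWorker do
  # One worker per account; retries registration until it succeeds,
  # then stops normally. Helper modules are hypothetical placeholders.
  use GenServer

  @retry_after :timer.seconds(30)

  def start_link(account_id) do
    GenServer.start_link(__MODULE__, account_id)
  end

  @impl true
  def init(account_id) do
    # Defer the first attempt so init/1 returns quickly to the supervisor.
    {:ok, account_id, {:continue, :register}}
  end

  @impl true
  def handle_continue(:register, account_id), do: attempt(account_id)

  @impl true
  def handle_info(:retry, account_id), do: attempt(account_id)

  defp attempt(account_id) do
    case MyApp.ThirdParty.register_webhook(account_id) do
      {:ok, webhook_id} ->
        # Success: persist the id and let the process exit cleanly.
        MyApp.Accounts.store_webhook_id(account_id, webhook_id)
        {:stop, :normal, account_id}

      {:error, _reason} ->
        # Failure: schedule another attempt and keep the process alive.
        Process.send_after(self(), :retry, @retry_after)
        {:noreply, account_id}
    end
  end
end
```

Using `handle_continue/2` keeps the (possibly slow) API call out of `init/1`, and a `:stop, :normal` exit means the supervisor won’t restart a worker whose job is done.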

When the system restarts, we have a “booter” GenServer that is responsible for finding all accounts that don’t have a webhook id and starting worker processes to get those accounts taken care of.
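The booter might look something like the following sketch. It assumes a hypothetical `MyApp.Accounts.without_webhook_id/0` query helper, a worker module such as `MyApp.WebhookWorker`, and a `DynamicSupervisor` registered as `MyApp.WorkerSupervisor` — all illustrative names:

```elixir
defmodule MyApp.WebhookBooter do
  # Runs once at application start: finds accounts missing a webhook id
  # and starts one worker per account, then exits. :transient restart
  # means a :normal exit won't trigger a supervisor restart.
  use GenServer, restart: :transient

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok) do
    # Defer the database scan so the supervision tree boots quickly.
    {:ok, :ok, {:continue, :boot}}
  end

  @impl true
  def handle_continue(:boot, state) do
    # Application state *is* the queue: any account without a webhook id
    # still needs a job, regardless of why the system went down.
    for account_id <- MyApp.Accounts.without_webhook_id() do
      DynamicSupervisor.start_child(
        MyApp.WorkerSupervisor,
        {MyApp.WebhookWorker, account_id}
      )
    end

    # All work handed off to workers; nothing left for this process to do.
    {:stop, :normal, state}
  end
end
```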

Using this methodology we stay completely within the ecosystem of OTP and our application. We don’t have to introduce any new dependencies or worry about our job queue and application state getting out of whack. OTP has all the tools necessary to make scheduling, retries, etc. a breeze.

Curious to hear people’s thoughts on this approach. I’m sure it doesn’t take care of all use cases but it’s worked well for us thus far.


If I understand correctly, this approach sounds a lot like what Honeydew ecto_poll queues do. They store both the fact that a job is needed, and its state, in a couple of columns added to your existing application tables. This way, jobs only run if the transaction they are created in completes successfully, and they should be just as durable in the sense of restart resiliency, failure retries, and a guarantee that only one worker is trying to run a job at a given time.
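For context, wiring up an ecto_poll queue looks roughly like this (module and option names as I recall them from Honeydew’s documentation — treat the exact calls, and the `MyApp.*` names, as assumptions and check the docs before use):

```elixir
# Start a queue backed by polling an existing Ecto schema:
# Honeydew adds its bookkeeping columns to the schema's table via a
# migration helper, so "job pending" is just a state of your own rows.
Honeydew.start_queue(:classify_photos,
  queue: {Honeydew.EctoPollQueue, schema: MyApp.Photo, repo: MyApp.Repo}
)

# Start workers that pick up rows needing processing.
Honeydew.start_workers(:classify_photos, MyApp.Classifier)
```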


I don’t believe so. Our workers are started explicitly rather than through a polling mechanism. The only time the database is polled is by the “booter” process when the application starts. We also don’t keep any job-specific data in our database.


Just wanted to add that I recently released rabbit - a library for building applications using RabbitMQ, which of course can be used for background job processing. It permits a lot of flexibility in how you want to set up your producers and consumers, and basically allows you to “build your own job framework”.

So far, it’s managed to reach the “typical” max performance that a single queue in no-ack mode with a single consumer will achieve - about 50k messages/second. In other words, the bottleneck ends up being RabbitMQ itself.

Always open to contributions.


I just want to add to this thread that @sorentwo recently released Oban — Reliable and Observable Job Processing which looks to be a great addition and I’m hoping to try it out in depth soon.