How does Oban compare to Sidekiq in Rails in terms of architecture?

OrontaMedu · May 19, 2023, 12:59pm

In terms of architecture and the way Oban and Sidekiq work at a high level:

Why does Oban use a database (PostgreSQL or SQLite), whereas Sidekiq uses only Redis? Sidekiq doesn't use any relational database at all.

I'm aware that an Elixir application, thanks to OTP, may not need a message broker. But Oban still uses a proper relational database nonetheless.

Does Oban take a different approach to background jobs than Sidekiq does? Is a database essential for Oban?

stevensonmt · May 19, 2023, 3:21pm

I cannot really answer your question directly, but I would point out that the Engine behaviour is public, meaning you should in theory be able to implement the behaviour for any backend data store you like. So if you want to use Redis, you could theoretically do so. Oban is not just an open source library, though. The developers run a commercial business based on Oban, so it makes sense for them to focus their energies on fully supporting a limited number of backends rather than trying to support all of them out of the box.
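To illustrate why a public storage interface makes the backend swappable, here is a hypothetical Python analogy. Oban's real Engine behaviour is Elixir and has its own set of callbacks; the names `Engine`, `MemoryEngine`, `insert`, `fetch`, `complete`, and `drain` below are invented for this sketch. The point is only that the job-running loop talks to the interface, so a Redis- or SQL-backed class could replace the in-memory one without touching it.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass
class Job:
    id: int
    queue: str
    args: dict
    state: str = "available"


class Engine(ABC):
    """Hypothetical storage interface: the runner below only talks to this."""

    @abstractmethod
    def insert(self, queue: str, args: dict) -> Job: ...

    @abstractmethod
    def fetch(self, queue: str, limit: int) -> list[Job]: ...

    @abstractmethod
    def complete(self, job: Job) -> None: ...


class MemoryEngine(Engine):
    """Toy in-memory backend; another class could store jobs in Redis or SQL."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._queues: dict[str, deque[Job]] = {}
        self.completed: list[Job] = []

    def insert(self, queue, args):
        job = Job(next(self._ids), queue, args)
        self._queues.setdefault(queue, deque()).append(job)
        return job

    def fetch(self, queue, limit):
        q = self._queues.get(queue, deque())
        jobs = []
        while q and len(jobs) < limit:
            job = q.popleft()  # FIFO within a queue
            job.state = "executing"
            jobs.append(job)
        return jobs

    def complete(self, job):
        job.state = "completed"
        self.completed.append(job)


def drain(engine: Engine, queue: str, perform) -> int:
    """Run every available job in `queue`, regardless of which engine stores it."""
    ran = 0
    while batch := engine.fetch(queue, limit=10):
        for job in batch:
            perform(job.args)
            engine.complete(job)
            ran += 1
    return ran
```

Swapping backends would then mean passing a different `Engine` subclass to `drain`; the runner itself stays the same.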

al2o3cr · May 19, 2023, 3:32pm

Some of it is just a side-effect of history: when Sidekiq was created, MySQL was a common database choice. Until relatively recently, Oban relied on the Postgres NOTIFY mechanism which isn’t available in MySQL at all.

OrontaMedu · May 19, 2023, 4:43pm

I'm not asking "Why PostgreSQL in particular?". I'm asking "Why does one use a database, whereas the other doesn't?"

dimitarvp · May 19, 2023, 4:46pm

Redis is a database, just not a persistent one.

(Oh, in fact you can enable persistence there as well.)

OrontaMedu · May 19, 2023, 4:47pm

Yeah, but I still wonder: do they work similarly in terms of how jobs are created, scheduled, and processed? Architecturally, I mean.

Or does the fact that one uses a relational database and the other uses Redis have to do with the way they work? Could they be fundamentally different?

sorentwo · May 19, 2023, 4:59pm

There is some context in the original announcement here on the forum: Oban — Reliable and Observable Job Processing

I also presented an Architecture of Oban talk a few years ago. Most of the architecture is the same as when I gave the talk.

sodapopcan · May 19, 2023, 6:14pm

I'm having trouble finding the article/FAQ/whathaveyou, but I remember reading an article by Mike Perham about 10 years ago where he said a good part of not supporting anything other than Redis, aside from Redis being very well-suited to the problem, was about avoiding maintaining a costly adapter layer. There's a tiny bit of context here. Sorry I can't find the exact article I'm talking about. If I manage to later, I'll update my answer with it, but right now I'm hardcore Friday procrastinating and I gotta stop that.

linusdm · May 19, 2023, 6:42pm

That is a magnificent presentation! Both the contents and how you visualized and laid out all the components. Very insightful, thanks for sharing!

OrontaMedu · May 20, 2023, 1:34pm

Most of us don’t want to rely on Redis for production data, and Sidekiq is a largely proprietary legacy system. Not the best base for a reliable job processing system.

Will you elaborate? What precisely are the downsides of Redis for this kind of project? The fact that it's "legacy and proprietary"? Is that it?

And what are those downsides with regard to Kiq in particular?

Sidekiq, despite or because of Redis, is a successful project nonetheless.

OrontaMedu · May 20, 2023, 1:41pm

OK. So the role of PostgreSQL in Oban is the same as that of Redis in Sidekiq.

I initially thought that Sidekiq and Oban could have completely different architectures.

stevensonmt · May 20, 2023, 4:18pm

There are lots of interviews, videos, and podcasts about Oban out there. This interview in particular provides a quote that I think answers your question directly:

The requirements all emerged from pain points running Redis-backed queues—a lack of introspection, zero observable history, and no transactional guarantees. More importantly, we were locked into working with Sidekiq Enterprise for some features that the license precluded me from porting to Elixir.

This interview/podcast also addresses why PG is a better tool for the job:

PARKER SELBERT And the idea was that you just had a stream of jobs, or a stream of events, and then you could handle those events at any particular place, and then augment them and kind of push them back through. And that's directly what led to Oban. So it was kind of the combination of Sidekiq style queues and jobs and workers, but then having them stick around and actually be persistent.
So a lot of the stuff that makes Oban as powerful as it is for doing uniqueness and workflows, and the things that people really want to use it for, is because it’s in Postgres, and it keeps the jobs around after they ran… Which means that for a cron job I can say “Did I run this job an hour ago?”

JEROD SANTO [00:12:05.17] Right.

PARKER SELBERT I don’t rely on some side effect.

JEROD SANTO Like, built-in observability.

PARKER SELBERT Yes, totally built-in observability.
…

JEROD SANTO … And I think that’s the advantage of having the architecture that Oban provides, is you have everything in Postgres, so you don’t have to worry about duplication and those kinds of problems across your nodes.

One note on that Changelog podcast: he mentions SQLite as theoretically possible as an Oban engine, but I believe Oban now ships with a SQLite engine option out of the box.
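The "Did I run this job an hour ago?" point from the podcast works because completed jobs stay in the table instead of disappearing once popped. A minimal sketch of that check, assuming Python's built-in `sqlite3` as a stand-in for Postgres and a hypothetical, heavily simplified `jobs` table (Oban's real `oban_jobs` table has more columns, and Oban has its own uniqueness options). Both `ran_within` and `enqueue_unless_recent` are invented names.

```python
import sqlite3

# Hypothetical, simplified table; timestamps are Unix epoch seconds.
SCHEMA = """
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    worker TEXT NOT NULL,
    state TEXT NOT NULL,
    completed_at REAL
)
"""


def ran_within(conn, worker, window_seconds, now):
    """True if a job for `worker` completed within the last `window_seconds`."""
    row = conn.execute(
        "SELECT 1 FROM jobs WHERE worker = ? AND state = 'completed' "
        "AND completed_at >= ? LIMIT 1",
        (worker, now - window_seconds),
    ).fetchone()
    return row is not None


def enqueue_unless_recent(conn, worker, window_seconds, now):
    """Insert a new job only if the last completed run is older than the window."""
    with conn:  # commits the INSERT, or rolls back if it raises
        if ran_within(conn, worker, window_seconds, now):
            return False
        conn.execute(
            "INSERT INTO jobs (worker, state) VALUES (?, 'available')", (worker,)
        )
        return True
```

With a Redis list, a job that has been popped and finished leaves nothing behind to query, which is the "side effect" Parker says he doesn't have to rely on. Note that this sketch does not guard against two nodes racing to enqueue at the same moment; a real implementation would need a lock or constraint for that.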

katafrakt · May 20, 2023, 4:30pm

In principle, yes, they have the same architecture. There are some differences coming from choosing one data store or the other.

If you choose a relational database, and it's the same one as your main production database, you get the outbox pattern basically for free. You can enqueue a number of jobs inside a database transaction, and they won't actually be scheduled until the transaction commits. A common problem in Sidekiq is that you enqueue a job inside a transaction and then either the transaction fails or the worker picks up the job before the transaction is committed; in either case the job fails, because it cannot fetch the relevant records or it runs against stale data.
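A minimal sketch of that "outbox for free" behaviour, assuming SQLite (via Python's `sqlite3`) as a stand-in for Postgres. The `users`/`jobs` tables and the `sign_up` and `count_jobs` helpers are hypothetical. The job row is written in the same transaction as the user row, so a separate connection (playing the role of a worker) cannot see it until commit, and a rollback removes both.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY, worker TEXT NOT NULL, user_id INTEGER NOT NULL
);
"""


def count_jobs(path):
    """What a separate worker connection sees: only committed rows."""
    reader = sqlite3.connect(path)
    try:
        return reader.execute("SELECT count(*) FROM jobs").fetchone()[0]
    finally:
        reader.close()


def sign_up(conn, path, email, fail=False):
    """Insert a user and its welcome-email job in ONE transaction.

    Returns (committed, jobs_visible_to_worker_before_commit).
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
            conn.execute(
                "INSERT INTO jobs (worker, user_id) VALUES (?, ?)",
                ("WelcomeEmail", cur.lastrowid),
            )
            # The new job row exists in this transaction but is not yet
            # visible to other connections.
            seen_mid_tx = count_jobs(path)
            if fail:
                raise RuntimeError("a later step in the transaction failed")
    except RuntimeError:
        return False, seen_mid_tx
    return True, seen_mid_tx
```

With a Redis-backed queue the enqueue would happen outside the database transaction, so neither guarantee holds automatically.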

On the other hand, if you are doing a lot of background processing, backing it with Redis takes some load off your database. In some cases this can be a very important trade-off.

Finally, remember that Sidekiq and Oban are just parts of a wider picture. On the Ruby side you have Sidekiq and Resque, which are backed by Redis, but you also have DelayedJob, Que, and GoodJob, which are backed by a relational database. In Elixir, too, you have Exq and Verk, which use Redis instead of a relational database, if you prefer that flavour.

cjbottaro · May 21, 2023, 3:07am

My understanding is that Sidekiq uses Redis because it has a very big focus on low latency and high throughput. Redis’ BLPOP makes that very easy, especially with priority queues.

Fun fact: the author of Sidekiq went on to make Faktory, which is a language-agnostic version of Sidekiq where all the logic is on the server and not in the client library. It originally used RocksDB as the datastore, but switched back to Redis because BLPOP is so good.

It’s been on my reading list forever to learn how Oban does efficient priority queues without something like BLPOP.

vassilevsky · May 25, 2023, 8:30am

Sidekiq was built as a "better Resque", Resque being an older background job executor. Resque used Redis, so Sidekiq also used Redis, and the protocol was the same, so you could just swap Resque for Sidekiq and your jobs would still complete. So we can say it was historical, too.

cjbottaro · May 27, 2023, 8:07pm

“Better” is very subjective, I think. We still use Resque today because it runs every job in a forked Unix process. It was designed this way specifically to combat Ruby’s memory bloat issue, which Sidekiq is extremely susceptible to. I would say their use cases are pretty different because of the memory thing.
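The fork-per-job model described above can be sketched with plain POSIX `fork` in Python (POSIX-only; `run_in_child` is a hypothetical helper, not Resque's code). Whatever memory the job allocates lives in the child process and is reclaimed by the OS when the child exits, so the long-lived parent never accumulates it.

```python
import os


def run_in_child(job, *args):
    """Resque-style execution: fork, run the job in the child, return its exit code."""
    pid = os.fork()
    if pid == 0:  # child process
        try:
            job(*args)
            os._exit(0)  # exit immediately; the OS frees all of the child's memory
        except BaseException:
            os._exit(1)  # signal failure to the parent through the exit code
    # parent process: wait for the child and decode its exit status
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

The trade-off is the cost of a `fork` per job, which is part of why a threaded runner like Sidekiq has higher throughput but is more exposed to memory growth in the long-running process.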

kanishka · May 27, 2023, 8:40pm

As more database tables are held in memory, I wonder how much difference there is between Postgres inserts/deletes and Redis operations in 2023. I haven't searched for benchmarks.

I am starting to enjoy being able to look around at Oban's internal state using SQL instead of a job-system-specific client or a Redis client.

ananthakumaran · May 28, 2023, 4:27am

Redis’ BLPOP makes that very easy, especially with priority queues.

Sidekiq doesn't support priority queues. What it has are called weighted queues, which have different semantics from priority queues. If memory serves, only the open-source version of Sidekiq uses BRPOP; the Pro version uses RPOPLPUSH, which is safe and won't lose the job if your worker crashes. There is an open Redis issue to add support for MBRPOPLPUSH, which would make it straightforward to implement priority queues in Redis.

cjbottaro · May 30, 2023, 3:40pm

What are the semantics of proper “priority queues”? As far as I know, both Sidekiq and Faktory have two modes, the weighted mode, and then a mode where all of queueA will be processed before moving on to queueB, i.e. it inherits the semantics of BRPOP.
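The two modes can be modeled in a few lines of Python. The sketch assumes the commonly described approach for Sidekiq's basic fetch: in weighted mode each queue name is repeated `weight` times, the list is shuffled, and duplicates are dropped, and the resulting order is handed to BRPOP, which pops from the first non-empty key. Treat it as an approximation; the function names are hypothetical, and the "pop" here does not block.

```python
import random
from collections import deque


def strict_order(weights):
    """Strict mode: always check queues in the configured order."""
    return list(weights)


def weighted_order(weights, rng):
    """Weighted mode: repeat each queue `weight` times, shuffle, de-duplicate."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    rng.shuffle(expanded)
    return list(dict.fromkeys(expanded))  # keeps first occurrence of each name


def pop_first_nonempty(queues, order):
    """Like BRPOP over several keys (without blocking): first non-empty queue wins."""
    for name in order:
        if queues[name]:
            return name, queues[name].popleft()  # FIFO within a queue
    return None
```

In strict mode `{"critical": 3, "default": 1}` drains `critical` completely before touching `default`; in weighted mode `critical` is merely checked first about three times out of four. Neither mode orders individual jobs within a queue, which is the distinction ananthakumaran draws from a "proper" priority queue.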

I didn't know that Pro uses MBRPOPLPUSH. We use our own version of Faktory that uses Redis scripts to give us better guarantees.

sorentwo · May 30, 2023, 4:12pm

Data structures and operators are related in Redis, but they aren’t the same. You can pop from a list without blocking or you can block and wait, but it’s the same double ended list underneath.
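To make the "same list, different operators" point concrete, here is an in-process Python model (not a Redis client; the class and method names are hypothetical, though they mirror the Redis commands LPUSH, RPOP, and BRPOP). There is a single double-ended list; the non-blocking pop returns nothing immediately when it is empty, while the blocking pop waits for a push, up to a timeout.

```python
import threading
from collections import deque


class RedisLikeList:
    """One double-ended list; push, pop, and blocking pop all operate on it."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def lpush(self, item):
        with self._cond:
            self._items.appendleft(item)
            self._cond.notify()  # wake one waiting blocking pop, if any

    def rpop(self):
        """Non-blocking pop from the right end: None when empty."""
        with self._cond:
            return self._items.pop() if self._items else None

    def brpop(self, timeout):
        """Blocking pop from the right end: waits up to `timeout` seconds."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._items, timeout):
                return None
            return self._items.pop()
```

Pushing on the left and popping on the right gives FIFO order; the blocking variant is what lets a worker sit idle without polling.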

Sidekiq makes use of two data types for job storage—lists for queues and sorted sets for retries or scheduled jobs. The key difference for Pro/Ent is that they use a script to pop a job off the queue and atomically stash it in another structure (I don’t recall which because I switched to a hash for my implementation in Kiq).
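The "pop and atomically stash" idea can be sketched in plain Python. Here a lock stands in for Redis executing a script (or RPOPLPUSH) atomically, and `ReliableQueue`, `fetch`, `ack`, and `recover` are hypothetical names, not Sidekiq Pro's API. With a plain pop, a job held by a worker that crashes is simply gone; with the stash, it can be pushed back.

```python
import threading
from collections import deque


class ReliableQueue:
    def __init__(self):
        self._lock = threading.Lock()  # stands in for Redis running a script atomically
        self.queue = deque()           # push on the left, pop on the right (FIFO)
        self.working = {}              # worker id -> list of in-flight jobs

    def push(self, job):
        with self._lock:
            self.queue.appendleft(job)

    def fetch(self, worker):
        """Pop a job and record it as in-flight for `worker` in one atomic step."""
        with self._lock:
            if not self.queue:
                return None
            job = self.queue.pop()
            self.working.setdefault(worker, []).append(job)
            return job

    def ack(self, worker, job):
        """Job finished: drop it from the worker's in-flight list."""
        with self._lock:
            self.working[worker].remove(job)

    def recover(self, worker):
        """After a crash, put the dead worker's in-flight jobs back on the queue."""
        with self._lock:
            orphans = self.working.pop(worker, [])
            self.queue.extend(orphans)  # right end, so they are fetched next
            return len(orphans)
```

Whether the in-flight structure is a per-worker list or a hash (as in Kiq) is a detail; what matters is that the pop and the stash cannot be separated by a crash.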

The "weighted" part comes from how frequently Sidekiq polls each queue for jobs. By default, it round-robins between them, but you can change that ratio. That's a poor guarantee for concurrency, though: you can end up with slow jobs from one queue blocking job processing in all the others. Hence the strict concurrency guarantees in Oban: each queue operates autonomously and respects strict concurrency limits.
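A rough Python approximation of "each queue operates autonomously with its own limit": every queue gets its own bounded pool, so slow jobs in one queue can only ever occupy that queue's slots. This is not how Oban is implemented (Oban uses per-queue producer processes that fetch from the database); `LimitedQueue` and its fields are hypothetical and only illustrate the isolation property.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LimitedQueue:
    """One queue with its own concurrency limit, isolated from other queues."""

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit
        self._pool = ThreadPoolExecutor(max_workers=limit, thread_name_prefix=name)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0  # highest number of jobs observed running at once

    def _run(self, job):
        with self._lock:
            self.running += 1
            self.peak = max(self.peak, self.running)
        try:
            job()
        finally:
            with self._lock:
                self.running -= 1

    def submit(self, job):
        return self._pool.submit(self._run, job)

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

With a `slow` queue limited to 2 and a `fast` queue limited to 5, a backlog of slow jobs never delays the fast queue, unlike a single shared pool polled round-robin.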