Best way to schedule events in Elixir?

Hi,
we are trying to build a system where a user can add events to a calendar. When an event is due, some specific business logic should be executed.
Do you think Oban would be a good fit? How would you handle the case when someone changes the schedule? Would you rebuild the whole Oban queue?
Or GenServers? Each event could be a process responsible for its own scheduling. Would that make sense?
Or completely different?

I would be very happy to get some feedback on designing such a system in Elixir and on how to leverage the power and architecture of the BEAM.

7 Likes

I would use Oban. This topic can help you.

7 Likes

I have written such a system. The premise is as follows:

  1. All Deadlines are wrappers of potential notifications computed based on an Event Date & Time, and entail the following:

    • Provider ID
    • Event Path
    • Event Date & Time (Date & Time with Time Zone)
    • Trigger Date & Time (UTC)
  2. The Event Date & Time carries an actual time zone, while the Trigger Date & Time is in UTC; the Trigger Date & Time is when the first notification is sent.

  3. The Deadline Provider (delegate) in each client application is responsible, when notified by the Deadlines system, for replying with exactly one of the following:

    • Retire the Deadline with no further notifications
    • Increment the retry counter of the Deadline and re-notify at X date & time (in UTC) — this updates the Trigger Date & Time — based on the number of notifications already sent and the existing Event Date & Time

Note that on every invocation the Deadline Provider is provided with the original Event Date & Time and the number of notifications, as this allows us to insert code which determines whether to postpone deadlines, and when to postpone them to, based on business logic.
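
As a rough illustration, the deadline and the provider contract might look like this in Elixir (all names here are my placeholders, not the actual system's):

defmodule Deadlines.Deadline do
  @moduledoc "Hypothetical shape of a deadline record, per the fields above."

  @type t :: %__MODULE__{}

  @enforce_keys [:provider_id, :event_path, :event_datetime, :trigger_datetime]
  defstruct [
    :provider_id,
    # path identifying the event in the client application
    :event_path,
    # zoned DateTime, e.g. 09:30 in "Europe/Berlin"
    :event_datetime,
    # UTC DateTime at which the next notification fires
    :trigger_datetime,
    # number of notifications sent so far
    notification_count: 0
  ]
end

defmodule Deadlines.Provider do
  @moduledoc "Hypothetical provider contract: retire, or re-notify at a new UTC time."
  @callback handle_deadline(Deadlines.Deadline.t()) ::
              :retire | {:renotify_at, DateTime.t()}
end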

We would create/update deadlines alongside normal operations, such as:

  1. When changing a form from Pending to Active, create a deadline for Submission expiring 14 days from now

  2. When changing a form from Active to Submitted, remove the same deadline only if it exists and is still active

  3. If the user does nothing, then 14 days later, the deadline will hit, and we can notify the user via email (via the registered Deadline Provider)

This allows us to very easily manage thousands of outgoing notifications a day in one of the systems.
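
A hedged sketch of cases 1 and 2 above (module, schema, and function names are placeholders for whatever the real system uses):

# Hypothetical: activating a form creates its submission deadline atomically.
def activate_form(form) do
  Ecto.Multi.new()
  |> Ecto.Multi.update(:form, Ecto.Changeset.change(form, status: :active))
  |> Ecto.Multi.run(:deadline, fn _repo, %{form: form} ->
    # placeholder; assumed to return {:ok, deadline} | {:error, reason}
    Deadlines.create(
      provider_id: :forms,
      event_path: "/forms/#{form.id}/submission",
      event_datetime: DateTime.add(DateTime.utc_now(), 14 * 86_400)
    )
  end)
  |> MyApp.Repo.transaction()
end

# Hypothetical: submitting the form retires the deadline if it is still active.
def submit_form(form) do
  with {:ok, form} <- MyApp.Repo.update(Ecto.Changeset.change(form, status: :submitted)),
       :ok <- Deadlines.retire_if_active("/forms/#{form.id}/submission") do
    {:ok, form}
  end
end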


At runtime, we split out access with a GenStateMachine per Provider, which is responsible for sending all messages regarding its deadlines. The machine has the following states:

  1. :waiting — the initial state, implying that the server is in a quiescent state with no further immediate action. When the Server is newly started, it remains in this state momentarily, then transitions to :loading on timeout to load the next batch of deadlines.

  2. :loading — Server is loading the next batch of notifications. In this state we wait for the internal load event to trigger. All calls to create/destroy deadlines are postponed until the Server enters the :waiting state again.

  3. :sending — Server is notifying the provider of deadlines that have become due. All calls to create/destroy deadlines are postponed until the Server enters the :waiting state again.

The use of GenStateMachine, a wrapper around gen_statem, essentially transforms the deadline server into an intelligent write-through cache, and we have used this subsystem happily for multiple years.
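
A skeletal sketch of such a server using GenStateMachine; all module names, timings, and placeholder functions are illustrative, not the real system:

defmodule Deadlines.Server do
  @moduledoc "Illustrative skeleton; the real system's details will differ."
  use GenStateMachine, callback_mode: :handle_event_function

  def start_link(provider_id),
    do: GenStateMachine.start_link(__MODULE__, provider_id)

  def create(pid, deadline), do: GenStateMachine.call(pid, {:create, deadline})

  # Start in :waiting; transition to :loading after a short timeout.
  def init(provider_id) do
    {:ok, :waiting, %{provider_id: provider_id, due: []},
     [{:state_timeout, 1_000, :load}]}
  end

  def handle_event(:state_timeout, :load, :waiting, data) do
    {:next_state, :loading, data, [{:next_event, :internal, :load}]}
  end

  # :loading — fetch the next due batch, then go notify the provider.
  def handle_event(:internal, :load, :loading, data) do
    due = load_due_deadlines(data.provider_id) # placeholder query
    {:next_state, :sending, %{data | due: due}, [{:next_event, :internal, :send}]}
  end

  # :sending — notify the provider of each due deadline, then rest.
  def handle_event(:internal, :send, :sending, data) do
    Enum.each(data.due, &notify_provider(data.provider_id, &1))
    {:next_state, :waiting, %{data | due: []}, [{:state_timeout, 60_000, :load}]}
  end

  # Writes arriving outside :waiting are postponed; gen_statem replays
  # them automatically on the next state change.
  def handle_event({:call, _from}, {:create, _}, state, _data) when state != :waiting,
    do: {:keep_state_and_data, [:postpone]}

  def handle_event({:call, from}, {:create, deadline}, :waiting, data) do
    # persist the deadline here (placeholder)
    {:keep_state, data, [{:reply, from, {:ok, deadline}}]}
  end

  defp load_due_deadlines(_provider_id), do: []
  defp notify_provider(_provider_id, _deadline), do: :ok
end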


In my opinion, Oban is a task execution framework, not a business logic layer. I would dispatch tasks for immediate execution when a deadline hits, rather than trying to use the task execution framework to manage business logic; the latter gives you much less control and much less of the support you could otherwise get, such as type checking from Dialyzer, explicit calendar operations, etc.


For the Postgres savvy — we have had to write a custom Ecto type to store a timestamp with time zone properly, since in Postgres…

For timestamp with time zone, the internally stored value is always in UTC (Universal Coordinated Time, traditionally known as Greenwich Mean Time, GMT). An input value that has an explicit time zone specified is converted to UTC using the appropriate offset for that time zone. If no time zone is stated in the input string, then it is assumed to be in the time zone indicated by the system’s TimeZone parameter, and is converted to UTC using the offset for the timezone zone.

So, our solution is to create a Composite Type:

CREATE TYPE user_datetime AS (
  local_timestamp timestamp,
  local_timezone text
);

This then allows us to capture 100% of the relevant information pertaining to the original Deadline, such as when and where exactly the original event did or will arise.
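
For illustration, here is how the two composite fields round-trip through a zoned DateTime in Elixir (assuming a time zone database such as tzdata is configured):

# The zoned event time is stored as a naive local timestamp plus the zone name...
dt = DateTime.new!(~D[2022-06-20], ~T[09:30:00], "Europe/Berlin")
local_timestamp = DateTime.to_naive(dt)  # ~N[2022-06-20 09:30:00]
local_timezone = dt.time_zone            # "Europe/Berlin"

# ...and rebuilt on load, preserving both the wall-clock time and the zone.
{:ok, restored} = DateTime.from_naive(local_timestamp, local_timezone)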

11 Likes

No rebuilding needed; scheduled Oban jobs are represented as rows in the database. The “event” could store its job ID directly and cancel or update it as needed.
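
A hedged sketch of that flow (the worker module and the oban_job_id column are assumptions):

# Insert the scheduled job and remember its id on the event row.
{:ok, job} =
  %{event_id: event.id}
  |> MyApp.EventWorker.new(scheduled_at: event.starts_at)
  |> Oban.insert()

{:ok, event} =
  event
  |> Ecto.Changeset.change(oban_job_id: job.id)
  |> MyApp.Repo.update()

# When the user reschedules, cancel the old job and insert a new one.
:ok = Oban.cancel_job(event.oban_job_id)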

2 Likes

For scheduled jobs, we just use Faktory, which has that feature built in.

For recurring jobs, we use quantum and Faktory.

No offense to Oban, but I can’t imagine using a relational database table as a queue. We have something like 5000 concurrent workers all trying to pop from the same queue; I think it’s called the “high contention consumer” problem.

In other news, I’m trying to develop a “serverless” async/background job system, heavily inspired by Resque, Sidekiq, and Faktory… :sweat_smile: The basic idea is that you sign up and get a URL that you point your client to, you pay for concurrent workers, and it scales “indefinitely”… :wink: It also has a management UI, monitoring, etc. all built in for free.

$1 per concurrent worker per month. So you want 5 concurrent workers? $5/month.

1 Like

Oban has scheduling and recurring jobs built in.
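
One-off scheduling uses the :scheduled_at / :schedule_in job options, and recurring jobs use the bundled cron plugin. A minimal config sketch (app and worker names assumed):

# config/config.exs
config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10],
  plugins: [
    {Oban.Plugins.Cron,
     crontab: [
       # run the digest worker every day at 08:00 UTC
       {"0 8 * * *", MyApp.DailyDigestWorker}
     ]}
  ]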

Regarding Postgres as a queue, it fares much better than you may think, and a lot of people seem entirely comfortable with it :grin:

3 Likes

If Oban is your application’s performance bottleneck, it should either be because your business is booming (congratulations :tada:), or …

Very very true. I get a lot of criticism for trying to design a serverless background job system that can account for my day job team’s use case (which is extreme):

Those numbers aren’t even accurate due to Faktory server restarts and lulls in our “busy season”… :flushed:

Anyway, your numbers are really impressive out of Postgres. I couldn’t get near that much using Cassandra and lightweight transactions… but then I was trying to optimize for throughput and “infinite” scalability… hence the distributed datastore.

Gah, forums messed up reply… :point_up_2:

How big will it be? Oban is probably good enough for your case. It removes a lot of the headache of introducing new components before that’s really needed :slight_smile:

It’s better to keep that info in persistent storage somewhere instead of storing it in a GenServer etc. anyway; otherwise you have to dump and restore state on deployment, or use hot code reloading, which is no small amount of work. It can work, and some do this, but it may not be worth the effort unless your case really needs it.

3 Likes

Oban. By the time Oban and Postgres no longer work for your needs, you will have a solid product used by many customers and will be able to afford other, more sophisticated systems.

I recommend not wasting your time building an infinitely scalable solution today. Oban will get you far and quickly!

4 Likes

Thank you all for your valuable feedback! I think we will start with Oban and see how it goes. Scaling/performance shouldn’t be a problem, because the application will be running on a local network with just a few users and around 30 calendars.

Do you guys know if there is a possibility to avoid or check for overlapping jobs in Oban? Ideally, the events should run one after another, but never in parallel. We could catch that in the frontend and notify the users, but can I configure Oban to make sure there’s always a kind of serial processing?

“ALWAYS serial” requirement sounds like an ideal case for a GenServer.
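
That said, Oban can also run a queue serially: a queue whose concurrency limit is 1 executes one job at a time. A minimal config sketch (app and queue names assumed):

# config/config.exs: a queue limited to 1 concurrent job processes serially
config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [calendar_events: 1]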

No, sorry, that isn’t what I meant. I’ll try an example:
A user wants to schedule an event from 9 AM to 10 AM and then another one from 10 AM to 11 AM. Totally fine. But I don’t want to allow an overlapping event from, let’s say, 9:30 AM to 10:30 AM.
But the more I think about it, the more I’m convinced that this isn’t the responsibility of Oban.

I haven’t implemented this, but I ran across a blog post recently that demonstrates Postgres functions for finding overlapping meetings. It might be useful to you.

3 Likes

You could enforce it in Postgres using a range type and an exclusion constraint.

In that case you would model your event with an events table that has a duration column of a range type: tsrange is built in, while a timerange over time values would be a custom range type. The values would look like '[09:30, 10:30)'::timerange or '[2022-06-20 09:30, 2022-06-20 10:30)'::tsrange depending on your needs; note the half-open upper bound, so back-to-back events (one ending at 10:00, the next starting at 10:00) are not treated as overlapping.

And then you could create a non-overlapping (exclusion) constraint on that column.
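
A minimal sketch as an Ecto migration (assuming an events table with a calendar_id column and a duration tsrange column; the btree_gist extension is needed for the calendar_id equality operator):

defmodule MyApp.Repo.Migrations.AddNoOverlapConstraint do
  use Ecto.Migration

  def change do
    execute "CREATE EXTENSION IF NOT EXISTS btree_gist",
            "DROP EXTENSION IF EXISTS btree_gist"

    # No two events in the same calendar may have overlapping durations.
    create constraint(:events, :no_overlapping_events,
             exclude: ~s|gist (calendar_id WITH =, duration WITH &&)|
           )
  end
end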

1 Like

You can probably use the :unique option, but it could wind up being kludgy. I found :unique very handy for emails, to avoid duplicates. I think I would use Ecto to model it and create a function that checks for overlapping events and schedules the Oban job only if you get a valid changeset. Then you can encapsulate all that validation in the model. You probably want the check and the update in an Ecto.Multi.
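
A hedged sketch of that idea, leaning on an exclusion constraint like the one shown earlier so the overlap check is race-free (schema, worker, and constraint names are assumptions):

def schedule_event(attrs) do
  changeset =
    %MyApp.Event{}
    |> MyApp.Event.changeset(attrs)
    |> Ecto.Changeset.exclusion_constraint(:duration, name: :no_overlapping_events)

  Ecto.Multi.new()
  |> Ecto.Multi.insert(:event, changeset)
  |> Oban.insert(:job, fn %{event: event} ->
    MyApp.EventWorker.new(%{event_id: event.id}, scheduled_at: event.starts_at)
  end)
  |> MyApp.Repo.transaction()
end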

Well, the whole idea spawned from making a serverless background job system that could work for my day job. The notion being that if it works for our use case, it could work for anyone.

I’m a huge fan of Faktory. It’s a polyglot background job system that is totally free (including a nice UI). And it’s based on Redis, which has an extremely high performance ceiling.

The downside to Faktory (or Resque or Sidekiq) is hosting your own instance, monitoring that instance, setting up alerting, etc etc.

I’ve personally longed for a serverless version of these products since about the 5th time our Redis instance blew out and our product came to a screeching halt. I think it’s possible, depending on how motivated someone is to make it.

PostgreSQL can only do this effectively due to a special type of locking system (Advisory Locks) that is dramatically more efficient for this particular use case.

The first place I ever remember seeing the technique was a Ruby queue called Que (que-rb/que on GitHub), a job queue that uses PostgreSQL’s advisory locks for speed and reliability.
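
For the curious, a transaction-level advisory lock is easy to try from Elixir (Repo name and lock key are arbitrary):

MyApp.Repo.transaction(fn ->
  # pg_advisory_xact_lock blocks until acquired and is released automatically
  # at commit/rollback, so it stays on the one checked-out connection.
  MyApp.Repo.query!("SELECT pg_advisory_xact_lock($1)", [42])

  # ...exclusive work here...
end)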

I’m a huge fan of advisory locks! We use them frequently with Sidekiq/Resque/Faktory, as well as in web requests… :slight_smile:

We also make use of other awesome Postgres features like jsonb columns and logical decoding of replication for CDC purposes. Postgres is an amazing piece of tech. Our product has 19 Postgres instances and not wimpy ones either; we love Postgres!

I’ve never tested Postgres’ advisory locks for the “high contention consumer” problem; I just assumed it would break down because it’s a very difficult problem… and Postgres has broken down for us in other ways.

We tried both Etcd and Redis for distributed locks, and even Etcd went into completely unrecoverable states. Hell, even Redis Cluster has issues, hence why we just went with the proprietary AWS MemoryDB.

Distributed computing is hard.

1 Like

I always assume that if I have a system that manages to outgrow Oban, the next move to optimize would be Broadway.