I have a dilemma that I would like to hear your input on:
We have a server that serves a decently high rate of api calls and needs to stay as fast as possible.
Our business logic includes a bunch of background jobs that need to run but are not time-critical, so there is no big rush to finish them: order exports, notifications, scheduled tasks like generating invoices, etc. They can take significant CPU, though.
Currently we run them via Oban inside the same servers that serve the API and the rest of our business logic. My concern is that they can impact performance and starve the CPU, which would affect the performance of the APIs and websockets.
Would it be more appropriate to create another umbrella app (or something similar) that I can package differently via mix releases and deploy to a separate server, where there is no rush to finish the tasks and it's not a problem if they saturate the CPU?
How do you guys do it??
There are a few questions that need to be answered first. What does “a decently high rate of API calls” mean? Does the API request handling put a high load on the CPU relative to RAM? It usually doesn’t, but history has seen many different cases. Maybe a load/stress test can show how many requests and background tasks your service can take?
There are 2 ways to go about this:
- Rate-limit the processing: use the Oban rate limiter (not sure how it works, and it seems you need Oban Pro), or Broadway, or GenStage. This will save a lot of complexity and time in general.
- Use RabbitMQ or Erlang's built-in node-to-node message passing to hand events to another server that does the processing. In this case you need to decide whether losing events is acceptable, and have a strategy for what happens when something goes down. This is much more complex than the first solution.
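For the first option, here is a minimal sketch of what plain (non-Pro) Oban gives you out of the box: per-queue concurrency limits. The queue names and numbers below are hypothetical, and note that this caps how many jobs run *at once*, which only indirectly bounds CPU use; true rate limiting (N jobs per time window) needs Oban Pro or a Broadway/GenStage pipeline.

```elixir
# config/config.exs — illustrative queue names and limits.
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [
    default: 10,
    # CPU-heavy order exports: at most 2 running concurrently
    exports: 2,
    # invoice generation: fully serialized, one job at a time
    invoices: 1
  ]
```

Keeping the heavy queues at a low concurrency leaves most scheduler capacity free for request handling, without any extra infrastructure.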
I get where you are coming from and I am a paranoid prepper myself but I’d advise you to measure first before going into a potential rabbit hole.
I agree that I may be heading into a potential rabbit hole. I will leave things as they are for now, and instead try to add more metrics and limit background worker concurrency to a certain level.
Something to remember when you are developing in the BEAM: The scheduler is preemptive, operating on a time-sharing principle. This means that the VM will actually pause long-running processes (such as your background jobs) from time to time in order to let other processes (such as your web requests, which should be very short-lived) have a turn. This is done by allocating a specific number of reductions (function calls) to each process, after which the scheduler will switch to the next process in the queue.
The primary advantage of this approach is that it ensures every process gets some CPU time even under very heavy load, promoting fairness and responsiveness. That is not to say that your background jobs will have no negative impact on the time it takes to finish processing a request, but it’s not nearly as detrimental as a cooperative scheduler would be, since a cooperative scheduler does not pause processes but lets them run until they finish what they are doing. That can lead to issues like process monopolization, where a single process takes over the CPU and doesn’t allow other processes to run.
In Erlang’s system, the preemptive scheduling helps in distributing resources evenly, maintaining system responsiveness, and avoiding the risk of any single process overwhelming the system. This design reflects Erlang’s focus on concurrency, fault tolerance, and distributed computing, making it particularly suitable for scalable and highly available systems.
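You can see this preemption in action with a small standalone script (a sketch, not from the thread): a process spinning in a tight loop still cannot starve a short-lived task, even when the VM is pinned to a single scheduler thread.

```elixir
# Run with: elixir --erl "+S 1" demo.exs
# (+S 1 forces one scheduler, so both processes must share it.)

# A process that burns CPU forever in a tight loop of function calls.
busy =
  spawn(fn ->
    loop = fn loop -> loop.(loop) end
    loop.(loop)
  end)

# Despite the busy loop, this short task still completes promptly:
# the VM preempts `busy` after its reduction budget is spent
# and gives other processes a turn.
{time_us, :ok} =
  :timer.tc(fn ->
    Task.async(fn -> :ok end) |> Task.await()
  end)

IO.puts("short task completed in #{time_us} µs alongside a busy loop")
Process.exit(busy, :kill)
```

Under a cooperative scheduler, the busy loop would never yield and the short task would never run; on the BEAM it finishes almost immediately.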
We run our Oban workers on separate VMs (Heroku dynos) that don’t serve web traffic. While the BEAM scheduler is great, it’s useful having a separate CPU/memory environment for each that can be scaled independently.
We do this by disabling the queues on the web workers with an environment variable, which is read by the runtime configuration.
Interesting approach. Do you have any kind of autoscaling in place?
For example, how do you scale your Heroku dynos based on the number of jobs in the queue?
I had not heard about it before. I will definitely try it.
What kind of metrics do you track?
No, our main bottleneck is our database, and auto-scaling up more dynos would just increase the load on it to the point where it affected web users. Auto-scaling down during quiet periods might save us some money, but that’s not a meaningful cost right now.
Not the person you are replying to, but we also use the same approach.
There are a series of web servers that just serve HTTP traffic, and then there are a series of background servers that just run Oban jobs. The server kind is identified by an environment variable that we check to conditionally start Oban queues or not.
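The env-var switch described above can be sketched like this (the variable name `SERVER_ROLE` and the queue list are hypothetical, not from the thread):

```elixir
# config/runtime.exs — conditionally start Oban queues per server kind.
import Config

queues =
  case System.get_env("SERVER_ROLE", "web") do
    # Background servers run the full set of queues...
    "worker" -> [default: 10, exports: 2, invoices: 1]
    # ...while web servers start Oban with no queues: they can still
    # enqueue jobs, but never execute them.
    _ -> []
  end

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: queues
```

A nice property of this setup is that both server kinds run the same release; only the environment variable differs between the two deployments.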
We haven’t needed any autoscaling yet, so we don’t have a solution for that. In any case, Oban Pro has the DynamicScaler plugin, which can do the autoscaling for you.
I did not know about that plugin. It may be useful at some point, for jobs that are tied to the main DB like importing external data to S3 or media processing.
This approach is very valid. I know from financial companies that they usually separate their core systems with high availability requirements from other tasks such as background batch jobs, analytics, reporting, and additional services.
To do this you basically have to copy your database and use the copy for everything you don’t want to run in the core system. Copying the database proves to be a challenging task in itself; it needs a dedicated team that does the loading on a daily basis and fixes the load issues that regularly arise. Additionally, the database copy will never be in complete sync with the core database.
@JeyHey you brought up an interesting subject about duplicating the main database, or restoring the daily backup to another server and using that for reporting or data-extraction tasks. It cannot be used by workers that ingest data into the main database, but for data extraction that does not need to be realtime it can be a good solution.
As @dimitarvp said, though, we need to be careful not to go down all these kinds of rabbit holes just for premature optimization. We should only do it if the metrics tell us there is something we need to act on.
For the moment only the duration of a span. Just the fact that you’re starting and then ending a span already automatically gives you a duration. And they can be nested (meaning you can measure the duration of a web response and the DB time that’s part of it, for example). You can also attach all sorts of data to the traces / spans. Pretty awesome stuff.
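The start/stop span mechanics can be sketched with the `:telemetry` library (an assumption on my part; the thread doesn't name the tracing tool, but most Elixir tracers build on the same idea). `:telemetry.span/3` emits a `:start` and a `:stop` event around a function and measures the duration automatically:

```elixir
# Requires {:telemetry, "~> 1.0"} in deps — not part of the Elixir stdlib.

# Attach a handler that logs the duration reported by each :stop event.
:telemetry.attach(
  "log-span-duration",
  [:my_app, :work, :stop],
  fn _event, %{duration: duration}, _metadata, _config ->
    ms = System.convert_time_unit(duration, :native, :millisecond)
    IO.puts("span took #{ms} ms")
  end,
  nil
)

# span/3 emits [:my_app, :work, :start] and [:my_app, :work, :stop]
# around the function. A nested span opened inside this function (e.g.
# around a DB call) would measure the DB time within this outer span.
result =
  :telemetry.span([:my_app, :work], %{}, fn ->
    Process.sleep(50)   # stand-in for real work
    {:ok, %{}}          # {result, stop_metadata}
  end)
```

The handler receives the duration in native time units, which is why it is converted before printing.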
Dunno if you checked the GitHub repo or their website. Please do and check out the screenshots (they recently added dark mode btw).