How to deal with performance issues using background jobs?

Hello,

I’m having some performance issues with background jobs, especially resizing images. I have an API built with Phoenix and background jobs using Exq. My problem is that if the app suddenly has to deal with several images, the API gets much slower, and if there are a lot of images the API stops responding at all and AWS restarts my Docker containers. It’s a complete mess, because the restarts don’t let the Elixir app finish the image resizing, so it stays down for a long time.

I remember that when I worked with Ruby on Rails and Sidekiq, I had completely separate machines for the web servers and the background job workers. That way I could scale them independently, and when a big batch of background jobs came in, the web server could still respond correctly.

Now, I understand that Elixir can form a cluster in which all nodes communicate with each other, and that this is used for several things like websockets. However, I think that in my case it could be useful to have separate machines for background jobs and web servers.

What do you think about this? Should it be solved with better scaling policies in AWS, or is it actually a good idea?

In any case, I have no idea how to do this using Distillery and the same code base.

This is a very open thread: if you have other ideas, questions, or suggestions, that would be fantastic, and I’ll be glad to share my experience.

Thank you for your time.


:wave:

How do you resize the images?

My problem is that if the app suddenly has to deal with several images, the API gets much slower, and if there are a lot of images the API stops responding at all and AWS restarts my Docker containers.

Have you been able to find out what causes the API to stop? If you use NIFs to resize the images, the schedulers might become blocked.

You could start by limiting the number of heavy jobs with DynamicSupervisor (or any other pool).
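For example, here’s a minimal sketch of that idea, assuming a made-up module name and an arbitrary limit of 2 concurrent jobs:

```elixir
defmodule MyApp.ResizeSupervisor do
  # Caps how many heavy resize jobs can run at the same time on this node.
  use DynamicSupervisor

  def start_link(opts) do
    DynamicSupervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    DynamicSupervisor.init(strategy: :one_for_one, max_children: 2)
  end

  # Runs the given zero-arity function as a temporary Task; returns
  # {:error, :max_children} when the limit is hit, so the caller can retry later.
  def start_resize(fun) when is_function(fun, 0) do
    DynamicSupervisor.start_child(__MODULE__, {Task, fun})
  end
end
```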

As for using multiple types of nodes doing different jobs: “Designing for Scalability with Erlang/OTP” covers this in chapter 13, they call them “node families”. If you want to do this with Distillery you could put the image processing in a different Erlang application and create two releases, one that includes the image processing app and one that includes the web frontend app. When developing locally you can run both in the same VM, but when deploying to production you would use either of the two releases.
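To give an idea of what that could look like with Distillery 2.x, here is a rough sketch of a rel/config.exs that defines two releases from the same code base (the app names my_app_web and my_app_jobs are just placeholders):

```elixir
use Distillery.Releases.Config,
  default_release: :web,
  default_environment: Mix.env()

# Release for the web servers: only the Phoenix frontend app.
release :web do
  set version: current_version(:my_app_web)
  set applications: [:runtime_tools, my_app_web: :permanent]
end

# Release for the worker machines: only the image processing app.
release :jobs do
  set version: current_version(:my_app_jobs)
  set applications: [:runtime_tools, my_app_jobs: :permanent]
end
```

You would then build whichever one you need with something like `mix distillery.release --name=web`.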

There are all kinds of ways to make them talk to each other; a good old message queue still works, but you can also use distributed Erlang, with the pg2 module for instance.
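As a tiny illustration of the pg2 route (the group name :image_workers is made up; note that :pg2 was removed in OTP 24 in favour of :pg):

```elixir
# On each worker node, register the worker process in a group:
:pg2.create(:image_workers)
:pg2.join(:image_workers, self())

# From any connected node, look the workers up and send them work:
for pid <- :pg2.get_members(:image_workers) do
  send(pid, {:resize, "photo.jpg"})
end
```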


I’m using System.cmd + ImageMagick.
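In case it’s useful, this is roughly what that looks like (the file names and size are illustrative):

```elixir
# Shell out to ImageMagick's convert; System.cmd/2 returns {output, exit_status}.
{_output, 0} =
  System.cmd("convert", ["input.jpg", "-resize", "800x800", "output.jpg"])
```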

I think the reason is that the CPU is so busy with the image resizing that it doesn’t have time for the API… I guess. I also limited the concurrency for the queue to 10, but that still seems to be too much for the server. My Ecto pool is 15, so it should be enough to work without timeouts.

I definitely have to take a look and learn how to make those two releases.

At the moment they can communicate through the DB and through Redis; they don’t need to “see” each other over an Erlang cluster.

Do you know how many of these image resize operations could be running at once? 10 seems like a lot.

I would assign that job its own dedicated queue. Start with a concurrency of 1 and see if that helps; you can add more concurrency if things stabilize.
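As a rough sketch of that idea with Exq (the queue name "image_resize" and the worker module are made-up examples):

```elixir
# In config/config.exs: each queue is a {name, concurrency} pair.
use Mix.Config

config :exq,
  queues: [{"default", 10}, {"image_resize", 1}]
```

Jobs would then be enqueued to that queue explicitly, e.g. `Exq.enqueue(Exq, "image_resize", MyApp.ResizeWorker, [image_id])`.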

Others have already pointed out other solutions, but I would like to go on record and say that this should be fine too. There are different ways you could tackle it, but I guess the simplest would be to move to an umbrella project and have the Phoenix application as one app and the background processor as another. On one machine you would deploy the Phoenix app, and on the other the background job processor.
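A minimal sketch of what the umbrella root could look like (the project and app names are illustrative); the Phoenix app would live under apps/my_app_web and the job processor under apps/my_app_jobs:

```elixir
# mix.exs at the umbrella root: the actual code lives in the apps under apps/.
defmodule MyAppUmbrella.MixProject do
  use Mix.Project

  def project do
    [
      apps_path: "apps",
      start_permanent: Mix.env() == :prod,
      deps: []
    ]
  end
end
```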

You could also do two completely separate apps, without the umbrella bit.

You could also keep everything in the same app and use a configuration flag to specify whether the workers are started. This is very similar to how Phoenix works, where it only starts the HTTP server if the server: true option is set.
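For instance, here is a hedged sketch of gating the workers on a config flag in the application supervisor (the :start_workers? key and the module names are made up):

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    base_children = [
      MyApp.Repo,
      MyAppWeb.Endpoint
    ]

    # Only the worker machines set start_workers?: true in their config,
    # so the web nodes never boot the heavy resize supervisor.
    worker_children =
      if Application.get_env(:my_app, :start_workers?, false) do
        [MyApp.ResizeSupervisor]
      else
        []
      end

    Supervisor.start_link(base_children ++ worker_children,
      strategy: :one_for_one,
      name: MyApp.Supervisor
    )
  end
end
```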


I have more than one server running, so the jobs should be distributed among them.

Yeah, that’s what I’ve done for the moment.

OK, so with an umbrella project I could do this. Good to know. I read a lot about umbrella projects at the beginning, but I didn’t see the point because Phoenix’s separation of concerns had been enough for me until now.

That’s good to know too, but it sounds tougher.

Thanks to all of you