Tasks vs GenServer

venomnert · December 29, 2019, 1:08am

Hey everyone, hope you all had a great Christmas and a successful Boxing Day!

Context:

I want each client to be able to create and send multiple emails. After multiple iterations I have landed on the following designs and have settled on Tasks.

Questions:

Just out of curiosity, when would a GenServer be more beneficial than Tasks, based on the design?
The main benefit a GenServer provides is the ability to maintain long-running state. Apart from that, is there any performance benefit that a GenServer offers over Tasks?

kokolegorille · December 29, 2019, 1:49am

You have better control with a GenServer, while a Task is a process meant to do something, then die.
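To make the contrast concrete, here is a minimal sketch. The `SentCounter` module and the email address are hypothetical names invented for illustration; they are not from the original design. The GenServer keeps state between messages, while the Task runs one function, returns a value, and exits.

```elixir
defmodule SentCounter do
  # Hypothetical GenServer that remembers how many emails were sent.
  use GenServer

  # Client API
  def start_link(initial \\ 0), do: GenServer.start_link(__MODULE__, initial)
  def record_sent(pid), do: GenServer.cast(pid, :sent)
  def count(pid), do: GenServer.call(pid, :count)

  # Server callbacks: the state is just an integer counter.
  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_cast(:sent, count), do: {:noreply, count + 1}

  @impl true
  def handle_call(:count, _from, count), do: {:reply, count, count}
end

{:ok, counter} = SentCounter.start_link()

# A Task runs a single function and then terminates; the only thing
# that survives it is the value handed back through Task.await/1.
task = Task.async(fn -> {:sent, "a@example.com"} end)
{:sent, address} = Task.await(task)

# The GenServer is still alive afterwards and keeps its state.
SentCounter.record_sent(counter)
sent_so_far = SentCounter.count(counter)
```

Whether the extra process is worth it depends on whether anything needs to outlive a single send.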

But if it is just to send an email, you could simply spawn a process, and then you could ask: why a Task over a plain process?

One day you might decide you want better control, better error handling, or support for back pressure… You might send billions of emails at Christmas and far fewer the rest of the year. In that case I would not choose Task.

benwilson512 · December 29, 2019, 4:17am

@kokolegorille are you confusing Task with a job queue? A task is literally just a process. It doesn’t provide better control or back pressure, it provides marginally better error handling because it sets some standard values in the process dictionary, but that’s about it. It isn’t heavier weight than a process.

On the note of a job queue, those are often a good idea with something like email. You get retries, concurrency limits, and so on.

kokolegorille · December 29, 2019, 5:58am

I meant back pressure with GenStage or something related. Under the hood, those are specialized GenServers.

Sorry if that was confusing. I meant to say that all processes are equal, but some are more controlled than others.

LostKobrakai · December 29, 2019, 10:31am

Tasks are most useful for failure isolation or for running a piece of code asynchronously. If your current process needs to do something that is prone to failure, possibly because of things outside your control but hopefully within your user’s control, then a Task is a good place to start. The same goes if you want to quickly run something concurrently with your current process and persistence of the job doesn’t matter. All a task does is basically run code you could have written inline, but wrapped in another process. Only the process starting the task should be interested in its result or failure, the only exception being the optional task supervisor. Personally I’d even say a task should not be very long running either, because unlike a job in a job queue it does not stick around if it is stopped by external factors.

I’m with @benwilson512 here: for emails a proper job queue is a better fit, because you can be more certain that the email will indeed be sent. Sending an email is something your system is responsible for fulfilling once the initial web request has succeeded.

Tasks imo are more useful for code where your system is not responsible for recovering from failures. E.g. take processing a user-uploaded file. If the processing fails (and your code works correctly), your system cannot do anything to make it work. The user needs to fix the input on their side.

I’d like to make this a bit more concrete. A task is meant to be an alternative to using spawn that conforms to OTP conventions (see :proc_lib), so it can be properly supervised and shut down. That makes it less likely that you leak processes. A task is therefore more than “just a process”, given that it implements OTP’s messages, but like spawn it is just meant to execute a piece of code and be done.
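As a rough sketch of the failure-isolation case, the snippet below runs a deliberately failing function under a `Task.Supervisor` with `async_nolink`, so the crash is reported to the caller instead of taking it down. The error message and the one-second timeout are arbitrary assumptions for illustration.

```elixir
# Start a supervisor dedicated to tasks (unnamed, for this sketch only).
{:ok, sup} = Task.Supervisor.start_link()

# async_nolink monitors the task without linking to it, so a crash in
# the task does not crash the calling process.
task = Task.Supervisor.async_nolink(sup, fn -> raise "bad upload" end)

# Wait up to one second; if nothing arrived by then, shut the task down.
result =
  case Task.yield(task, 1_000) || Task.shutdown(task) do
    {:ok, value} -> {:ok, value}
    {:exit, reason} -> {:error, reason}
    nil -> {:error, :timeout}
  end

# The caller is still running and can decide what to tell the user.
caller_alive? = Process.alive?(self())
```

The crash is still logged by the task itself; the caller only receives the exit reason.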

venomnert · December 29, 2019, 2:29pm

Very well explained @LostKobrakai! Thank you @benwilson512 and @kokolegorille for your inputs as well

ityonemo · December 29, 2019, 4:02pm

A task is meant to be an alternative to using spawn, but be otp conform (see :proc_lib), so it can be properly supervised/shut down.

There’s also some Elixir-only goodness in there with the :"$callers" process dictionary entry, which makes async unit testing with Ecto, Mox, Hound, Wallaby, etc. effective.
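A small sketch of what that entry looks like in practice: a process started with `Task.async` can read the chain of calling processes from its process dictionary, while a process started with a bare `spawn` has no such entry. The message tag `:spawn_callers` is a made-up name for this example.

```elixir
parent = self()

# Inside a Task, :"$callers" holds a list of calling pids,
# starting with the process that started the task.
task_callers = Task.await(Task.async(fn -> Process.get(:"$callers") end))

# A bare spawn does not set the key, so Process.get/1 returns nil.
spawn(fn -> send(parent, {:spawn_callers, Process.get(:"$callers")}) end)

spawn_callers =
  receive do
    {:spawn_callers, callers} -> callers
  after
    1_000 -> :timeout
  end
```

Libraries that tie resources to a test process can use this chain to find the owning process from inside a task.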

In short, “almost always use Task over spawn”.

mgwidmann · December 29, 2019, 4:19pm

One limitation I ran into using Task.async_stream: one slow process will hold up the processing of future work. Say you have 1000 items and number 30 takes significantly more time than the rest (as is common with email). When processing at a concurrency of 8, for example, the 39th task won’t be started because it will be waiting on the 30th to complete. This happens because the Task.async_stream function ensures ordering, using a selective receive against the PID.

Therefore, unless the time to process each item has a particularly low standard deviation, it will be more efficient to build your own solution. This all assumes you’ve got the load to justify it, of course; otherwise the two approaches may look the same in a low-volume system.

axelson · December 29, 2019, 7:35pm

If you don’t want this behavior you can set the ordered option to false, and then the output will not be buffered at all.
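A minimal sketch of the difference, assuming made-up durations in milliseconds to stand in for real work such as sending an email. With the default ordering, results come back in input order; with `ordered: false`, they come back as each task finishes.

```elixir
# Hypothetical unit of work: sleep for the given time, then return it.
work = fn ms ->
  Process.sleep(ms)
  ms
end

# The first item is much slower than the rest.
durations = [300, 10, 10, 10]

# Default (ordered: true): results are emitted in input order,
# so nothing is emitted until the slow first item is done.
ordered =
  durations
  |> Task.async_stream(work, max_concurrency: 2)
  |> Enum.map(fn {:ok, ms} -> ms end)

# ordered: false: results are emitted in completion order,
# so the fast items come out ahead of the slow one.
unordered =
  durations
  |> Task.async_stream(work, max_concurrency: 2, ordered: false)
  |> Enum.map(fn {:ok, ms} -> ms end)
```

The trade-off is that with `ordered: false` you can no longer match results to inputs by position, so the function should return whatever identifier you need.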