When to use one-off processes vs long-running servers?

Hi, I’m reading Elixir in Action, and on page 159 it hints that spawning off concurrent processes is a code smell.

Scenario: a user clicks a button to generate a schedule -> the schedule is generated in the background -> an email task sends the user an email with the schedule -> at the same time, a db task converts the schedule into a db-friendly format and stores it.

Approach 1: Have a bunch of long-running servers (ScheduleServer, EmailServer, DbConverterServer), each waiting for messages to process in its inbox. User actions simply send a message.

Approach 2: Every time a user performs one of these actions, I spawn a new instance of ScheduleGenerator (which itself spawns the other processes that come after it), so each user request is handled in parallel. When the task is complete, the process dies.
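Roughly what I have in mind, sketched in code (all names are placeholders, and the side effects are stubbed with IO.puts so it runs standalone):

```elixir
# Approach 1: one long-running server; every user action just sends a
# message into its inbox, and requests are processed one at a time.
defmodule ScheduleServer do
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  def generate(user_id), do: GenServer.cast(__MODULE__, {:generate, user_id})

  def init(:ok), do: {:ok, nil}

  def handle_cast({:generate, user_id}, state) do
    schedule = {:schedule_for, user_id}
    IO.puts("emailing #{inspect(schedule)}")  # would be EmailServer.send_schedule/2
    IO.puts("storing #{inspect(schedule)}")   # would be DbConverterServer.store/1
    {:noreply, state}
  end
end

# Approach 2: a fresh process per request, which dies when its work is done.
defmodule ScheduleGenerator do
  def run(user_id) do
    spawn(fn ->
      schedule = {:schedule_for, user_id}
      spawn(fn -> IO.puts("emailing #{inspect(schedule)}") end)
      spawn(fn -> IO.puts("storing #{inspect(schedule)}") end)
    end)
  end
end
```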

Which would be the “Elixir way” to do this? What are some rules of thumb for when to choose one approach over the other?


Approach 1 makes sense if schedules have to be processed in order. The point of the process then is to serialize the calls. Otherwise, it’s better not to go through one process, because it creates a potential bottleneck.

I would use a supervised task: http://elixir-lang.org/docs/stable/elixir/Task.html
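A minimal sketch of that, assuming a Task.Supervisor is started under your app (MyApp.TaskSupervisor, Scheduler, Emailer, and DbConverter are made-up names):

```elixir
# In the application's supervision tree:
children = [
  {Task.Supervisor, name: MyApp.TaskSupervisor}
]
Supervisor.start_link(children, strategy: :one_for_one)

defmodule ScheduleFlow do
  # One fire-and-forget task per user click; each step gets its own
  # supervised process, so requests run concurrently and a crash in
  # one task doesn't touch the others.
  def handle_click(user_id) do
    Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
      schedule = Scheduler.build(user_id)

      Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
        Emailer.send_schedule(user_id, schedule)
      end)

      Task.Supervisor.start_child(MyApp.TaskSupervisor, fn ->
        DbConverter.store(schedule)
      end)
    end)
  end
end
```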


Are you referring to the last part of the page?

“If you can’t make message handling fast enough, you can try to split the server into multiple processes, […] This should be your last resort, though. Parallelization isn’t a remedy for a poorly constructed algorithm.”

If so, I think he means exactly that: start by optimizing the algorithm itself, making sure each single message is handled efficiently. Then, if the algorithm can meaningfully be parallelized, split it into multiple processes if need be.

There are other reasons for concurrency, of course. Some things, like incoming web requests, are naturally concurrent. Other things may need the fault isolation that a separate process gives you.

If you were to make a game, for example, it’s quite likely each player’s input would be handled in a separate process. Any fault in the input-handling code could then only crash that process, and wouldn’t affect any other player. For a game with room-based navigation, each room might be a process as well: players could affect the state of the room by dropping items, picking them up, maybe destroying things, etc. Here too you’d benefit from the fault isolation; code execution triggered in the room process only affects that room and its inhabitants. With the right monitors, crashes in either of those parts can be safely handled with minimal impact.
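A purely hypothetical sketch of the input-handling part: each input runs in its own monitored process, so a crash only surfaces as a :DOWN message instead of taking anything else down:

```elixir
defmodule InputHandler do
  # Each player's input is processed in its own monitored process;
  # a crash there only affects that one input and is reported back
  # as a :DOWN message.
  def handle(player_id, input) do
    {pid, ref} = spawn_monitor(fn -> process(player_id, input) end)

    receive do
      {:DOWN, ^ref, :process, ^pid, :normal} -> :ok
      {:DOWN, ^ref, :process, ^pid, reason} -> {:crashed, reason}
    end
  end

  # Placeholder for real game logic; crashes on anything but a string.
  defp process(_player_id, input) when is_binary(input), do: :ok
end

InputHandler.handle(1, "go north")  # => :ok
InputHandler.handle(1, :oops)       # => {:crashed, {...}} - others unaffected
```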

Email sending, though, is a typical case where you need to limit concurrency, either with a pool or with a single persistent email-sender process, depending on your scale.
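For the single-process variant, a minimal sketch (Mailer.deliver/2 stands in for whatever mail library you’d actually use):

```elixir
# One persistent sender process: all sends go through its inbox, which
# caps concurrency toward the mail server at one send at a time.
defmodule EmailSender do
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  def send_schedule(user_id, schedule),
    do: GenServer.cast(__MODULE__, {:send, user_id, schedule})

  def init(:ok), do: {:ok, nil}

  def handle_cast({:send, user_id, schedule}, state) do
    Mailer.deliver(user_id, schedule)  # placeholder for the real mail call
    {:noreply, state}
  end
end
```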


That’s certainly not what I meant to say 🙂 The paragraph you’re referring to just warns that if you’re optimizing a sequential piece of code, you should first consider algorithmic and technology-specific optimizations.

The reason I added that comment is that I’ve frequently seen people try to optimize a suboptimal algorithm by running chunks of it in multiple processes. While that might improve performance, it’s often better to first consider whether the algorithm itself can be improved. For example, if you can reduce the complexity from, say, polynomial to logarithmic, the savings will be far greater.
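A toy illustration (not from the book): no amount of parallelism makes the linear scan competitive with simply picking a better data structure.

```elixir
pairs = for i <- 1..100_000, do: {i, i * i}

# O(n) per lookup: walks the list every time.
slow_lookup = fn key -> Enum.find(pairs, fn {k, _v} -> k == key end) end

# Build a map once; each lookup is then effectively constant time.
squares = Map.new(pairs)
fast_lookup = fn key -> Map.fetch(squares, key) end

slow_lookup.(99_999)  # scans ~100k elements
fast_lookup.(99_999)  # => {:ok, 9999800001}
```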

However, processes should certainly be used to run independent or loosely dependent things. Per your description, the outcome of e-mailing doesn’t depend on storing to the db, so you can do those two things in separate processes. This will not only improve efficiency (the VM might run them in parallel), but also fault tolerance (if e-mailing fails, storing to the db might still succeed).
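Something along those lines, reusing the supervised Task.Supervisor from @dom’s suggestion (MyApp.TaskSupervisor, Emailer, and DbConverter are made-up names); async_nolink keeps a crash in one task from taking down the caller or the other task:

```elixir
defmodule ScheduleFinisher do
  def run(user_id, schedule) do
    # Two independent steps, each in its own supervised process.
    email = Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
      Emailer.send_schedule(user_id, schedule)
    end)

    store = Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
      DbConverter.store(schedule)
    end)

    # A crashed task surfaces here as {:exit, reason} instead of
    # crashing this process, so one failure doesn't doom the other.
    {Task.yield(email, 5_000), Task.yield(store, 5_000)}
  end
end
```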

When it comes to your question, I agree with @dom’s response. You need a long-running server only if it needs to keep some state or if actions need to be serialized.
