How do you start thinking about the supervision tree?

a8t · April 17, 2021, 2:59am

Hi, it’s my first post, so I will waffle a bit. Skip to the part beginning “My question” if you only want my real question.

My four years as a software developer have been spent mostly in NodeJS or the browser.

Learning the wonderful model of Elixir and the BEAM (shout-out to Sasa Juric’s awesome talk) has made me realize something: even though I know how it works at a high level, I almost never think about the Node runtime, such as its call stack, event loop, and so on. By comparison, I think about React’s runtime quite a bit more, even though it lives at a higher level of abstraction than Node or the browser.

To me it seems that the Elixir community is the complete opposite. There is very conscious and deliberate care given to every process and its lifecycle. In a very crude metaphor, we are spawning N “NodeJS”s at a time, and yet we care about each one more than I care about my entire, singular Node process. (Maybe that is a fault of my own as a developer.)

Obviously, one piece that doesn’t exist in JS-land is Supervisors and the process management built into the runtime. It’s a beautiful model, but it leaves me at a loss. I am used to defensive programming, with enough try/catches to ensure “safety”. I embrace Let It Crash with open arms, but I’m not used to it yet.

So. My question: When writing an application, when do you introduce a supervision tree? Is it the first thing you do? Is it the last, after a bunch of modules are written and ready to chug? Does it affect the structure of your code in some way?

I’ve read this guide, and I still feel that I don’t understand how Supervisors and Applications are used by developers in the process of creating software. Is every module you write created with the supervision tree in mind from the beginning?

It does seem very fundamental to me, and I believe the reason is that I spend a lot of time writing React code, where we have a similar tree structure to our software (including the ability for nodes of the tree and their children to die off without killing the whole tree). The React component tree is a fundamental part of writing the software at every moment. Is the same true for Elixir processes?

Thanks in advance for the discussion, all. I’m really happy to be here.

kokolegorille · April 17, 2021, 5:43am

It’s important to think about where you want to put concurrency, and how.

There are many tools in OTP, along with some Elixir-specific abstractions:

Task
Agent
GenServer
State Machine
Supervisor
DynamicSupervisor
GenStage, Flow and Broadway

If you don’t mind reading some Erlang, there is the OTP Design Principles guide:

http://erlang.org/doc/design_principles/users_guide.html

A simple unit could look like this:

   Supervisor
    |       |
GenServer - DynamicSupervisor
            |
          Workers

This could be simplified if you use the parent library.
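As a rough sketch of the unit in the diagram above (all module names here are hypothetical: Sketch.Supervisor, Sketch.Coordinator, Sketch.WorkerSupervisor, Sketch.Worker), a static supervisor starts one long-lived GenServer plus a DynamicSupervisor, and workers are added to the DynamicSupervisor on demand:

```elixir
defmodule Sketch.Coordinator do
  # Hypothetical long-lived GenServer (the "GenServer" box in the diagram).
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts), do: {:ok, %{}}
end

defmodule Sketch.Worker do
  # Hypothetical worker, started on demand; :transient means it is restarted
  # only if it exits abnormally.
  use GenServer, restart: :transient

  def start_link(id), do: GenServer.start_link(__MODULE__, id)

  @impl true
  def init(id), do: {:ok, id}
end

defmodule Sketch.Supervisor do
  # Static supervisor at the top of the unit.
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(_opts) do
    children = [
      Sketch.Coordinator,
      # Starts empty; workers are added at runtime with DynamicSupervisor.start_child/2.
      {DynamicSupervisor, name: Sketch.WorkerSupervisor, strategy: :one_for_one}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

{:ok, _sup} = Sketch.Supervisor.start_link([])
{:ok, _pid} = DynamicSupervisor.start_child(Sketch.WorkerSupervisor, {Sketch.Worker, :job_1})
```

With :one_for_one, a crash in the GenServer restarts only that GenServer. Switching the top-level strategy to :rest_for_one would also restart the DynamicSupervisor (and every worker under it) whenever the GenServer listed before it restarts, which may be what you want if the GenServer keeps track of those workers.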

mindok · April 17, 2021, 8:19am

It depends on what kind of application you are writing. If you are building a CRUD application with Ecto & Phoenix, you really don’t need to worry about it much at all - the frameworks take care of it for you. I suspect you could go a good couple of years being totally oblivious to the OTP/BEAM supervision model if you are writing this kind of application.

If you want to, you can take a look at a generated application.ex (e.g. from running mix phx.new) to see the setup of a default high-level supervision tree, and if you dig into the referenced applications you can work your way down.

For anything else, then as per @kokolegorille’s response, you probably need to spend some time studying what the options are, as the design of your supervision hierarchy will depend on exactly what you are looking to achieve in terms of concurrency, process lookup & lifetime, fault tolerance, distribution, etc. I don’t think there’s really a recipe other than clear thinking. In addition to the parent library from @sasajuric, there’s an earlier article on his blog that has an example of thinking about process boundaries etc.: The Erlangelist - To spawn, or not to spawn?

One key point from this article: the “functional core”, where the data structures and business logic are defined, can be totally separate from the run-time model. In other words, you can write a whole bunch of modules with structs and functions and not even consider the process model. If they are called from Phoenix controllers or LiveViews, you still don’t need to worry about the run-time process model, as Phoenix takes care of it for you (unless you need to manage non-database state between requests).
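To illustrate the “functional core” idea, here is a minimal sketch using a hypothetical shopping-cart domain. Cart is plain data and pure functions that can be written and tested without any processes; CartServer is an optional, thin GenServer that you would add only if a runtime concern (such as state shared between requests) calls for it:

```elixir
defmodule Cart do
  # Functional core: a plain struct plus pure functions. No processes anywhere.
  defstruct items: %{}

  def new, do: %__MODULE__{}

  # Adds qty units of a SKU, accumulating if the SKU is already in the cart.
  def add(%__MODULE__{items: items} = cart, sku, qty) when is_integer(qty) and qty > 0 do
    %__MODULE__{cart | items: Map.update(items, sku, qty, &(&1 + qty))}
  end

  def total_quantity(%__MODULE__{items: items}), do: items |> Map.values() |> Enum.sum()
end

defmodule CartServer do
  # Optional runtime boundary: holds a Cart and delegates all logic to the core.
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, Cart.new())
  def add(pid, sku, qty), do: GenServer.call(pid, {:add, sku, qty})
  def total_quantity(pid), do: GenServer.call(pid, :total_quantity)

  @impl true
  def init(cart), do: {:ok, cart}

  @impl true
  def handle_call({:add, sku, qty}, _from, cart), do: {:reply, :ok, Cart.add(cart, sku, qty)}
  def handle_call(:total_quantity, _from, cart), do: {:reply, Cart.total_quantity(cart), cart}
end
```

Because the core never mentions processes, deciding whether (and how) to supervise CartServer later does not require touching Cart.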

My suggestion would be to read widely (over time) and to define the problems you are looking to solve. If you have a problem well defined and your reading doesn’t answer it, feel free to ask here.

a8t · April 17, 2021, 6:37pm

Thank you both. That 2017 article looks like exactly what I need. I will also check out the Erlang docs on the matter.

One cannot ignore that all three of us in this thread mentioned @sasajuric’s work here.

I will quote the points from the article which I think provide a great basis for me to answer my questions:

Use functions and modules to separate thought concerns.
Use processes to separate runtime concerns.
Do not use processes (not even agents) to separate thought concerns.

ityonemo · April 17, 2021, 7:08pm

80% of the time the answer is “don’t”. Someone else, much smarter than you or I, who has domain-specific insight into the operation of the library or framework you are leveraging, has already thought about the supervision tree for you, and you get that for free with your library/framework. This goes for Phoenix, Broadway, etc.

80% → 95%: a good chunk of the time, you are starting up tasks. Start a DynamicSupervisor in your main application supervision tree, and start all of your tasks inside that DS. The only thing to think further about: if you have a group of tasks that are likely to die together (e.g. they raise when they fail to connect to a common external service, so their failures are highly correlated), make a DS just for them, and isolate their failures from your other tasks.
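A minimal sketch of this setup, with hypothetical names: two DynamicSupervisors under a top-level supervisor, one for general tasks and one reserved for tasks that depend on the same external service. Because both children are DynamicSupervisors, each needs a distinct child id. Tasks are started as {Task, fun}, whose default child spec is :temporary, so a failed task is not restarted:

```elixir
defmodule TaskSketch do
  # Hypothetical names; in a real app these two children would sit in your Application's tree.
  def start_tree do
    children = [
      # General-purpose home for fire-and-forget tasks.
      Supervisor.child_spec(
        {DynamicSupervisor, name: TaskSketch.TaskSup, strategy: :one_for_one},
        id: :task_sup
      ),
      # Tasks that share an external dependency get their own supervisor,
      # so a burst of correlated failures stays contained there.
      Supervisor.child_spec(
        {DynamicSupervisor, name: TaskSketch.ExternalApiTaskSup, strategy: :one_for_one},
        id: :external_api_task_sup
      )
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: TaskSketch.Supervisor)
  end

  # {Task, fun} uses Task.child_spec/1 (restart: :temporary).
  def run(fun), do: DynamicSupervisor.start_child(TaskSketch.TaskSup, {Task, fun})

  def run_external(fun),
    do: DynamicSupervisor.start_child(TaskSketch.ExternalApiTaskSup, {Task, fun})
end

{:ok, _sup} = TaskSketch.start_tree()
# Usage: TaskSketch.run(fn -> ... end) or TaskSketch.run_external(fn -> ... end)
```

Elixir also ships Task.Supervisor, which is purpose-built for supervising tasks and adds helpers such as Task.Supervisor.async_nolink; it is worth comparing with a plain DynamicSupervisor before choosing.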

95% → 99%:

Are you storing stateful data in a lookup table (could what you’re doing be done by Redis)? Consider setting up an ETS table, and avoid the supervision tree.
Do you need a singleton process that manages one thing and one thing only? Set up a single static supervisor that launches your singleton process. Beware that this process is at risk of being a bottleneck, since processes are “single-threaded”.
Do you need multiple processes that manage a group of transient, stateful systems? First, double-check that you couldn’t use a Task instead. Second, put a Registry and a DynamicSupervisor in your top level. Write calls (and casts, but I default away from casts) in your GenServer that use the via syntax with your registry. Don’t use GenRegistry (I just had to refactor something away from GenRegistry).
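For the first case in the list above (a lookup table), here is a minimal ETS sketch; the module and table names are hypothetical. A :public, :named_table set lets any process read and write it directly, so reads do not funnel through a single process. One caveat: an ETS table is owned by the process that created it and is deleted when that process exits, so in a real application the table is usually created by some long-lived process.

```elixir
defmodule LookupCache do
  # Hypothetical cache backed by a named, public ETS set.
  @table :lookup_cache

  # The calling process becomes the table's owner.
  def init do
    :ets.new(@table, [:set, :public, :named_table, read_concurrency: true])
  end

  def put(key, value), do: :ets.insert(@table, {key, value})

  # Reads run in the caller's process; no message passing involved.
  def get(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end
end
```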

Now you are ready to go, and if you ever find yourself in a scenario where you need to upgrade to Horde or Swarm, you’ll be well prepared.
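A minimal sketch of the Registry plus DynamicSupervisor pattern from the third case in the list above, assuming a hypothetical Session process keyed by an id. Every call goes through a via tuple, so callers address a session by its id rather than by pid:

```elixir
defmodule Session do
  # Hypothetical transient, stateful process: one per session id.
  use GenServer, restart: :transient

  def start_link(id), do: GenServer.start_link(__MODULE__, id, name: via(id))

  # All naming goes through this one function.
  def via(id), do: {:via, Registry, {Session.Registry, id}}

  def put(id, key, value), do: GenServer.call(via(id), {:put, key, value})
  def get(id, key), do: GenServer.call(via(id), {:get, key})

  @impl true
  def init(_id), do: {:ok, %{}}

  @impl true
  def handle_call({:put, key, value}, _from, state), do: {:reply, :ok, Map.put(state, key, value)}
  def handle_call({:get, key}, _from, state), do: {:reply, Map.get(state, key), state}
end

# Top level: a unique-key Registry for lookup and a DynamicSupervisor for lifetime.
children = [
  {Registry, keys: :unique, name: Session.Registry},
  {DynamicSupervisor, name: Session.Supervisor, strategy: :one_for_one}
]

{:ok, _} = Supervisor.start_link(children, strategy: :one_for_one)

# Start a session on demand, then talk to it by id.
{:ok, _pid} = DynamicSupervisor.start_child(Session.Supervisor, {Session, "user-42"})
:ok = Session.put("user-42", :cart_size, 3)
```

Since naming is concentrated in via/1, moving to a distributed registry later (libraries such as Horde provide distributed counterparts of Registry and DynamicSupervisor) should mostly mean changing that function and the two children in the tree, though the exact details depend on the library.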