I was re-reading Elixir in Action yesterday and I came across a curious quote from @sasajuric:
If the code of your error-kernel process is complex, consider splitting it into two processes: one that holds state, and another that does the actual work. The former process then becomes extremely simple and is unlikely to crash, whereas the worker process can be removed from the error kernel (because it no longer maintains critical state).
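A minimal sketch of the split the quote describes (module and function names are my own invention): the state holder is a tiny Agent that only stores and returns data, while the risky work happens in a separate task that merely reports results back.

```elixir
# Hypothetical sketch: a near-trivial state-holder process (an Agent here,
# for brevity) whose only job is to keep a map around. Because it contains
# no business logic, it is very unlikely to crash.
defmodule StateHolder do
  use Agent

  def start_link(_opts) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  def put(key, value), do: Agent.update(__MODULE__, &Map.put(&1, key, value))
  def get(key), do: Agent.get(__MODULE__, &Map.get(&1, key))
end

# The worker does the actual computation and may crash freely;
# the state survives because it lives in another process.
{:ok, _pid} = StateHolder.start_link([])
worker = Task.async(fn -> StateHolder.put(:answer, 6 * 7) end)
Task.await(worker)
StateHolder.get(:answer)
```

If the worker crashes, its supervisor can restart it and it can re-read whatever it needs from `StateHolder`, which never left the error kernel.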
Separating state into different processes
I get that by doing this you would gain error isolation - if the worker blows up, you don’t lose your state. Great, right?
But why use a clumsy, boilerplate-heavy GenServer for the task? I see some issues with this approach:
- The State process can still crash
- Communication with a GenServer is slow compared with using an ETS table
- Your GenServer state process is a single point of access (meaning it is a bottleneck) and a single point of failure.
Is this approach really viable?
I don’t think I have ever seen a case where this makes sense. What I have seen is people saving state in an ETS table paired with a dump process that writes the table’s contents to disk every X minutes.
This approach is considerably faster than using a GenServer and works wonderfully with concurrent accesses.
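A sketch of the pattern described above, with made-up names and interval: a public ETS table for concurrent reads and writes, owned by a small GenServer whose only job is to periodically snapshot the table to disk.

```elixir
# Hypothetical sketch: a GenServer that owns a public ETS table and dumps
# it to disk on a timer. Table name, file name, and interval are invented.
defmodule TableDumper do
  use GenServer

  @table :my_state
  @interval :timer.minutes(5)

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl true
  def init(_opts) do
    # :public lets any process read and write concurrently.
    :ets.new(@table, [:named_table, :public, read_concurrency: true])
    schedule_dump()
    {:ok, %{}}
  end

  @impl true
  def handle_info(:dump, state) do
    # Snapshot the whole table; a crash loses at most one interval's writes.
    :ok = :ets.tab2file(@table, ~c"my_state.tab")
    schedule_dump()
    {:noreply, state}
  end

  defp schedule_dump, do: Process.send_after(self(), :dump, @interval)
end

{:ok, _pid} = TableDumper.start_link()
:ets.insert(:my_state, {:counter, 0})
```

On restart, the owner could rebuild the table from the last snapshot with `:ets.file2tab/1`.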
- So, why on earth would I save my critical state in a GenServer process? (examples would be appreciated)
- Isn’t using ETS tables a lot better?
disclaimer: the author may have given this hint because he only introduces ETS tables in a later chapter.
I’d say using ETS is an optimization over using a process to keep the state around (GenServer or even Agent). Both are options for the same refactoring, though: pull state out of the workers.
About your points:
- An ETS table is no better off here. Its parent (owner) process can crash just as easily as an Agent process can.
- This is only really a problem if your GenServer is particularly busy. Data needs to be copied in both cases. See 3.
- This needs to be evaluated for your use case. If your workers are long-running processes, the data might be accessed only once in a while, so keeping it in the process state might be just fine. If that’s not the case, you can surely put the data in ETS and benefit from the concurrent access option.
So, assuming that my state has a low number of accesses per second, and that ETS provides the same failsafe resilience as a process, I would still be better off using ETS, because it lets me easily get concurrent access in the future without ever worrying about bottlenecks.
My point is: if you are going to separate state from work, do it with an ETS table, as it grants you more benefits in the long term (meaning you already have concurrent access to state when you need it).
You’re not wrong, but adding ETS to the mix is also one more moving piece in your system, and it’s more complex than e.g. a simple Agent, which basically does nothing but read state. Not to mention the write side, if you need that as well.
Do note that writing the state to disk is something that would have to be done with either Agents or ETS tables. An Agent won’t persist your state, and neither will an ETS table.
Agents vs ETS
Now, is using an ETS more complex? That is a good question. I am not sure.
If you use an Agent, you can write tests that check the protocol, meaning you can test whether the interaction between different processes is occurring normally. I don’t know Joe Armstrong, but I have a feeling he would like this, because it is definitely protocol-oriented.
If you use an ETS table, however, you are (or should be) using the Just Create One pattern (credits go to @peerreynders for the link), which could be a bit more challenging to test because you need a mock to replace the ETS table (at least that’s how I see it).
Personal (subjective) conclusion
You surely have an argument there, but since I am not an expert in testing protocols and inter-process communications I am not convinced that using an Agent would be that much better.
The main reason why I didn’t mention ETS tables is indeed because they have not yet been introduced in the book at that point.
Using an ETS table to store the snapshot of the state is definitely a valid option. If there’s no other logic in the process, then my go-to approach is to use an ETS table, for the same reasons you mention.
That said, ETS tables might not always suffice, simply because they have limited support for atomic, isolated operations. So there might be some cases where the state process might need to be a GenServer, and you’ll need to serialize some actions (sometimes only writes, sometimes everything) through it.
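A contrived sketch of the kind of case meant here (module and data are made up): a check-then-decrement is two separate ETS operations, so two concurrent callers could both pass the check. Funneling all writes through a GenServer restores isolation.

```elixir
# Hypothetical sketch: reserving stock needs a read and a conditional
# write as one isolated step. Plain ETS can't do that across two calls,
# so writes are serialized through the owning GenServer; reads stay
# concurrent in the caller.
defmodule Stock do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  # Write path: serialized through the GenServer for isolation.
  def reserve(item), do: GenServer.call(__MODULE__, {:reserve, item})

  # Read path: straight from ETS in the client process.
  def available(item) do
    case :ets.lookup(:stock, item) do
      [{^item, n}] -> n
      [] -> 0
    end
  end

  @impl true
  def init(nil) do
    :ets.new(:stock, [:named_table, :public, read_concurrency: true])
    :ets.insert(:stock, {:widget, 1})
    {:ok, nil}
  end

  @impl true
  def handle_call({:reserve, item}, _from, state) do
    # Only this process writes, so read + conditional write is isolated.
    case :ets.lookup(:stock, item) do
      [{^item, n}] when n > 0 ->
        :ets.insert(:stock, {item, n - 1})
        {:reply, :ok, state}

      _ ->
        {:reply, :out_of_stock, state}
    end
  end
end
```

(For simple numeric cases, `:ets.update_counter/3` is atomic on its own; it’s the multi-step read-then-write logic that needs the serializing process.)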
So more generally, the part of the book you quoted is about thinking in terms of isolating error effects (which is the name of the chapter), while keeping the state in ETS tables would fall into a “beyond GenServer” category (which is the name of the next chapter).
Any process can crash in a system. That’s not really the point. The point is that a simple KV store in a GenServer would be the opposite of “clumsy”: it would be so simple that the likelihood of crashing is very low.
I agree with your points about ETS allowing for more performance in the long term. But there’s overhead in having ETS tables lying around, so if you don’t need them, it’s waste. The idea that always using ETS is just better than using a GenServer isn’t a solid general rule.
In practice we tend to use a combination of the two. A GenServer starts an ETS table and processes all writes to it, so writes are applied serially. However, all reading is done in the client process. As an aside, this is why the “boilerplate” of a GenServer is useful: all of this logic is hidden behind that “boilerplate” and is transparent to the end user of the API.
Neither situation is that difficult to test if you want to test in isolation. The API can take an optional name argument and look up the process and ETS tables by that name. In production you would use the default name. That pattern tends to be less egregious than using something like a mock.
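The combination described above can be sketched like this (names are hypothetical): a GenServer owns the table and serializes writes, reads bypass it entirely, and the optional name argument lets tests start an isolated instance.

```elixir
# Hypothetical sketch of the GenServer-owned ETS pattern: writes go
# through the owner process, reads hit ETS directly in the caller, and
# an optional name supports isolated test instances.
defmodule Cache do
  use GenServer

  @default __MODULE__

  def start_link(opts \\ []) do
    name = Keyword.get(opts, :name, @default)
    GenServer.start_link(__MODULE__, name, name: name)
  end

  # Reads go straight to the ETS table in the client process.
  def get(key, name \\ @default) do
    case :ets.lookup(name, key) do
      [{^key, value}] -> value
      [] -> nil
    end
  end

  # Writes are serialized through the owning GenServer.
  def put(key, value, name \\ @default) do
    GenServer.call(name, {:put, key, value})
  end

  @impl true
  def init(name) do
    # :protected — only the owner writes, everyone may read.
    :ets.new(name, [:named_table, :protected, read_concurrency: true])
    {:ok, name}
  end

  @impl true
  def handle_call({:put, key, value}, _from, name) do
    :ets.insert(name, {key, value})
    {:reply, :ok, name}
  end
end
```

A test can then do `Cache.start_link(name: :test_cache)` and pass `:test_cache` to `get/2` and `put/3`, never touching the production instance.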
As an extra – and more recent – example, @gyson created Ane which combines :ets and Erlang’s recent :atomics module. Haven’t had a chance to use it yet but it looks like a solid compromise between different tradeoffs with a lot of speed and code readability to be gained.
Other options I’d consider:
As others have said, the chance of such simple processes crashing is pretty low – unless you cache things that rely on volatile external APIs or resources (and even then defensive coding can and will save you).
As for caching libraries and primitives, nowadays we have more choice. Evaluate your tradeoffs and take your pick.