Some advice for Elixir programmers.
I was reviewing someone’s Elixir code yesterday and found a deadlock bug in a GenServer implementation. That’s not a big deal, since I’ve created one or two in the past (while not paying attention). However, I was surprised how long it took me to explain the issue to someone who has been coding in Elixir for a number of years now. I had to go back to first principles and explain how servers work in Elixir.
So, here’s my tip:
Learn how to write your own servers with both synchronous (`GenServer.call/2`) and asynchronous (`GenServer.cast/2`) APIs before using GenServers. GenServers provide a convenient API with lots of functionality behind the scenes. However, to avoid potential concurrency issues, you should understand how they work internally. It will help you reason about message serialization and know when to use `GenServer.cast/2` and, especially, when not to use `GenServer.call/2`.
If you are still reading this and have not figured out how a deadlock may happen, I’ll elaborate. Elixir and Erlang message passing is asynchronous, full stop. When you send a message, the sender does not wait for the destination process to receive or process it. To do something synchronously, the caller sends an async message to another process, includes its own pid in the payload, and then calls `receive`, blocking until the target process responds.
If you send a sync message using this model (same as a `GenServer.call/2`) to yourself (the sending and receiving processes are the same pid), then you will deadlock: the process is blocked in `receive`, waiting for a reply that only it could send.
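To make the model concrete, here is a minimal hand-rolled server, written as a sketch rather than a reflection of how GenServer is actually implemented. The module name `TinyServer`, the message shapes, and the `:call_self` request are all made up for illustration. The self-call uses a short timeout so the example returns instead of hanging forever.

```elixir
defmodule TinyServer do
  # Hypothetical hand-rolled counter server built on spawn/send/receive.
  def start(initial), do: spawn(fn -> loop(initial) end)

  # Synchronous API: send a request carrying our pid and a unique ref,
  # then block in receive until the matching reply arrives (or we time out).
  def call(pid, request, timeout \\ 1_000) do
    ref = make_ref()
    send(pid, {:call, self(), ref, request})

    receive do
      {:reply, ^ref, reply} -> {:ok, reply}
    after
      timeout -> {:error, :timeout}
    end
  end

  # Asynchronous API: send and return immediately.
  def cast(pid, request) do
    send(pid, {:cast, request})
    :ok
  end

  defp loop(count) do
    receive do
      {:call, from, ref, :get} ->
        send(from, {:reply, ref, count})
        loop(count)

      {:call, from, ref, :call_self} ->
        # The server makes a sync call to itself. The request lands in its
        # own mailbox, but nobody can serve it: the only process that could
        # is blocked right here inside call/3. Without the timeout this
        # would wait forever.
        result = call(self(), :get, 100)
        send(from, {:reply, ref, result})
        loop(count)

      {:cast, :increment} ->
        loop(count + 1)
    end
  end
end
```

With this sketch, `TinyServer.call(pid, :get)` from another process returns `{:ok, count}`, while `TinyServer.call(pid, :call_self)` returns `{:ok, {:error, :timeout}}` because the inner self-call can never be answered.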
This has some implications for how you design and use API functions and helpers in your GenServers. Here are my rules:
- Never call your public APIs (in the same module) from any callback or helper code.
- Only ever use `GenServer.call/2` (to the same process) in your public APIs.
- If you want to share code between an API and a helper, factor out the common code into a separate function and call it from each.
- Think hard every time you do a sync call from one GenServer to another. First, it could lead to a deadlock if it results in a sync call back into the original caller. But equally important, you may be creating a serializing bottleneck that hurts throughput.
In other words, when talking to yourself, use `GenServer.cast/2`.
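A small sketch of what these rules might look like in practice. The `Counter` module, its messages, and the `current/1` helper are hypothetical names chosen for the example.

```elixir
defmodule Counter do
  use GenServer

  # Public API: the only place that uses GenServer.call/2 on this server.
  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)
  def value(pid), do: GenServer.call(pid, :value)
  def bump_and_report(pid), do: GenServer.call(pid, :bump_and_report)

  @impl true
  def init(initial), do: {:ok, initial}

  @impl true
  def handle_call(:value, _from, count), do: {:reply, current(count), count}

  def handle_call(:bump_and_report, _from, count) do
    # Do NOT call value(self()) here: that is a sync call to ourselves and
    # can never be answered (depending on the Elixir/OTP version it either
    # blocks until the timeout or exits right away).
    # Instead, share logic through a private helper and cast to self for
    # work that can happen after this callback returns.
    GenServer.cast(self(), :bump)
    {:reply, current(count), count}
  end

  @impl true
  def handle_cast(:bump, count), do: {:noreply, count + 1}

  # Common code factored out so both the API path and callbacks can use it.
  defp current(count), do: count
end
```

Here `bump_and_report/1` replies with the value before the bump; the cast to self is processed as the next message, so a following `value/1` call sees the incremented count.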
Very helpful information. I am still challenged with implementing GenServers with what I perceive as beginner level use cases. For example, I want to use a non-named GenServer (the traditional start and get a PID reference) but do not fully understand the model for storing and referencing this PID in a working application.
Is there a reference example you recommend for a model GenServer implementation? I have read the Elixir-lang doc example, but this describes building a GenServer and not necessarily using a GenServer in depth.
Or, perhaps, there is a book I have not yet purchased or one I need to re-examine?
If nothing else, I think your forum topic is a great start for experts to collect a FAQ or an Awesome GenServer list.
I’m not sure that I’ve found a good reference for this, but it’s been 4 years since I went through that learning process. There are two basic approaches for starting GenServers. First, you can start them from a supervisor that runs at startup. In this case they will be named (which is not your question).
The second approach is to start them from another stateful process (another GenServer, GenFSM, or GenStatem). In this case you will need to store the pid as part of that server’s state. When I need to have multiple processes access a common GenServer, I will create a “Manager” that uses a `Map` to map an id to the pid.
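A minimal sketch of such a manager, assuming a trivial hypothetical `Worker` GenServer. The names `Manager`, `start_worker/1`, and `lookup/1` are invented for the example, and restart handling is left out.

```elixir
defmodule Worker do
  use GenServer

  def start_link(id), do: GenServer.start_link(__MODULE__, id)

  @impl true
  def init(id), do: {:ok, id}

  @impl true
  def handle_call(:id, _from, id), do: {:reply, id, id}
end

defmodule Manager do
  use GenServer

  # The manager itself is named, so anyone can find it;
  # the workers are unnamed and found through the manager.
  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
  def start_worker(id), do: GenServer.call(__MODULE__, {:start_worker, id})
  def lookup(id), do: GenServer.call(__MODULE__, {:lookup, id})

  @impl true
  def init(workers), do: {:ok, workers}

  @impl true
  def handle_call({:start_worker, id}, _from, workers) do
    case Map.fetch(workers, id) do
      {:ok, pid} ->
        # Already started: hand back the pid we remembered.
        {:reply, {:ok, pid}, workers}

      :error ->
        {:ok, pid} = Worker.start_link(id)
        {:reply, {:ok, pid}, Map.put(workers, id, pid)}
    end
  end

  def handle_call({:lookup, id}, _from, workers) do
    # Map.fetch/2 returns {:ok, pid} or :error.
    {:reply, Map.fetch(workers, id), workers}
  end
end
```

Callers only need to know the id: `Manager.start_worker(:a)` returns `{:ok, pid}`, and later `Manager.lookup(:a)` returns the same pid.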
The other thing I have done is to propagate the pid to all processes that need it. But then you need to handle updating that pid if the server restarts.
This isn’t too difficult: you just need to monitor the process (`Process.monitor/1`) so you will receive a `:DOWN` message when it exits. (Linking and trapping exits gives you an `{:EXIT, pid, reason}` message instead.)
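As a rough sketch, a process holding someone else's pid can watch it with `Process.monitor/1`, which is the call that delivers `:DOWN` messages. The `Client` module and its state shape are assumptions for illustration; a real application would look up or request the replacement pid instead of just clearing it.

```elixir
defmodule Client do
  use GenServer

  def start_link(server_pid), do: GenServer.start_link(__MODULE__, server_pid)
  def server(client), do: GenServer.call(client, :server)

  @impl true
  def init(server_pid) do
    # Ask the runtime to notify us when server_pid goes down.
    ref = Process.monitor(server_pid)
    {:ok, %{server: server_pid, ref: ref}}
  end

  @impl true
  def handle_call(:server, _from, state), do: {:reply, state.server, state}

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, _reason}, %{ref: ref} = state) do
    # The pid we were holding is now stale, so forget it.
    {:noreply, %{state | server: nil, ref: nil}}
  end

  def handle_info(_other, state), do: {:noreply, state}
end
```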
Also, when starting dynamic servers, I don’t usually call `start_link` directly, but create an API on a supervisor to start the process.
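One way this could look with `DynamicSupervisor`; this is a sketch, not necessarily the setup described above. `Job`, `JobSupervisor`, and `start_job/1` are hypothetical names.

```elixir
defmodule Job do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

defmodule JobSupervisor do
  use DynamicSupervisor

  def start_link(opts \\ []) do
    DynamicSupervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # The API callers use instead of calling Job.start_link/1 themselves,
  # so every Job ends up supervised.
  def start_job(arg) do
    DynamicSupervisor.start_child(__MODULE__, {Job, arg})
  end

  @impl true
  def init(_opts), do: DynamicSupervisor.init(strategy: :one_for_one)
end
```

After `JobSupervisor.start_link()`, a call such as `JobSupervisor.start_job(:some_arg)` returns `{:ok, pid}` for a supervised `Job` process.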
One thing that you have to watch out for is providing a pid in the startup arguments of a supervised GenServer. When a supervised process restarts, it receives the original arguments provided on its initial startup. So, if you provide the pid of some process and that process has restarted, then the pid will be stale. I remember struggling with this for a while when I was starting out with Elixir.
Essentially “strangers” will need a name/registry to find you. A process tends to want to “remember” the PIDs of the processes it creates, and any process expecting a reply will have to supply its PID (something `GenServer.call` will do automatically for you).
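For the name/registry case, a short sketch using Elixir's built-in `Registry` with a `:via` tuple. The registry name `DemoRegistry`, the `Named` module, and the `"cart-42"` id are assumptions for the example.

```elixir
defmodule Named do
  use GenServer

  # Register under an id in a Registry instead of handing out the pid.
  def start_link(id), do: GenServer.start_link(__MODULE__, id, name: via(id))

  # Strangers address the process by id; the Registry resolves the pid.
  def whoami(id), do: GenServer.call(via(id), :whoami)

  defp via(id), do: {:via, Registry, {DemoRegistry, id}}

  @impl true
  def init(id), do: {:ok, id}

  @impl true
  def handle_call(:whoami, _from, id), do: {:reply, {id, self()}, id}
end
```

This assumes a registry was started first, for example with `Registry.start_link(keys: :unique, name: DemoRegistry)`. After `Named.start_link("cart-42")`, any process can call `Named.whoami("cart-42")` without ever holding the pid, which also sidesteps stale pids after a restart.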
Is there a reference example you recommend for a model GenServer implementation?
Have you gone through Introduction to Mix?
Supervisor and Applications discusses named processes.
Yes. I’ve reviewed those articles. I will give them another review per your suggestion. Right now, I am going back through Elixir in Action with a better understanding of my challenges. Perhaps some background context will enlighten me.
Your advice and @smpallen99’s advice are very helpful in giving pragmatic patterns for beginners or others confused like me. Hopefully we can sticky this information.
I think one reason for this problem is where you come from when you get to a GenServer. What do I mean?
If you are coming bottom-up, so to speak, from basic concurrency with processes, messages, `receive`, etc., then you know that a GenServer is just a process and that `call` and `cast` are just sending messages. Then you know that doing a `call` to yourself just sends a message to yourself, and so waiting for a reply is just ridiculous.
If, however, you come top-down, then you see the GenServer through its function interface and the concurrent nature is hidden. I am just calling the GenServer, so why shouldn’t I be able to call myself? Coming from an OO background and then “seeing” a GenServer as an object strengthens this view.
That the GenServer, and other behaviours, put effort into hiding the concurrency does not help in this respect. Unfortunately, you cannot avoid the concurrent nature of Erlang/Elixir systems.
Though I must add that I have seen people who do realise GenServers are concurrent processes still get into a lot of problems when they try to do `call`s between servers and find that they block. This, I think, comes more from not being used to thinking concurrently.