Some advice for Elixir programmers.
I was reviewing someone’s Elixir code yesterday and found a deadlock bug in a GenServer implementation. That’s not a big deal, since I’ve created one or two in the past (while not paying attention). However, I was surprised by how long it took me to explain the issue to someone who has been coding in Elixir for a number of years now. I had to go back to first principles and explain how servers work in Elixir.
So, here’s my tip:
Learn how to write your own servers with both synchronous (GenServer.call/2) and asynchronous (GenServer.cast/2) APIs before using GenServers. GenServers provide a convenient API with a lot of functionality behind the scenes. However, to avoid potential concurrency issues, you should understand how they work internally. That will help you reason about message serialization and know when to use GenServer.cast/2 and, especially, when not to use GenServer.call/2.
If you are still reading this and have not figured out how a deadlock may happen, I’ll elaborate. Elixir and Erlang message passing is asynchronous, full stop. When you send a message, the sender does not wait for the destination process to receive or process it. To make a synchronous request, the caller sends an async message to another process, includes its own pid in the payload, and then calls receive, blocking until the target process responds.
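The request/reply pattern described above can be sketched with nothing but spawn, send and receive. This is a minimal illustration, not how GenServer is actually implemented: the module name HandRolledCounter, its functions and the message shapes are all made up for the example.

```elixir
defmodule HandRolledCounter do
  # Hypothetical module for illustration; not part of any library.

  # Start the server process with an initial count.
  def start(initial \\ 0) do
    spawn(fn -> loop(initial) end)
  end

  # Asynchronous API (cast-like): send the message and return immediately.
  def increment(pid) do
    send(pid, :increment)
    :ok
  end

  # Synchronous API (call-like): send a request carrying our own pid and a
  # unique reference, then block in receive until the matching reply arrives.
  def value(pid, timeout \\ 5_000) do
    ref = make_ref()
    send(pid, {:value, self(), ref})

    receive do
      {:reply, ^ref, count} -> {:ok, count}
    after
      timeout -> {:error, :timeout}
    end
  end

  # The server loop handles one message at a time, in arrival order.
  defp loop(count) do
    receive do
      :increment ->
        loop(count + 1)

      {:value, from, ref} ->
        send(from, {:reply, ref, count})
        loop(count)
    end
  end
end

pid = HandRolledCounter.start()
HandRolledCounter.increment(pid)
HandRolledCounter.increment(pid)
{:ok, 2} = HandRolledCounter.value(pid)
```

The "synchronous" call is only an asynchronous send followed by a blocking receive in the caller. The server never blocks on anyone; it is the caller that waits.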
If you send a sync message using this model (the same one GenServer.call/2 uses) to yourself (the sending and receiving processes are the same pid), you will deadlock: the process is blocked in receive, waiting for a reply that only it could send.
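Here is a small sketch of that self-call, assuming a hypothetical sync_request/3 helper built from send and receive. A short timeout is added so the script terminates; without it the process would wait forever.

```elixir
defmodule SelfCallDemo do
  # Hypothetical helper: a synchronous request built from send + receive.
  def sync_request(pid, request, timeout) do
    ref = make_ref()
    send(pid, {:request, self(), ref, request})

    receive do
      {:reply, ^ref, answer} -> {:ok, answer}
    after
      # Without this clause the caller would block forever.
      timeout -> {:error, :timeout}
    end
  end
end

# The calling process targets itself. The request lands in its own mailbox,
# but the process is busy waiting for the reply, so nobody handles it.
result = SelfCallDemo.sync_request(self(), :ping, 200)
IO.inspect(result)

# GenServer.call/2 guards against the direct case: calling your own pid
# exits with a :calling_self reason instead of hanging.
guarded =
  try do
    GenServer.call(self(), :ping)
  catch
    :exit, {:calling_self, _details} -> :calling_self
  end

IO.inspect(guarded)
```

The hand-rolled request times out because the only process that could answer is the one waiting. GenServer.call/2 catches the direct self-call and also has a default timeout of 5 seconds, but neither prevents an indirect cycle (A calls B, B calls back into A); that case still stalls until the timeout fires.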
This has some implications for how you design and use API functions and helpers in your GenServers. Here are my rules:
- Never call your public APIs (in the same module) from any callback or helper code.
- Only ever use GenServer.call/2 (to the same process) in your public APIs.
- If you want to share code between an API and a helper, factor out the common code into a separate function and call it from each.
- Think hard every time you make a sync call from one GenServer to another. First, it could lead to a deadlock if it results in a sync call back into the original caller. Equally important, you may be creating a serializing bottleneck that hurts throughput.
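As a rough sketch of the first three rules, here is a hypothetical Inventory GenServer. The module, its functions and its message shapes are invented for the example. The public API is the only place that uses GenServer.call/2 and GenServer.cast/2, and the callbacks share logic through a private function that works on the state directly.

```elixir
defmodule Inventory do
  # Hypothetical GenServer used only to illustrate the rules above.
  use GenServer

  # Public API: runs in the caller's process. This is the only place
  # where GenServer.call/2 and GenServer.cast/2 appear.
  def start_link(stock \\ %{}), do: GenServer.start_link(__MODULE__, stock)
  def count(pid, item), do: GenServer.call(pid, {:count, item})
  def remove(pid, item), do: GenServer.call(pid, {:remove, item})
  def add(pid, item), do: GenServer.cast(pid, {:add, item})

  # Callbacks: run inside the server process, so they never use the public API.
  @impl true
  def init(stock), do: {:ok, stock}

  @impl true
  def handle_call({:count, item}, _from, stock) do
    {:reply, lookup(stock, item), stock}
  end

  def handle_call({:remove, item}, _from, stock) do
    # Calling count(self(), item) here would be a sync call to ourselves.
    # Use the shared helper on the state instead.
    case lookup(stock, item) do
      0 -> {:reply, {:error, :out_of_stock}, stock}
      n -> {:reply, :ok, Map.put(stock, item, n - 1)}
    end
  end

  @impl true
  def handle_cast({:add, item}, stock) do
    {:noreply, Map.update(stock, item, 1, &(&1 + 1))}
  end

  # Common code factored out into a plain function, callable from any callback.
  defp lookup(stock, item), do: Map.get(stock, item, 0)
end

{:ok, pid} = Inventory.start_link()
:ok = Inventory.add(pid, :widget)
1 = Inventory.count(pid, :widget)
:ok = Inventory.remove(pid, :widget)
{:error, :out_of_stock} = Inventory.remove(pid, :widget)
```

Because lookup/2 takes the state as an argument, it never needs to message the server, so it is safe to call from handle_call/3, handle_cast/2 or any other helper.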