I often need to abstract some common functionality into a separate module. In cases when this functionality requires keeping some state, I use the following pattern: I keep that state under a separate key in the “parent” module state (usually a GenServer), and to simplify the code I pass “parent” state into the functions of common module.
I’ll give a specific example to make the idea clearer. In the example, there is a GenServer module that handles packets coming from a TCP socket, and there is LineReader module that handles aggregating the raw data packets and extracting lines from them.
Here’s the code (I’ve attempted to omit most of irrelevant parts):
defmodule ClientHandler do
use GenServer
def init(%{socket: socket}) do
state = %{
socket: socket,
lr_state: LineReader.initial_state(),
}
{:ok, state}
end
def handle_info({:tcp, _socket, data}, state) do
state = LineReader.put_packet(state, data)
{lines, state} = LineReader.recv_lines(state)
# ... do something with lines
{:noreply, state}
end
end
defmodule LineReader do
def initial_state() do
%{lines: [], trailing: ""}
end
def put_packet(parent_state, packet) do
lr_state = parent_state.lr_state
# ... process the new data packet, update lr_state
%{parent_state | lr_state: lr_state}
end
def recv_lines(parent_state) do
lr_state = parent_state.lr_state
lines = lr_state.lines
lr_state = %{lr_state | lines: []}
{lines, %{parent_state | lr_state: lr_state}}
end
end
I feel like this code is a little smelly, because it ties into specific field name in parent state. However, it saves a lot of code on the caller’s side and makes the code much cleaner.
Is such approach an anti-pattern? Are there some better alternatives?
It’s a little unclear why LineReader would need access to parent_state. I’d argue that LineReader should only be involved with the state it takes care of and not care that it’s nested within the ClientHandler. Yes this might be a “bit more characters” in ClientHandler, but it’ll be less in LineReader and responsibilities are properly aligned.
It’s a little unclear why LineReader would need access to parent_state. I’d argue that LineReader should only be involved with the state it takes care of and not care that it’s nested within the ClientHandler. Yes this might be a “bit more characters” in ClientHandler, but it’ll be less in LineReader and responsibilities are properly aligned.
That’s probably the part I’m most unsure about. I’ve chosen that approach mostly because of readability, and in addition because of less possibility for an error.
Compare:
{lines, state} = LineReader.recv_lines(state)
with:
{lines, lr_state} = LineReader.recv_lines(state.lr_state)
state = %{state | lr_state: lr_state}
While second version doesn’t need to access parent state, it adds quite a bit of ceremony and makes it easy to actually forget to update parent state field (immutable nature of Elixir might make it very easy to forget the second line and result in a bug).
and makes it easy to actually forget to update parent state field (immutable nature of Elixir might make it very easy to forget the second line and result in a bug).
I’m incorrect here - Elixir compiler will warn about an unused lr_state value, so it won’t be that easy to forget to update the parent state.
I don’t see how update_in or get_and_update_in will be useful here - we actually need to get the lines data out, and update_in only allows us to mutate contents of the map without returning anything.
A super generic answer: be explicit. Implicitness is rarely worth it unless it’s seriously getting in the way. One example of “getting in the way”, subjectively for me, is a lot of artifacts that Phoenix generates should start off being referenced as defaults in other modules / libraries and only put inside your code base when you need to modify them and they are no longer the defaults.
In your case however, it does seem you want to save just a little typing which is IMO not worth.
And this is also very related to feature envy and the single responsibility principle. Simplified: put things where they semantically belong. Which of course is the entire problem, sometimes we have a problem nailing the semantics – and is ironically one of the things we are actually being paid to do, not the code flinging itself.
Smelly code is not clean code. My advise is to make the code not smelly first. We can then help you to refactor the code to be more concise and readable.
These might be signs that the LineReader would be better off as its own process i.e. Task/Agent/Genserver so it can keep track of its own state or rolled into ClientHandler. I’d be curious what else it’s being used for and how else it’s being used.
These might be signs that the LineReader would be better off as its own process i.e. Task/Agent/Genserver so it can keep track of its own state. I’d be curious what else it’s being used for and how else it’s being used.
Yes, I thought about such approach, but feels very heavy-weight. As I understand it you have to link that additional process into the supervision hierarchy, figure out how to pass it’s pid to the “main” process, transition the resources to the additional process (the socket port in this case) - that’s like two pages of code, too much.
or rolled into ClientHandler
It’s used in several places, not only in the ClientHandler, so I wanted to avoid code duplication. In this case it’s not much, maybe 10 lines, but these lines better be carefuly tested.
Imo processes should only be introduced if there are runtime level reasons for them - failure isolation, asynchronous processing, … That is completely separate from code level abstractions, where multiple module collaborate to build up logic.
Yeah, agreed – it’s what I was trying to suss out with “what else it’s being used for and how else it’s being used”. It’d be the code organization by process anti-pattern otherwise.
You can spin up a temporary process “on demand” e.g. Task.async/1 and “if you spawn a task inside a GenServer, then the GenServer will automatically await for you and call GenServer.handle_info/2 with the task response and associated :DOWN message.”
If the data packet processing logic is brittle and the client handler needs to be robust, a nice runtime reason could be failure isolation so that packet processing can fail without affecting the client handler. You’d likely want to only pass the necessary information.