Is mutating "parent" state in a module an anti-pattern?

rogach · May 2, 2024, 5:23am

I often need to abstract some common functionality into a separate module. In cases when this functionality requires keeping some state, I use the following pattern: I keep that state under a separate key in the “parent” module state (usually a GenServer), and to simplify the code I pass “parent” state into the functions of common module.

I’ll give a specific example to make the idea clearer. In the example, there is a GenServer module that handles packets coming from a TCP socket, and there is LineReader module that handles aggregating the raw data packets and extracting lines from them.

Here’s the code (I’ve attempted to omit most of irrelevant parts):

defmodule ClientHandler do
  use GenServer

  def init(%{socket: socket}) do
    state = %{
      socket: socket,
      lr_state: LineReader.initial_state(),
    }
    {:ok, state}
  end

  def handle_info({:tcp, _socket, data}, state) do
    state = LineReader.put_packet(state, data)
    {lines, state} = LineReader.recv_lines(state)
    # ... do something with lines
    {:noreply, state}
  end
end

defmodule LineReader do
  def initial_state() do
    %{lines: [], trailing: ""}
  end

  def put_packet(parent_state, packet) do
    lr_state = parent_state.lr_state
    # ... process the new data packet, update lr_state
    %{parent_state | lr_state: lr_state}
  end

  def recv_lines(parent_state) do
    lr_state = parent_state.lr_state
    lines = lr_state.lines
    lr_state = %{lr_state | lines: []}
    {lines, %{parent_state | lr_state: lr_state}}
  end
end

I feel like this code is a little smelly, because it ties into specific field name in parent state. However, it saves a lot of code on the caller’s side and makes the code much cleaner.

Is such approach an anti-pattern? Are there some better alternatives?

LostKobrakai · May 2, 2024, 5:58am

It’s a little unclear why LineReader would need access to parent_state. I’d argue that LineReader should only be involved with the state it takes care of and not care that it’s nested within the ClientHandler. Yes this might be a “bit more characters” in ClientHandler, but it’ll be less in LineReader and responsibilities are properly aligned.

rogach · May 2, 2024, 7:14am

It’s a little unclear why LineReader would need access to parent_state. I’d argue that LineReader should only be involved with the state it takes care of and not care that it’s nested within the ClientHandler. Yes this might be a “bit more characters” in ClientHandler, but it’ll be less in LineReader and responsibilities are properly aligned.

That’s probably the part I’m most unsure about. I’ve chosen that approach mostly because of readability, and in addition because of less possibility for an error.

Compare:

{lines, state} = LineReader.recv_lines(state)

with:

{lines, lr_state} = LineReader.recv_lines(state.lr_state)
state = %{state | lr_state: lr_state}

While second version doesn’t need to access parent state, it adds quite a bit of ceremony and makes it easy to actually forget to update parent state field (immutable nature of Elixir might make it very easy to forget the second line and result in a bug).

rogach · May 2, 2024, 7:56am

and makes it easy to actually forget to update parent state field (immutable nature of Elixir might make it very easy to forget the second line and result in a bug).

I’m incorrect here - Elixir compiler will warn about an unused lr_state value, so it won’t be that easy to forget to update the parent state.

LostKobrakai · May 2, 2024, 8:26am

You could look into get_and_update_in(), update_in, ….

rogach · May 2, 2024, 8:49am

I don’t see how update_in or get_and_update_in will be useful here - we actually need to get the lines data out, and update_in only allows us to mutate contents of the map without returning anything.

LostKobrakai · May 2, 2024, 8:51am

get_and_update_in can return you a value and the update part could just be a noop if necessary.

rogach · May 2, 2024, 9:13am

I see now. So something like that will work:

{lines, state} = get_and_update_in(state.lr_state, &LineReader.recv_lines/1)

dimitarvp · May 3, 2024, 10:27am

A super generic answer: be explicit. Implicitness is rarely worth it unless it’s seriously getting in the way. One example of “getting in the way”, subjectively for me, is a lot of artifacts that Phoenix generates should start off being referenced as defaults in other modules / libraries and only put inside your code base when you need to modify them and they are no longer the defaults.

In your case however, it does seem you want to save just a little typing which is IMO not worth.

And this is also very related to feature envy and the single responsibility principle. Simplified: put things where they semantically belong. Which of course is the entire problem, sometimes we have a problem nailing the semantics – and is ironically one of the things we are actually being paid to do, not the code flinging itself.

derek-zhou · May 3, 2024, 3:46pm

Smelly code is not clean code. My advise is to make the code not smelly first. We can then help you to refactor the code to be more concise and readable.

codeanpeace · May 4, 2024, 12:12am

These might be signs that the LineReader would be better off as its own process i.e. Task/Agent/Genserver so it can keep track of its own state or rolled into ClientHandler. I’d be curious what else it’s being used for and how else it’s being used.

rogach · May 4, 2024, 6:32am

These might be signs that the LineReader would be better off as its own process i.e. Task/Agent/Genserver so it can keep track of its own state. I’d be curious what else it’s being used for and how else it’s being used.

Yes, I thought about such approach, but feels very heavy-weight. As I understand it you have to link that additional process into the supervision hierarchy, figure out how to pass it’s pid to the “main” process, transition the resources to the additional process (the socket port in this case) - that’s like two pages of code, too much.

or rolled into ClientHandler

It’s used in several places, not only in the ClientHandler, so I wanted to avoid code duplication. In this case it’s not much, maybe 10 lines, but these lines better be carefuly tested.

LostKobrakai · May 4, 2024, 6:45pm

Imo processes should only be introduced if there are runtime level reasons for them - failure isolation, asynchronous processing, … That is completely separate from code level abstractions, where multiple module collaborate to build up logic.

codeanpeace · May 5, 2024, 12:48am

Yeah, agreed – it’s what I was trying to suss out with “what else it’s being used for and how else it’s being used”. It’d be the code organization by process anti-pattern otherwise.

You can spin up a temporary process “on demand” e.g. Task.async/1 and “if you spawn a task inside a GenServer, then the GenServer will automatically await for you and call GenServer.handle_info/2 with the task response and associated :DOWN message.”

If the data packet processing logic is brittle and the client handler needs to be robust, a nice runtime reason could be failure isolation so that packet processing can fail without affecting the client handler. You’d likely want to only pass the necessary information.