Split a long module into pieces - how?

heathen · February 20, 2018, 11:19am

Hello!

I have a real example I’d like to discuss. Please let me know if I can read about it somewhere in docs, I didn’t find any related.

I’m doing an app using channels for the front-end <-> back-end data exchange. The problem is that application connects to the channel topic and have a long events list with different reactions on each of them. It very hard to maintain a long channel module with all of these methods, so I split up the main channel module into logical parts.

The main piece has a simple handle_in(topic, payload, socket) method which in order calls a proceed_incoming(topic, payload, socket) method defined depending on the topic (with pattern matching) in each logical chunks.

At the same time I have a general method to catch all topic\payload pairs wasn’t described in the code which is must be the last one.

      def proceed_incoming(_topic, _payload, socket) do
        {:reply, {:error, %{"status" => "400", "reason" => "bad request"}}, socket}
      end

Because of this last requirement I need to insert a module contained that methods at the very end of my channel module.

Now I’m using @before_compile macro and it works, but is very error prone. For example, now I’ve got the warning below and can’t easily find a corresponding clause:

warning: this clause cannot match because a previous clause at line 1 always matches
  lib/profile_web/channels/customer_channel.ex:1

So the question is:
Is there any other, more convenient way to accomplish my goal or is this my approach is generally wrong and I must go an another way?

Thanks!

Below is example of my “main” channel module:

defmodule ProfileWeb.CustomerChannel do
  use ProfileWeb, :channel

  alias Profile.Sessions
  alias Profile.Config

  # we need `require` each module before use their __before_compile__/1 macroses below
  require ProfileWeb.CustomerChannel.{Application, Step2, Step3, Step4, UnknownMessage}
  
  @before_compile ProfileWeb.CustomerChannel.Application
  @before_compile ProfileWeb.CustomerChannel.Step2
  @before_compile ProfileWeb.CustomerChannel.Step3
  @before_compile ProfileWeb.CustomerChannel.Step4
  @before_compile ProfileWeb.CustomerChannel.UnknownMessage # must be the last!

  def join("customer:lobby", payload, socket) do
    if authorized?(payload, socket) do
      # subscribe to personalized messages
      ProfileWeb.Endpoint.subscribe("customer:#{socket.assigns.connection.customer_id}")

      # send init data
      send(self(), :after_join)

       # please check this as its handle_info(:set_application_timeout) handles is in the ProfileWeb.CustomerChannel.Application module
      send(self(), :set_application_timeout)

      {:ok, socket}
    else
      {:error, %{reason: "unauthorized"}}
    end
  end

  # handle init request
  def handle_info(:after_join, socket) do
    data = %{
                   # skipped
            }
            
    push socket, "init", data
    {:noreply, socket}
  end

  @doc """
  The main handle in function to route all incoming messages to the use'd modules.
  It takes a topic, message and socket and call a `proceed_incoming/3` function
  to handle such a message type.

  Each message handler module must implement this function for the every incoming message
  it awaits.

  Any message begins with "step" must be checked for application data
  
  """
  def handle_in("step" <> _ = topic, payload, socket) do
    check_application(topic, payload, socket)
  end
  def handle_in(topic, payload, socket) do
    proceed_incoming(topic, payload, socket)
  end

  # Add authorization logic here as required.
  defp authorized?(_payload, socket) do
    Sessions.valid?(socket.assigns.connection.session_id) # and Security.access_allowed?(connection)
  end
end

And here is the UnknownMessage module just to explain idea:

defmodule ProfileWeb.CustomerChannel.UnknownMessage do
  @moduledoc """
  This module handles unknown topic/message

  It must be the last @before_compile
  """

  defmacro __before_compile__(_env) do
    quote do

      alias Profile.Config
      alias Phoenix.Socket.Broadcast

      # Must be the last handle_in!
      # Unknown topic or payload format
      def proceed_incoming(_topic, _payload, socket) do
        {:reply, {:error, %{"status" => "400", "reason" => "bad request"}}, socket}
      end

      # handle personalized messages
      def handle_info(%Broadcast{topic: _, event: ev, payload: payload}, socket) do
        push socket, ev, payload
        {:noreply, socket}
      end

    end
  end

end

jeremyjh · February 20, 2018, 11:43am

The idiomatic way to do this is have a long chain of handle_info, each of which calls a function that may be defined in another module. This usage of macros is not at all idiomatic or in my opinion, called for. You need to see all your handle_info in one place so the patterns are clear, otherwise you will get errors like you describe with completely overlapped handlers.

heathen · February 20, 2018, 1:47pm

Thank you for the opinion!

Actually this is the problem: the list will be veeeery long. This is the reason I’ve split the module into pieces: it’s impossible to maintain all logic (including support functions) in one place/file. And it will not be possible to find any overlapping without compiler warning\error messages anyway. It much easier to keep it in a business logic parts: in this example, there are a number of steps of the application request, each step has its own checks and additional data exchange events. So the idea was to encapsulate each piece of business logic into its own module with the defined interface (proceed_incoming/3 in this case).

If I will be using handle_in/3 list, I will need to tight parts more closely, and in case of any interface refactoring it will be necessary to make changes in different places which I would like to avoid.

At the same time I need to check some events (starting with “step”) for the common application data in the payload, and now I call check_application/3 and this function returns an error if checks don’t pass or call proceed_internal/3 without even knowing about what exact step is it calling for.

There actually shouldn’t be any intersections between patterns from different modules as different pieces have their own different pattern sets. But there always might be an error, and this is the case: the pre-compiler can’t say in which exact line of code I have the repetitive pattern. And the problem is that I don’t even know what clause lead to that warning (all tests are green actually).

Of course, if there are no other ways to solve such problems I will have to go with the way you suggested, but it will complicate support and refactoring as can I see it now. But of course I can be wrong in my prediction

P.S. I finally found the reason of the warning: a totally identical function defined in two modules.

benwilson512 · February 20, 2018, 2:06pm

Think of the central handle_in/3 call more less like a place for business logic and more like a router. None of the actual logic for the messages sits there, and in fact you may have additional patterns elsewhere. For example:

def handle_in("foo" <> _ = topic, payload, socket do
  FooMessage.handle(topic, payload, socket)
end
def handle_in("bar" <> _ = topic, payload, socket do
  BarMessage.handle(topic, payload, socket)
end

# elsewhere

defmodule FooMessage do
  def handle(topic, %{"blah" => 1}, socket) do

  end
  def handle(topic, payload, socket) do
  
  end
end

The idea here is that the main handle_in function matches on only as much as it needs to to route a given message to the correct module. That module can do any additional pattern matching you want.

This avoids entirely the macro issues you’re running into, while simultaneously keeping the main channel model at a sensible size.

heathen · February 20, 2018, 2:55pm

Ben, I agree, this is usually what I’m doing.

Why I decided to go the way I described in this project:

I have a common check logic (application_check/3) for some events. And for some - not. application_check/3 is a number of functions with pattern matching and guards, they check payload for part of its data along with the current state. After that checks and state changes it just calls proceed_incoming/3 and depending on the current topic the call is going to the function from the corresponding part.
Some work has to be done after reply, like:

def proceed_incoming(topic, payload, socket) do
  send(self(), {:validate_info, payload})

  # assign part of payload to the socket
  # and/or proceed payload

  {:reply, {:ok, %{}}, socket}
end

# info validation takes time
def handle_info({:validate_info,
                   %{"series" => series, "number" => number}
                 }, socket) 
do
  # do background job
  Task.start(fn -> 
               send(self(), {:info_validation_result, External.Service.call(series, number), series, number})
             end)
end

def handle_info({:info_validation_result, :error, series, number}, socket) do
  push socket, "information invalid", %{"series" => series, "number" => number}
  {:noreply, socket)
end

def handle_info({:info_validation_result, :ok, series, number}, socket) do
  push socket, "step X finished", %{}
  {:noreply, save_info(socket))
end

And other things like this. So it is not just a sync call to another module functions. Some steps has additional events (like in-form async checking before form submitted and so on). In the future there will be calls to an external back-end with async answers and specific topic broadcasts with interceptions.

Of course, all of this can be done separately, with own return value checks, GenServers and IPC but I think it will require more effort to code and support. What do you think, what benefits can I gain of this additional complexity?

benwilson512 · February 20, 2018, 3:00pm

Clearly you’re dealing with a complex problem, so it’s a bit hard to provide a full architectural comparison because there’s a lot of specifics involved.

What I’ll say is this though: tracking logic across multiple modules is hard, tracking logic across multiple modules that are actually just injecting code into one module is extremely hard. Determining clear boundaries between regular code modules can be challenging but I assure you’re better off for doing it, particularly compared to simply moving the code around in macros.

It sounds like your real issue is that all the code bits are entirely too coupled. If they are truly separate then it shouldn’t be this hard to have each part work with a clearly defined API (whether sync or async).

Does this even need to all be done in a single channel?

heathen · February 20, 2018, 3:23pm

I’m trying to find the happy mean between complexity of the back-end and the front-end. More channels means more complexity on the front-end. In this very example: this an application form contains a number of steps. If I go with a single channel per step it will be necessary to join/leave channels on the front-end depending on the current step, with all error handling and so on. For now the front-end just subscribes to the events it is generally expecting (like finished steps) and depending on the current step subscribes/unsubscribes to/from the additional in-step events. And I’ve tried to put all coupled logic together, so no one injected module calls functions from another. The only exception is CustomerChannel.Application part (and the Channel support itself, of course). Actually, I think if would compiler able to deal with injected code (like show error places) it would be pretty easy to deal with it.

I was thinking about splitting channel to the different topics, but haven’t got any decision yet. May be this would really be a better solution and I’m just overthinking complexity of this way.

yurko · February 20, 2018, 4:38pm

I do it in a similar way but without macros. Here’s the simplified main handle_in:

  def handle_in("topic1:" <> msg, payload, socket),
    do: Topic1.process_message(msg, payload, socket)

  def handle_in("topic2:" <> msg, payload, socket), 
    do: Topic2.process_message(msg, payload, socket)

  ... more like that

  def handle_in(_message, _payload, socket) do
    {:reply, {:error, %{reason: "unknown message"}}, socket}
  end

so the main handle_in handles its topics but not the messages and also handles wrong topic in the end.

Then each topic handler is like a separate channel and it handles it’s messages completely:

defmodule Topic1 do
  
  alias Some.Service

  def process_message("msg-without-prefix", %{"param" => param}, socket) do
    result = Service.do_something(%{param: param})
    {:reply, {:ok, %{result: result}}, socket}
  end

   ... handle more messages

  def process_message(_message, _payload, socket) do
    {:reply, {:error, %{reason: "unknown message"}}, socket}
  end
end

So it’s basically like having multiple channels while technically there’s only one, this way you have some duplication but it can be viewed as encapsulation - everything that is related to the topic is done in the handler module (which delegates to a context or a service, kinda like a controller).

IMO this is simple enough as it is, even for bigger APIs. You test your channel services separately and test each topic handler for its unknown message handler so human error should be pretty improbable.

heathen · February 20, 2018, 5:05pm

Thanks for your example!

The problems in this approach I’ve described above here. In a few words: sometimes I need async reactions: immediate answer to the message and a delayed push afterwards; so with this option I would be forced to split logic into different modules.

yurko · February 20, 2018, 5:10pm

Well, that’s one way to contain complexity, why not?

benwilson512 · February 20, 2018, 5:58pm

The @before_compile approach also does this. Splitting stuff up is a premise of the whole conversation:

The debate has been focused largely on how to split stuff up: via macros or via composed function calls. I’m not entirely sure I understand the current objection to using multiple modules.

heathen · February 20, 2018, 8:40pm

Sorry for my unclear sentence. I have meant that I will be forced to split a logically linked code into different modules. Now I have all handling code for the linked set of events (application request step) - incoming message handling, validation, error processing and reaction - in one module which in turn is injecting into the main Phoenix.Channel module via @before_compile. For the other set of events - in another, and so on. Actually I can easily make a complete channel module from any of my injectable modules by simply add use ProfileWeb, :channel clause and join/3 function and rename proceed_incoming/3 to handle_in/3. I can (and I do) test each part separately: I have a dedicated test module for each of my injectables. They run against the main channel module, of course, but if I will comment out all other modules injections - tests still be green. Here is an injectable module real example (probably I will need to refactor it, but it works).

So if I go with totally separated modules (not the injected ones) I will need to split this very tight “per topic” logic into pieces and put these pieces into different places. I will need to handle IPC, remember all message routes between modules and processes (please check my simplified example above) and so on. And in case of refactoring it will be necessary to do it in at least two places.

To clarify: of course all business logic for the data processing itself (like working with DB and external services, data manipulation and so on) is in the other modules and applications.

yurko · February 21, 2018, 9:44am

I looked at your module, IMO you don’t have to couple it with channels and splitting it into a separate server module is a good idea. You can handle async actions in the channel module by handling messages from the service module - it can do all the needed processing / validation and send the result message back to the channel process which can then pass it to client. This way your service server will only have to deal with processing data and sending response back.

heathen · February 21, 2018, 7:43pm

I would like to thank you again for your time!

Could you please share your vision: what advantages would I get in case I split every injectable module into the channel part and an additional one part - in my and any similar cases?

I see the next disadvantages:

it will be necessary to track and sync two parts: the channel part and the module part. So in case of any refactoring it will be necessary to change both parts, and do it carefully;
the current implementation is tightly coupled with the channel state, and I don’t need to keep the state between channel sessions. At the same time it is necessary to call external systems [for the sake of incoming data validation] and get results back to the channel session to make a decision are they good enough to proceed (externally, of course) or is it necessary to inform a client if they are wrong. In the future external systems will return the data manipulation and\or request results to deliver it to the client. If I would do that in an external module, I need to start an additional process for the each module on the channel session start and finish it on the end. I already have a GenServer with its own state as the Channel process. Is it a good idea to make a number of additional processes?

All of these add an additional complexity, it will demand more efforts to develop and support, and I don’t currently understand for what is it trade off.

If we’ll look at it more generally: we have an MVC, a View part is a client, a Model part is outside of the channel. And the channel itself - is a controller. At the same time channel sessions are stateful and the client-channel data exchange flow depends on the session’s current state. What is the reason to produce additional entities, to sync state between them, to develop IPC interfaces and so on if they will have the same lifetime as the channel, and the channel already has all things to solve the task of getting data and validate it before send it further?

yurko · February 22, 2018, 2:04pm

This separation would make you define clearer internal borders. I agree with your controller analogy, it’s the same here: at the moment you have a big module that handles WebSocket communication as well as passport data processing, if you split them, you’d have one module that handles WS communication and is unaware of any further logic and another one for passport data processing (possibly delegating some tasks to more specialized services).

In this case you’ll have to think in terms of “what is the minimum required input that the module must receive from the channel, what is the end result that it must deliver and what data may be considered internal” which in the long run will lead to leaner modules that are easier to unterstand, maintain and refactor (contrary to your concerns).

As for the performance concerns, I wouldn’t worry about that just yet unless you expect really significant load. In any case processes are cheap and it might be the message passing that will become a bottleneck which is already the case since you send a lot of messages and use tasks. By doing more in external process and only communicating input / results you might actually win on performance but I’d do some load testing and determine the areas that need optimization (if any) first.

PS please keep in mind that I do not mean to know what’s better for you, you know you app better then anyone else. Since you started this topic and asked for external opinions, you got mine, hope it helps

heathen · February 25, 2018, 11:51am

That is an interesting question, actually even a few of them.

Do you think that incoming data validation must be accomplished outside of the controller (in general)? Incoming data is tightly coupled with the ‘View’ part, their future processing - with the ‘Model’ part. So I was always thinking that the reason controller exists - is to provide an interface between View and Model (roughly speaking), to untie the presentation part from the storage\processing part.

And in case of the channels: do you think it is better to have only join/3 and handle_in/3 methods and put everything else outside of the channel module? Isn’t an actual WebSocket communication hide under the hood and the Channel module aimed for data “translation”?

The thing with channels is that they are stateful (unlike http session controllers), and they must keep the state to properly react to the incoming requests. So all data manipulation in my example above is just to check incoming data integrity, keep [in the state] parts important for the future requests\replies and set additional state flags needed for the proper reaction. If I move all of these out of the channel module what is the reason to have the channel state at all? One of the previous questions is still alive: do you think it is better to create\destroy additional states (with GenServers, as it is necessary to start time consuming operations and have resulting data back) in additional modules even if they have the same lifetime as the channel module (which already has the state environment)?

So the question is actually in the boundaries: how narrow or wide should they be. I totally accept and acclaim the idea of the separation and boundaries, and sometime even too much But still trying to fit the idea into the common sense for myself. In this case I’ve tried to group and separate code by different type of messages while keeping simplicity; probably in a wrong way.

Sure, I really appreciate you are sharing your valuable opinion and the point of view, it is very helpful and useful!

yurko · February 26, 2018, 2:30pm

Do you think that incoming data validation must be accomplished outside of the controller (in general)? Incoming data is tightly coupled with the ‘View’ part, their future processing - with the ‘Model’ part. So I was always thinking that the reason controller exists - is to provide an interface between View and Model (roughly speaking), to untie the presentation part from the storage\processing part.

I consider controller less abstract, not MVC level but rather protocol specific. So Controllers have to do with translation / passing the data, but only from HTTP requests and to HTTP responses. More abstract actions should take place in a more “context agnostic” places.

Say there must be no way you’d ever want to “reuse” your controller stuff in your CLI layer or while receiving Data from a non HTTP external Source (like MQ) or while parsing an uploaded CSV etc. All your controller does is calling services that could also be called in these other contexts, the usage of data validation is a good example.

The thing with channels is that they are stateful (unlike http session controllers), and they must keep the state to properly react to the incoming requests. So all data manipulation in my example above is just to check incoming data integrity, keep [in the state] parts important for the future requests\replies and set additional state flags needed for the proper reaction.

HTTP emulates state with sessions, there is no huge difference to channels on this level - they handle WS just as controllers do with HTTP (obviously in a more stateful way). Channels should handle communication and call external services that do any kind of logic, if async is required then via message passing (you use channel’s statefullness this way).

Isn’t an actual WebSocket communication hide under the hood and the Channel module aimed for data “translation”?

Channel does hide WS communication, just as Repo hides DB communication but in bigger apps you build abstractions on top of Repo.

So the question is actually in the boundaries: how narrow or wide should they be.

That’s an eternal question, like naming things - there is probably no “one size fits all” answer In a small app it’s ok to do stuff in controller or channel. In a bigger it’s probably not cool and since you ask these questions yours might be big enough