How would PragDave separate this code?

Background

After reading Elixir in Action I came up with a (slightly modified for my evil purposes) module that I, very originally, call ProcessRegistry. The module works and is fine, and it even has a behaviour defined, but everything lives in the same file and that feels tangled to me, especially after reading this post from Pragmatic Dave: Splitting APIs, Servers, and Implementations in Elixir.

Code

So, the following is the file containing ProcessRegistry:


defmodule Clint.IProcessRegistry do
  @type via_tuple_type  ::
    {:via, module, {module, {module, any}}} |
    {:via, module, {module, {module, any}, any}}

  @callback lookup(tuple)                  :: [{pid, any}]
  @callback update_with_value(tuple, any)  :: {any, any} | :error
  @callback via_tuple(tuple)               :: via_tuple_type
  @callback via_tuple(tuple, any)          :: via_tuple_type
end

defmodule Clint.ProcessRegistry do
  @behaviour Clint.IProcessRegistry

  alias Clint.IProcessRegistry

  ###############
  # Public API  #
  ###############

  @impl IProcessRegistry
  def lookup(key), do: Registry.lookup(__MODULE__, key)

  @impl IProcessRegistry
  def update_with_value(key, new_value), do:
    Registry.update_value(__MODULE__, key, fn _old_value -> new_value end )

  @impl IProcessRegistry
  def via_tuple(key), do:
    {:via, Registry, {__MODULE__, key}}

  @impl IProcessRegistry
  def via_tuple(key, value), do:
    {:via, Registry, {__MODULE__, key, value}}

  @spec start_link() :: {:ok, pid} | {:error, any}
  def start_link, do:
    Registry.start_link(keys: :unique, name: __MODULE__)

  #############
  # Callbacks #
  #############

  @spec child_spec(any) :: Supervisor.child_spec
  def child_spec(_) do
    Supervisor.child_spec(
      Registry,
      id: __MODULE__,
      start: {__MODULE__, :start_link, []}
    )
  end
end

It’s a fairly small file with 53 lines, but it has a couple of things that bother me:

  1. The Contract and the Implementation are defined in the same file. I really don’t like this.
  2. According to Pragmatic Dave, the API and the implementation are mixed together.

For the first point, I don’t really think there will ever be another implementation of ProcessRegistry that doesn’t use Elixir’s Registry. Yes, I could go on an adventure and re-invent the wheel, but what for?

How many of you had to re-implement Registry’s functionality for your own projects after Elixir 1.4? What is so bad (or so lacking) about the current implementation of Registry that it would make you re-implement it? In short, I don’t think I need a Contract for this, because I don’t see a future where ProcessRegistry will have (or need) an implementation that doesn’t use Elixir’s Registry.

The second point is even more contentious. From what I understand, even without the behaviour, Dave would not approve of my “mixing of concerns” between the API and the Registry module. He would define a module called ProcessRegistry containing the API, and then, inside a folder with the same name, he would create a module with a completely useless name like impl, which is where the calls to Registry would live.

I push back on this idea because Dave likes to complain about useless folders and boilerplate code, yet in my specific case that is exactly what I would be creating: more boilerplate and indirection.

Opinions

  1. Have you read Dave’s small article? What do you think?
  2. How can I improve this module and its code? Let me know!
1 Like

Previous discussions:

1 Like

I have read through both of those discussions before, but I don’t think they pinpoint the issues I raise in this post. This is not about modularising applications by splitting them into many small components that you can then wire together via a config file; it is about separating the API from its implementation, which is something I haven’t seen discussed so far.

Even if it were, what I am looking for is feedback specific to this sample. So, @peerreynders, how would you do it? That is what I want to know: your how and your why :smiley:

Dave’s opinions here are at odds with established norms in Erlang and Elixir. I’m not trying to rehash the arguments, just pointing out that Dave is exploring new territory and proposing things that a lot of people don’t agree with. I am curious, though, whether you have examined why you don’t like it. I like it because it completely encapsulates the message-passing interface, which is no one else’s business.

They don’t have to be. You could have a public behaviour for your module that defines the interface separately.

I think a registry is provided simply as an example of how these different pieces fit together. I’ve never implemented a process registry.

1 Like

If you just want to start the registry and nothing more, you can simply do:

  @spec child_spec(any) :: Supervisor.child_spec
  def child_spec(_), do: Registry.child_spec(keys: :unique, name: __MODULE__)

No need for a more complex child_spec.

2 Likes

A behaviour is still useful even if there’s only one implementation. It’s not only a contract that makes switching out implementations easier; it’s also a contract between caller and implementation, which is more explicit than “oh, there’s a public function on the implementation module”. So if you already have a useful behaviour, keep it around.

For the one-file issue: just put them in two files and the issue is gone.

The “indirection” of behaviours you mentioned does not come from having a behaviour in the first place; you only get indirection once you no longer call MyOnlyImpl.via_tuple(key) directly because you need a more flexible system where the actual implementation can be switched out. That part you can skip if you don’t need it. As long as the caller only uses callbacks defined in the behaviour, you can add a more flexible system later at any time.
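To make that last point concrete, here is a hedged sketch of what “adding the flexible system later” could look like; the config key and module name are my own illustration, not something proposed in this thread:

defmodule Clint.ConfiguredProcessRegistry do
  @behaviour Clint.IProcessRegistry

  # Which implementation to dispatch to; defaults to the Registry-backed module.
  # The :process_registry_impl config key is hypothetical.
  defp impl, do: Application.get_env(:clint, :process_registry_impl, Clint.ProcessRegistry)

  @impl true
  def lookup(key), do: impl().lookup(key)

  @impl true
  def update_with_value(key, new_value), do: impl().update_with_value(key, new_value)

  @impl true
  def via_tuple(key), do: impl().via_tuple(key)

  @impl true
  def via_tuple(key, value), do: impl().via_tuple(key, value)
end

Nothing about the callbacks changes, so a layer like this can be introduced at any later point without callers having to learn a new API.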

Your module hardly does anything anyway, so splitting it up is surely overkill. Even if you like Dave’s ideas, you will always need to decide whether the trade-off of more files is worth the greater abstraction. Creating more stuff for the sake of having more stuff is never a good idea. To turn the idea around: if you build a GenServer where each callback contains several hundred lines of state modification, task spawning and return-value calculation, you might be in a place where splitting state modification, runtime message passing and worker spawning into separate modules will be far more useful.

In my opinion “separation of concerns” is a bad metric. What even is a concern? In my mind, for example, your example module is a single concern, because it bundles the stuff needed to interact with a registry; nobody but this module needs to know the details. You seem not to trust your instinct that it actually is one concern.

3 Likes

“It depends”.

For the case you are presenting a single file works.

But I also have to point out that Dave’s KV “API” isn’t a behaviour module. It simply is a module that represents an API.

You chose to define your “interface” formally via a behaviour module.

The reasons for these choices aren’t based on the same motivations, and are therefore not really related or comparable. In some ways the question seems to conflate two different viewpoints on APIs.

Dave’s approach

kv.ex
kv/impl.ex

to

kv.ex        ... API (not a behaviour module)
kv/impl.ex   ... data structure module
kv/server.ex ... server

is about speculative generality - “Now I have a data structure that later I may want to turn into a server”.

I assume Clint.IProcessRegistry exists to enable mock-ability.
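If mock-ability is indeed the goal, the behaviour pays for itself with a library such as Mox. A rough illustration (the mock name and the stubbed return value are mine, not from the thread):

# test/test_helper.exs
Mox.defmock(Clint.ProcessRegistryMock, for: Clint.IProcessRegistry)

# inside a test: stub lookup/1 for a single call
Mox.expect(Clint.ProcessRegistryMock, :lookup, fn _key -> [{self(), nil}] end)
[{_pid, nil}] = Clint.ProcessRegistryMock.lookup({:some, :key})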

And more importantly, while a behaviour module can be used to mimic an interface/contract - that isn’t its raison d’être.

Behaviours were designed for composition.

  • The behaviour module (e.g. gen_server) encapsulates generic capability - i.e. it includes code.
  • The callback module provides the specialized capability (more code) that tailors the run-time behaviour.
  • The contract between both has to exist in order for the composition to work.
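A tiny made-up illustration of those three pieces working together (the Poller behaviour and FeedPoller module are mine, not from the thread):

defmodule Poller do
  # the contract both sides rely on
  @callback poll() :: {:ok, term()} | :error

  # generic capability: a retry loop shared by every callback module
  def run(callback_module, retries \\ 3) do
    case callback_module.poll() do
      {:ok, result} -> {:ok, result}
      :error when retries > 0 -> run(callback_module, retries - 1)
      :error -> :error
    end
  end
end

defmodule FeedPoller do
  @behaviour Poller

  # specialized capability: what to actually poll
  @impl true
  def poll, do: {:ok, :rss_items_would_go_here}
end

The composition happens when the two are combined: Poller.run(FeedPoller).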

I’m using Dave’s approach in my current project. I really like splitting the API apart from the implementation and the server apart from the code that produces the served data.

Even my largest modules are only about 300 LOC, which I find quite manageable. FWIW, here is the lib directory for one of the smaller apps.

I’m using an umbrella app structure, because Mix works well with this, but I’d be happy to switch over to a component-style approach if and when it is supported by production tooling.

Incidentally, I’d love to find a way to tidy up the @doc attributes for the defdelegate entries. As it is, I have to hand-edit repetitive details to produce a link to the referenced implementation function:

 @doc """
  Return a Map describing the code files.
  ([`...CntCode.get_code_info/1`] (InfoFiles.CntCode.html#get_code_info/1))
  """
  defdelegate get_code_info(tree_base),             to: CntCode

I really like to have the GenServer and the implementation separated from each other.

  • Client
  • ClientState (just handles mutating the data structure for the client)
  • Server (the GenServer stuff)

Probably I will follow this structure (I may not need ClientState sometimes) for most projects.
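For what it’s worth, here is a minimal sketch of that three-module layout; only the module roles come from the list above, the counter example and all function bodies are mine:

defmodule Counter.ClientState do
  # pure data handling: no process involved, trivially unit-testable
  def new, do: %{}
  def bump(state, key), do: Map.update(state, key, 1, &(&1 + 1))
end

defmodule Counter.Client do
  # the public API callers use; hides the fact that a process is involved
  def bump(key), do: GenServer.call(Counter.Server, {:bump, key})
end

defmodule Counter.Server do
  use GenServer

  # GenServer stuff only: process lifecycle and message handling
  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, Counter.ClientState.new()}

  @impl true
  def handle_call({:bump, key}, _from, state) do
    state = Counter.ClientState.bump(state, key)
    {:reply, Map.fetch!(state, key), state}
  end
end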

1 Like

I don’t really understand the purpose of this module. In a real project, I’d likely delete this code and use Registry directly.

If you really insist on having this module, the second best simplification would IMO be to remove the IProcessRegistry contract, and use ProcessRegistry directly.

I definitely wouldn’t split this particular code into more files. That would just lead to more indirection, which would needlessly degrade the reading experience.

5 Likes

Agree with Dave absolutely

Hello @mark_lemberg, and welcome!

Could you elaborate on why you agree with Dave and how you would apply his opinion to the piece of code I provide in the question?

I think looking for universal rules will never work and is a distraction in general. I’ll post one of mine, though, since it has proven itself a time saver many times.

I draw the line at how hard a file would be for me to parse by eyeballing it. For most simple(-ish) GenServers I leave everything in one file. This gives me the peace of mind that I can easily find whatever I need for functionality XYZ in the file xyz.ex.

It is only when a module and its responsibilities grow large that I look into splitting it into multiple files. Even then, I would first analyse it for probable feature envy and refactor that into several new modules. Very rarely have I split a single module into several files (interface / implementation / runtime artifacts).

2 Likes

Another goal of the article seemed to be encapsulation of data inside the process.

In object-oriented languages the advantage of encapsulation is two-fold:

  • only the object’s methods can mutate the internal data
  • client objects couple with the interface (API), not the structure of the internal data.

In Elixir, values are immutable, so mutating a data structure is not an issue (though a name can be rebound). However, if clients have to access the data structure directly, they become coupled to its internal structure.

When it is desirable to treat the data structure as opaque then all access needs to go through module functions to decouple the caller from the internal structure. While this type of “encapsulation” isn’t enforced, it is often good enough.

So in the article it was surprising to me that the API was modified for the server version when in fact all that was needed was to go from:

@type names() :: map()

@spec new() :: names()
@spec lookup(n, k) :: {:ok, v} | :error when n: names(), k: term(), v: term()
@spec store(n, k, v) :: n when n: names(), k: term(), v: term()

to

@type names() :: pid()

@spec new() :: names()
@spec lookup(n, k) :: {:ok, v} | :error when n: names(), k: term(), v: term()
@spec store(n, k, v) :: n when n: names(), k: term(), v: term()
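To make the point concrete, here is a minimal sketch of that idea; the module names and bodies are mine, not from the article. The caller-facing functions and their specs are identical in both versions, only the names() type changes from a plain map to a pid:

defmodule Names.MapImpl do
  @type names() :: map()

  @spec new() :: names()
  def new, do: %{}

  @spec lookup(names(), term()) :: {:ok, term()} | :error
  def lookup(names, key), do: Map.fetch(names, key)

  @spec store(names(), term(), term()) :: names()
  def store(names, key, value), do: Map.put(names, key, value)
end

defmodule Names.AgentImpl do
  @type names() :: pid()

  @spec new() :: names()
  def new do
    {:ok, pid} = Agent.start_link(fn -> %{} end)
    pid
  end

  @spec lookup(names(), term()) :: {:ok, term()} | :error
  def lookup(names, key), do: Agent.get(names, &Map.fetch(&1, key))

  @spec store(names(), term(), term()) :: names()
  def store(names, key, value) do
    Agent.update(names, &Map.put(&1, key, value))
    names
  end
end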
1 Like

I wouldn’t particularly argue against the code you have. I’m not sure I’d have the interface module unless I knew there’d be more implementations, and I’d do the types a little differently.

  @type via_tuple_without_value   :: {:via, module, {module, {module, any}}}
  @type via_tuple_with_value(val) :: {:via, module, {module, {module, any}, val}}
  @type via_tuple                 :: via_tuple_without_value | via_tuple_with_value(any)

  @spec via_tuple(tuple)      :: via_tuple_without_value
  @spec via_tuple(tuple, any) :: via_tuple_with_value(any)

If the main module has just the API, then adding the extra behaviour seems to obscure things to my eye.

To your point about splitting APIs, what you have here is exactly what I describe. The external Registry is the implementation (only without the useless name :slight_smile: ) and this file is the API that delegates to it. Yes, it adds a little along the way, but not so much that it obscures its nature.

What I’m less keen on is the start_link/child_spec magic in here. As far as I can tell, the API code would still run fine if these two were moved into a separate module: in fact the API would run fine any time there is a Registry process called Clint.ProcessRegistry running. So I might do a little refactoring:

defmodule Clint.ProcessRegistry do

  @registry_name __MODULE__

  @type via_tuple_without_value   :: {:via, module, {module, {module, any}}}
  @type via_tuple_with_value(val) :: {:via, module, {module, {module, any}, val}}
  @type via_tuple                 :: via_tuple_without_value | via_tuple_with_value(any)

  @spec lookup(tuple) :: [{pid, any}]
  def lookup(key),
  do: Registry.lookup(@registry_name, key)


  @spec update_with_value(tuple, any) :: {any, any} | :error
  def update_with_value(key, new_value),
  do: Registry.update_value(@registry_name, key, fn _old_value -> new_value end )


  @spec via_tuple(tuple) :: via_tuple_without_value
  def via_tuple(key),
  do: {:via, Registry, {@registry_name, key}}


  @spec via_tuple(tuple, any) :: via_tuple_with_value(any)
  def via_tuple(key, value),
  do: {:via, Registry, {@registry_name, key, value}}

  
  #############################################################################
  
  defmodule Server do
    
    @registry_name Clint.ProcessRegistry
    
    def start_link,
    do: Registry.start_link(keys: :unique, name: @registry_name)

    def child_spec(_) do
      Supervisor.child_spec(
        Registry,
        id: @registry_name,
        start: {__MODULE__, :start_link, []}
      )
    end
  end
end

I haven’t run this, so it might be total BS.

I’m not keen on the duplication of the registry name between the two modules, but I couldn’t quickly come up with an alternative that wasn’t even worse :slight_smile:

The two things I prefer about this solution are:

  1. The API and the server stuff are separate, so there’s no need for a behaviour module, and

  2. The use of __MODULE__ is disambiguated: before it was used both as a process name and as the module passed to the supervisor, which made it harder for my few remaining neurons to work out what was going on.

But…

Honestly, what’s important here isn’t whether a particular structure is right or wrong. The most important thing is that we’re discussing it, and thinking about it.

Nicely done.

Dave

8 Likes

Thank you so much for your criticism.

It is clear from your post that I have a long way to go in learning how to share my opinions; you share yours in such a nice way that, in comparison, I look like Hitler when sharing mine :smiley:

I quite like the idea of separating the Server from the API, but at the same time there is that pesky indirection argument people like to use against this methodology.

I can see the value in both, I just need to find the line between when it makes sense to use one model or the other.

Once again, thanks for your reply!

IMO, starting the required processes is part of the API of an abstraction, just as a constructor is part of the API of a class, or new is part of the API of some data abstraction (e.g. Map).

With the proposed code split, in order to use Clint.ProcessRegistry, I need to first start Clint.ProcessRegistry.Server. In my view this makes the usage and the subsequent reading experience more complicated. Having to start Foo in order to use Foo seems more intuitive to me.

Of course, there are some cases where it’s worth splitting the abstraction API, for example if the API becomes too large and covers a bunch of semi-related concerns. In such scenarios, I’d prefer to have the top-level module, e.g. Foo, provide the childspec, and then expose the other API concerns via “submodules”, such as Foo.ApiConcern1, Foo.ApiConcern2, … That way, if I want to use different aspects of Foo, I have to start Foo, which, again, seems more intuitive than having to start Foo.Server.
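A rough sketch of the layout described above, using the placeholder names from the post (the Agent-backed bodies are purely illustrative and my own):

defmodule Foo do
  # Starting Foo is part of using Foo, so the childspec lives on the top-level module.
  def child_spec(_opts) do
    %{id: __MODULE__, start: {Agent, :start_link, [fn -> %{} end, [name: __MODULE__]]}}
  end
end

defmodule Foo.ApiConcern1 do
  # One aspect of Foo's API; it relies on the Foo process started above.
  def get(key), do: Agent.get(Foo, &Map.get(&1, key))
end

defmodule Foo.ApiConcern2 do
  # Another, semi-related aspect of Foo's API.
  def put(key, value), do: Agent.update(Foo, &Map.put(&1, key, value))
end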

Either way, the exact approach I’d prefer is IMO highly dependent on the context. In this simple case, given the code size, I’d definitely keep the start API (i.e. the childspec) together with the rest of the API.

5 Likes

Thanks for highlighting this, because I think it’s at the core of much of my thinking.

The Beam is a pretty unique environment, and it makes us think about many things differently. One major difference is in the way we have to think about process lifecycle as well as process functionality.

Take this particular example. We have a global registry with an API (lookup, update, and so on). In most other languages, that would be the end of it. But on the Beam the conventional way to have global state is to have a named process, and so we now have lifecycle code that starts this process.

But I suggest that this code (the start_link and the child_spec) is not part of the API of the Registry. It is part of the lifecycle, the scaffolding needed to get the Registry working on the Beam. 99% of the time it will only be invoked as a tuple passed as a child spec to some top-level supervisor. It never gets called by people using the module.

So that’s at the root of my concern. We mix traditional APIs and BEAM lifecycle management code into the same bucket of functions. And, as a result, I think we often conflate supervision with functionality. But the two are separate: the supervision tree is an expression of process lifecycle, and not process functionality.
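A hypothetical sketch of that point (Clint.MyApp is an assumed application module, not something from the thread): the only place the lifecycle code is referenced is the supervision tree, and callers of the registry API never touch start_link or child_spec.

defmodule Clint.MyApp do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # lifecycle/scaffolding: starts the registry process; user code never calls this module
      Clint.ProcessRegistry.Server
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: Clint.MyApp.Supervisor)
  end
end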

3 Likes

I regard the fact that these funs are (usually) not called directly as a mechanical detail. To me they are still first and foremost a part of the same API, because these functions cover one aspect of how we use the abstraction. IMO starting a thing is a part of using the thing, and so it’s a part of the API of that thing.

Separating it into two modules doesn’t change that: the Registry docs still need to instruct me to start Registry.Server before I can use the Registry functions. So the API surface remains the same, except that now it’s split across two modules. Now I need to read the docs of both modules because they cross-reference each other (e.g. the docs for Server explain how some childspec/start_link option affects the behaviour of some function in the “API” module, and vice versa). I personally find this confusing.

I can agree that starting might be considered as a separate API concern, so splitting these two concerns might sometimes be valid, but IMO that’s not the case in this simple example. I feel that such split here brings more usage/reading distractions with little to no practical gain.

3 Likes

If it’s a “mechanical detail” then I’d argue that it’s definitely something that isn’t important when reading. It’s just housekeeping noise.

That’s why I have the ability in the Component library to have globals started automatically.

But it’s all just style, and there’s nothing but time and experience to judge which is preferable. And indeed, as you say, there’s no one right answer.

I push these points, though, because the community seems very happy to keep writing code the way it has been written for the last 20 years, and that sets off little alarm bells in the agile part of my brain. Unless we try new ways of doing things (rather than just talking about them), we’ll never learn whether the current way is indeed the best. I don’t want to debate. I want to code :slight_smile:

2 Likes