Putting non-protocol functions in a protocol?

As you know, Elixir protocols dispatch to the particular implementation based on the first argument passed to their functions.

For many use-cases this is fine.
Sometimes, however, it is not.
One such situation is when we want to add a function that creates a datatype in a generic way. At this time we probably know the module name of the implementation we’d like to use, but do not have a struct of that type.

This leaves a couple of different possibilities. I am hoping for some feedback on which technique you’d prefer:

1. Have a separate Behaviour module. Modules whose structs implement the protocol should implement this behaviour as well.

Advantage:

  • Does not need any ‘hacks’.

Disadvantage:

  • It is easy to miss/forget implementing the behaviour
  • It might be confusing that there both is a protocol and a behaviour that together specify the interface that needs to be implemented.
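
To make option 1 concrete, here is a minimal sketch of what it could look like. All module names (BehaviourSketch.*) are invented for this illustration and are not the actual modules of the Arrays library.

```elixir
# Hypothetical behaviour: covers the function that has no struct to dispatch on.
defmodule BehaviourSketch.Constructor do
  @callback empty(opts :: keyword()) :: struct()
end

# Hypothetical protocol: covers everything that does receive the struct.
defprotocol BehaviourSketch.Container do
  @spec size(t) :: non_neg_integer()
  def size(container)
end

defmodule BehaviourSketch.ListBacked do
  @behaviour BehaviourSketch.Constructor
  defstruct items: []

  @impl BehaviourSketch.Constructor
  def empty(_opts), do: %__MODULE__{}

  # Inside a module, defimpl defaults to `for: BehaviourSketch.ListBacked`.
  defimpl BehaviourSketch.Container do
    def size(%{items: items}), do: length(items)
  end
end
```

An implementer has to remember both the `@behaviour` line and the `defimpl` block, which is exactly the disadvantage listed above.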

2. Add a function to the protocol which does not actually expect the struct as first argument.

Advantage:

  • Only a single module which defines the interface.

Disadvantage: Seems a bit like a ‘hack’:

  • It requires manual protocol dispatch, which is hackish: since we do not have a struct of the protocol yet (only a module name), we cannot rely on ProtocolName.impl_for(datatype). Manually concatenating module names currently works, but seems like relying on an implementation detail.
  • It might mess with protocol consolidation.
  • Elixir and/or tools like Dialyzer or Credo might produce warnings.
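
A rough sketch of the manual dispatch that option 2 relies on, assuming the extra function lives in the defimpl module and is looked up by concatenating module names. The names (ManualDispatchSketch.*) are invented for this example, and the lookup depends on how defimpl currently names its modules.

```elixir
defprotocol ManualDispatchSketch.Container do
  def size(container)
end

defmodule ManualDispatchSketch.ListBacked do
  defstruct items: []
end

defimpl ManualDispatchSketch.Container, for: ManualDispatchSketch.ListBacked do
  def size(%{items: items}), do: length(items)

  # Extra function that is NOT part of the protocol's dispatch:
  # it receives options instead of a struct.
  def empty(_opts), do: %ManualDispatchSketch.ListBacked{}
end

defmodule ManualDispatchSketch do
  # Builds the name of the implementation module by hand, because there is
  # no struct yet that Container.impl_for/1 could inspect.
  def empty(struct_module, opts \\ []) do
    impl = Module.concat(ManualDispatchSketch.Container, struct_module)
    impl.empty(opts)
  end
end
```

Calling `ManualDispatchSketch.empty(ManualDispatchSketch.ListBacked)` then returns an empty struct without ever going through the protocol's own dispatch.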

3. Using a library-provided ‘extended protocol’

One example of a library providing extended protocols would be protocol_ex.

Advantage:

  • It might be possible to implement this pattern directly.

Disadvantage:

  • It might be overkill
  • Increased developer complexity: It’s a new library that developers need to understand.
  • Circumventing normal protocols will mean that improvements to normal protocols (like e.g. better consolidation) cannot be used.

4. ???

Maybe there are other possibilities as well?


If you need more context, this recently came up here, PR #32 of the Arrays library.


There’s already a protocol defimpl and a module with a defstruct :man_shrugging:

The other two approaches both require additional understanding of the details of protocol dispatch, versus having a straightforward empty that calls a function in the struct’s module.

I’d likely go for a behaviour, which includes all necessary functionality, and have a protocol with fallback to Any, where the Any implementation defaults to calling the functions of the behaviour on the struct module.
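
A minimal sketch of that idea, under the assumption that every backend module implements the full behaviour. The names (FallbackSketch.*) are hypothetical.

```elixir
# Hypothetical behaviour that carries all functionality, including empty/1.
defmodule FallbackSketch.Backend do
  @callback empty(opts :: keyword()) :: struct()
  @callback size(struct()) :: non_neg_integer()
end

defprotocol FallbackSketch.Sized do
  @fallback_to_any true
  def size(container)
end

# Default implementation: delegate to the module that defines the struct.
# Values that are not structs will fail with a FunctionClauseError here.
defimpl FallbackSketch.Sized, for: Any do
  def size(%module{} = container), do: module.size(container)
end

defmodule FallbackSketch.ListBacked do
  @behaviour FallbackSketch.Backend
  defstruct items: []

  @impl true
  def empty(_opts), do: %__MODULE__{}

  @impl true
  def size(%__MODULE__{items: items}), do: length(items)
end
```

With this layout a backend only needs an explicit defimpl when it wants something other than the default delegation.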

Let me give some extra context. There definitely are situations in which I’d go for @LostKobrakai 's approach, but it cannot be used here.

Arrays and some similar libraries (e.g. okasaki, sets, prioqueue) have a unified interface module (in this case Arrays) which contains some generic code. For some functions this generic code calls a particular protocol implementation.

This pattern is common elsewhere in Elixir. For instance we have an Enum module which contains generic code that internally uses the Enumerable protocol implementations.

The idea is that user code should (only) use the unified interface, and that they can specify in configuration (either in config.exs or by passing explicit options to Arrays.empty/1 or Arrays.new/2) which array implementation they want to use.
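
Roughly, the selection logic could be sketched like this. The option name `:implementation`, the application name `:config_sketch` and the config key are placeholders for this example, not necessarily what Arrays uses.

```elixir
defmodule ConfigSketch.ListBacked do
  defstruct items: []
  def empty(_opts), do: %__MODULE__{}
end

defmodule ConfigSketch.MapBacked do
  defstruct items: %{}
  def empty(_opts), do: %__MODULE__{}
end

defmodule ConfigSketch do
  @fallback ConfigSketch.ListBacked

  # An explicit option wins; otherwise the application environment is
  # consulted; otherwise a hard-coded fallback is used.
  def empty(opts \\ []) do
    impl =
      Keyword.get_lazy(opts, :implementation, fn ->
        Application.get_env(:config_sketch, :default_implementation, @fallback)
      end)

    impl.empty(opts)
  end
end
```

User code only ever calls `ConfigSketch.empty/1`; which struct comes back depends on the options and configuration.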

This works great, except when actually creating the initial structs. Here we cannot dispatch to the protocol implementation because we do not have a struct yet. We only have the module name.
And that is where the conundrum lies.
What do we do for this situation?

Adding a function (also called empty) to the module that contains the defstruct, as @al2o3cr suggests, is indeed the current approach.
To make it slightly more clear that we need this module to implement this particular function we use a behaviour.

However, this means we have both a behaviour and a protocol, with the pros and cons outlined in the first post above.
We’re looking for ways to make the interface of the library as a whole better.
(For both users as well as implementers of new e.g. array backends).


Arrays is not the protocol though; Arrays.Protocol is. Your public API is not your protocol, and given the constraint you mentioned it cannot be. Same for e.g. Enum btw. The protocol is Enumerable, while Enum uses it to provide a nice API. But on the other hand take a look at Access, which is actually a behaviour (even though it is often said to be a protocol), which you need to implement to have datatypes be accessible as data containers.

You’re obviously in a situation where you need both the behaviour version of datatypes providing certain functions as well as the protocol side. What I suggested would at least remove the need of explicitly defining the protocol implementation unless something truly custom needs to happen. The downside to behaviours, however, is that they cannot be implemented in userland like protocols can.

In the end I question whether you actually need empty on the protocol, because how useful is Arrays.Protocol.empty(…) if there’s nothing to dispatch on? For Enum one would also do MapSet.new(list) |> Enum.map(…).


There are two differences between Enum and Array.

  1. (virtually) all Enum functions change the enumerable into a list anyway, making something like starting with e.g. MapSet.new less useful/common.
  2. The interface exposed by the different array representations is identical. It therefore seemed like a good idea to move the choice of implementation from the code into the configuration. In many cases it makes sense to pick a particular implementation project-wide rather than per module or per function. Not having to change the code to benchmark/profile the impact of the different implementations in a particular application seems to be an added benefit here.

Currently there is an Arrays module with the user-facing code, the Arrays.Protocol module that defines the protocol and the Arrays.Behaviour module that defines the related behaviour (consisting only of empty).

If that’s the goal, there should be a way to make that decision at compile-time; paying an ETS lookup every time code wants an empty value isn’t going to help performance.

One way to do that would be to create an “instance module” where you specify the implementation and then calls to the right functions get compiled in - see Ecto.Repo for an example of this style. This works as long as the code that’s manipulating these structures is application code; if you’re expecting to be able to use other packages from Hex things get more complicated (since those modules won’t know your instance module).
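
A sketch of such an instance module, loosely in the style of `use Ecto.Repo`. Everything here (InstanceSketch, the `:implementation` option, MyArrays) is hypothetical and only illustrates compiling the choice in.

```elixir
defmodule InstanceSketch.ListBacked do
  defstruct items: []
  def empty(_opts), do: %__MODULE__{}
end

defmodule InstanceSketch do
  # `use InstanceSketch, implementation: SomeModule` injects functions that
  # call SomeModule directly, so no lookup happens at runtime.
  defmacro __using__(opts) do
    impl = Keyword.fetch!(opts, :implementation)

    quote do
      def empty(opts \\ []), do: unquote(impl).empty(opts)
    end
  end
end

# The application-specific instance module.
defmodule InstanceSketch.MyArrays do
  use InstanceSketch, implementation: InstanceSketch.ListBacked
end
```

Application code would then call `InstanceSketch.MyArrays.empty()`; switching backends means changing one line in the instance module and recompiling.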

Protocol already defines a behaviour, so you should just declare that behaviour in an outer module that defdelegates to the protocol module.

If you do the following, you will see that dialyxir/elixir_ls indicates the error “type mismatch for @callback f/1 in P behaviour”:

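# A protocol with a single function; its @spec promises an atom.
defprotocol P do
  @spec f(term) :: atom()
  def f(x)
end

defmodule I do
  defstruct []
  # Inside a module, defimpl defaults to `for: I`.
  defimpl P do
    # Returns an integer although the @spec above promises an atom.
    def f(_x), do: 32
  end
end

# A plain module that adopts the protocol's callbacks as a behaviour.
defmodule Pr do
  @behaviour P

  defdelegate f(x), to: P
end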

Moreover, if you do Code.Typespec.fetch_callbacks(P), it will show you the correct callback for the P.f/1 function.

I also just verified that you can add extra callbacks to the protocol’s behaviour if you desire.


This is precisely how I have solved this problem, with the addition of callbacks inside the protocol definition.
By the way, what does Pr stand for?

Protocol? Lol. Naming things is hard.

I thought P stood for Protocol :laughing:

Sorry, my bad, Pr stands for “Protocol example”

Correct. The expectation in this particular case is that the arrays are used in algorithms whose performance is governed by the number of elements rather than by the number of arrays (i.e. the number of Arrays.empty/1 calls is much lower than the number of other operations).
If this does happen to be slow, someone could perform this wrapping manually to create ‘default options’ which are then explicitly passed, as an alternative to ETS.


@ityonemo are you sure that is okay? I always thought that a protocol did not like it if you add functions whose first parameter is not the thing implementing the protocol.

It’s fine. A defimpl is just a module; it can have whatever functions you want. It’s the defprotocol that is special. And note that Pr is not a defimpl, it’s just a module that “inherits” the P “interface” contract.

If you only want those extra functions in the “outer” module, then you can make them optional callbacks, so the impl modules won’t emit compile-time warnings.

First of all I would like to mention that defining callbacks with the @callback directive is discouraged.
While it can still be used, ExDoc will not list these callbacks in the protocol documentation.
See this issue for reference.

The approach that I ended up taking was to define a submodule called Behaviour, where you can place all your @callbacks. There is a caveat though: the macros that define functions/macros are not available inside it, because of the way defprotocol disables them in the parent module. One way to solve this is to create the module outside the protocol definition (defmodule MyProtocol.Behaviour).
But for clarity I preferred to keep it as a submodule.
You can see the implementation of this here:

Please let me know what you think about this approach


It seems to me like we are treading in uncharted waters, and I am eager to learn what people of the Elixir core team have to say about this situation.


The one conflict that I can foresee is that if somebody defined a %Behaviour{} struct, it will collide with the submodule we created, given it implements the protocol.

Wait, is the namespace not nested as normal to Buildable.Behaviour?

Yes. So if you do defimpl Buildable, for: Behaviour it will create Buildable.Behaviour.

It is very unlikely that somebody will define a struct with the name of a module that exists in Elixir core and is already deprecated.
But I am just saying it is something to keep in mind when creating submodules under the protocol namespace.
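
To illustrate where the collision comes from, here is a small sketch of how implementation modules are named. The protocol and struct names (NamingSketch.Buildable, NamingSketchWidget) are made up for this example.

```elixir
defprotocol NamingSketch.Buildable do
  def build(term)
end

# A top-level struct module, standing in for a user-defined %Behaviour{}.
defmodule NamingSketchWidget do
  defstruct []
end

defimpl NamingSketch.Buildable, for: NamingSketchWidget do
  def build(widget), do: widget
end

# The implementation module is the protocol name joined with the struct name.
IO.inspect(NamingSketch.Buildable.impl_for(struct(NamingSketchWidget)))
# => NamingSketch.Buildable.NamingSketchWidget

# So an implementation for a top-level struct named Behaviour would land on
# the same name as a Behaviour submodule of the protocol.
IO.inspect(Module.concat(NamingSketch.Buildable, Behaviour))
# => NamingSketch.Buildable.Behaviour
```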
