Help understanding protocols and behaviours

Hello everyone, can someone please explain to me how protocols and behaviours work in Elixir, like I was five years old? I read an article from Dashbit but still don’t get it. (+1 question) In OOP (my background) we have interfaces, and I can ask for an argument that implements a specific interface. Is there a way to do the same in Elixir? For example:

def hello( animal = %__BEHAVIOUR__{} ), do: sound

Or any other way?

Imagine you have a baking machine, which can bake cookies. That machine allows you to provide a custom sprinkle application system as well as a custom system for cutting out the cookies into their final shape.

A behaviour would be used to document that the module you provide to the baking machine has a function (callback) for running the sprinkle application as needed, as well as the cookie cutting as needed.

You cannot really check at the call site whether a module actually implements a behaviour. The baking machine would just try to initialize e.g. the sprinkle application and fail if it’s missing. This is often called duck typing.
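To make the baking machine concrete, here is a minimal sketch. All module and function names (BakingMachine, BakingMachine.Sprinkler, RainbowSprinkler, NotASprinkler, apply_sprinkles/1) are made up for illustration. It shows that the machine just calls the callback on whatever module it is given, and only fails at runtime if the function is missing.

```elixir
defmodule BakingMachine.Sprinkler do
  # Hypothetical behaviour: any sprinkle system must export apply_sprinkles/1.
  @callback apply_sprinkles(cookie :: String.t()) :: String.t()
end

defmodule RainbowSprinkler do
  @behaviour BakingMachine.Sprinkler

  @impl BakingMachine.Sprinkler
  def apply_sprinkles(cookie), do: cookie <> " with rainbow sprinkles"
end

defmodule NotASprinkler do
  # Neither declares nor implements the behaviour.
end

defmodule BakingMachine do
  # The machine receives the module explicitly and simply calls the callback.
  def bake(sprinkler_module) do
    sprinkler_module.apply_sprinkles("cookie")
  end
end

BakingMachine.bake(RainbowSprinkler)
# "cookie with rainbow sprinkles"

# BakingMachine.bake(NotASprinkler) would raise UndefinedFunctionError at runtime.
```

The @behaviour and @impl annotations only buy you compiler warnings when a callback is missing in a module that declares the behaviour; nothing stops a caller from passing in an unrelated module.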

Now imagine an oven. An oven can bake many things, but those things all need to be properly prepared for baking. Muffin dough needs to go in the muffin tray, cake dough in a cake ring, and the stew in a firesafe pot.

A protocol allows the baking logic to define “If something needs to get baked it needs to provide proper preparation instructions”. Then implementations for muffin, cake and stew can implement that protocol, so baking works without the baking logic needing to know how things are prepared in advance.

Protocol implementations can be checked once protocols have been consolidated, but I’d argue you usually don’t want to do that either.
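The oven analogy could be sketched roughly like this. Bakeable, Muffin, Stew and Oven are hypothetical names chosen for illustration. The last lines use impl_for/1, which every protocol gets automatically and which returns nil when no implementation exists for the given data.

```elixir
defprotocol Bakeable do
  @doc "Returns the container this item must be placed in before baking."
  def prepare(item)
end

defmodule Muffin do
  defstruct []
end

defmodule Stew do
  defstruct []
end

defimpl Bakeable, for: Muffin do
  def prepare(%Muffin{}), do: :muffin_tray
end

defimpl Bakeable, for: Stew do
  def prepare(%Stew{}), do: :firesafe_pot
end

defmodule Oven do
  # The oven never needs to know in advance which kinds of things exist.
  def bake(item), do: {:baked_in, Bakeable.prepare(item)}
end

Oven.bake(%Muffin{})
# {:baked_in, :muffin_tray}

Bakeable.impl_for(%Stew{})
# Bakeable.Stew

Bakeable.impl_for("not bakeable")
# nil
```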

Both are used for polymorphic code execution, but behaviours are about outer code receiving an implementation explicitly. Protocols are about polymorphism based on the type of data at hand.


Behaviours → interfaces in OOP, with the difference that your function won’t pattern match on a behaviour but on a module that has the callbacks defined in the behaviour.

e.g.

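defmodule A do
  # A callback needs a return type, otherwise it does not compile.
  @callback do_a() :: String.t()
end

defmodule ImplementationOne do
  @behaviour A

  @impl A
  def do_a(), do: "Implementation 1"
end

defmodule ImplementationTwo do
  @behaviour A

  @impl A
  def do_a(), do: "Implementation 2"
end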

What you would pattern match on would be ImplementationOne and ImplementationTwo instead of the behaviour.
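As a rough illustration of matching on the implementing module rather than on the behaviour, here is a small sketch. Greeter, English, Spanish and Runner are hypothetical names, not anything from a library.

```elixir
defmodule Greeter do
  @callback greet() :: String.t()
end

defmodule English do
  @behaviour Greeter

  @impl Greeter
  def greet, do: "Hello"
end

defmodule Spanish do
  @behaviour Greeter

  @impl Greeter
  def greet, do: "Hola"
end

defmodule Runner do
  # A function head can match on a specific module name (module names are atoms)...
  def run(English), do: "EN: " <> English.greet()

  # ...or accept any module and rely on it exporting the callback.
  def run(module) when is_atom(module), do: module.greet()
end

Runner.run(English)
# "EN: Hello"

Runner.run(Spanish)
# "Hola"
```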

Protocols are about different implementations of a function depending on the type of its argument.

So, if we have a protocol like:

defprotocol Size do
  def size(item)
end

When you implement the protocol for a string, the size function should return the number of characters in the string. But this implementation is not valid for a list. The implementation for the list should return the number of elements in the list. Likewise, this implementation is not valid for maps; for maps we want to return the number of keys in the map.

So, we have 1 function (size) and we have a different implementation based on the type (module).
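The three implementations described above could look something like this. It is a sketch: the Size protocol is restated so the snippet runs on its own, and BitString, List and Map are the built-in type names that defimpl accepts.

```elixir
defprotocol Size do
  @doc "Returns the number of elements in the data structure."
  def size(item)
end

defimpl Size, for: BitString do
  # Number of characters (graphemes) in the string.
  def size(string), do: String.length(string)
end

defimpl Size, for: List do
  # Number of elements in the list.
  def size(list), do: length(list)
end

defimpl Size, for: Map do
  # Number of keys in the map.
  def size(map), do: map_size(map)
end

Size.size("hello")
# 5

Size.size([1, 2, 3])
# 3

Size.size(%{a: 1, b: 2})
# 2
```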


Behaviours are about modules, Protocols are about data.

Behaviour

Is a list of functions that a module has. It’s useful when you’re doing something and need to say “now I need you to do your bit”.

An example of a behaviour is Plug. When running middleware on a webserver, it says “OK, now you do your bit to handle this request” for each plug that’s been set up.

You implement behaviours by defining the required functions in a module, and use them by passing that module name into something that expects to be able to call the functions in the behaviour.

Protocols

Is a way of doing something with a type of data. It’s useful when you have some data and need to say “I want to be able to do X with this”.

An example of a protocol is String.Chars. It says “this is the way to convert data to a string”. This is used by to_string.

You implement protocols by declaring the implementation for a specific datatype, and use them by passing data into a function that expects to be able to use the protocol for that data.
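For instance, implementing String.Chars for your own struct might look like the sketch below. Temperature is a hypothetical struct used only for illustration; String.Chars and to_string/1 are the real protocol and function from the standard library.

```elixir
defmodule Temperature do
  # Hypothetical struct, defined only for this example.
  defstruct [:degrees]
end

defimpl String.Chars, for: Temperature do
  def to_string(%Temperature{degrees: degrees}), do: "#{degrees} degrees"
end

temperature = %Temperature{degrees: 21}

to_string(temperature)
# "21 degrees"

# String interpolation goes through the same protocol.
IO.puts("It is #{temperature} outside")
# prints: It is 21 degrees outside
```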


Here’s your example implemented with a behaviour:

defmodule AnimalBehaviour do
  @callback make_sound() :: atom()

  def hello(callback_module), do: callback_module.make_sound()
end

defmodule Dog do
  @behaviour AnimalBehaviour

  @impl AnimalBehaviour
  def make_sound, do: :woof
end

And with a protocol:

defprotocol AnimalProtocol do
  def hello(data)
end

defmodule Cat do
  defstruct []

  defimpl AnimalProtocol do
    def hello(%Cat{}), do: :meow
  end
end

Usage:

iex(1)> AnimalBehaviour.hello(Dog)
:woof
iex(2)> AnimalProtocol.hello(%Cat{})
:meow

You are on the right track thinking of behaviours and protocols combined as kind of like interfaces (or perhaps abstract classes). There is a separation in FP because behaviour and data aren’t combined. Behaviours define what functions can be called if we are passed a module; protocols define what functions we can pass a struct to without caring about its type (so long as it implements the protocol).

I’m not sure if this will be more confusing, but another way to think about protocols, which you can see clearly in @adamu’s example, is that they allow you to define a function implementation in one module that will be called through another.

In practice I don’t think protocols are necessarily that useful in application code, though it very much depends on how you want to organize your code. As the guides call out, if you aren’t writing a library and aren’t concerned with extensibility, you have the option of keeping all the various implementations for different types (structs) in one module, i.e., grouping by functions instead of data. The option to use protocols is certainly there, though!

A real world example I have is that I have some structs that can have an image associated with them that I want to store on S3. I want to be able to pass these structs to a function to convert them into an S3 path. Using a protocol you could define these in the structs’ modules themselves:

defprotocol Storage do
  def to_path(struct)
end

defmodule User do
  defstruct [:id, :name, :email]

  defimpl Storage do
    def to_path(%User{} = user) do
      "users/#{user.id}.jpg"
    end
  end
end

defmodule Product do
  defstruct [:id, :slug, :title]

  defimpl Storage do
    def to_path(%Product{} = product) do
      "products/#{product.slug}.jpg"
    end
  end
end

However, since I own all this code, I just define them in one module:

defmodule User do
  defstruct [:id, :name, :email]
end

defmodule Product do
  defstruct [:id, :slug, :title]
end

defmodule Storage do
  def to_path(%User{} = user) do
    "users/#{user.id}.jpg"
  end

  def to_path(%Product{} = product) do
    "products/#{product.slug}.jpg"
  end
end

Both these implementations result in identical callsites: Storage.to_path(user). I don’t particularly think one implementation is staggeringly better than the other. I personally find the non-protocol version simpler and I have all my S3 pathing stuff in one place, but there are arguments against it too. As always, YMMV.

As for behaviours, they are incredibly useful in application logic. They can be used for implementing the strategy pattern, for example.
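A minimal sketch of the strategy pattern with a behaviour could look like this. The Shipping.* modules, Checkout and the cost formulas are invented for illustration only.

```elixir
defmodule Shipping.Strategy do
  # Hypothetical behaviour: each strategy turns a weight in grams into a cost in cents.
  @callback cost(weight_in_grams :: non_neg_integer()) :: non_neg_integer()
end

defmodule Shipping.Standard do
  @behaviour Shipping.Strategy

  @impl Shipping.Strategy
  def cost(weight_in_grams), do: 500 + div(weight_in_grams, 100) * 10
end

defmodule Shipping.Express do
  @behaviour Shipping.Strategy

  @impl Shipping.Strategy
  def cost(weight_in_grams), do: 1500 + div(weight_in_grams, 100) * 25
end

defmodule Checkout do
  # The strategy module is passed in explicitly; Checkout only relies on cost/1.
  def shipping_cost(strategy, weight_in_grams) do
    strategy.cost(weight_in_grams)
  end
end

Checkout.shipping_cost(Shipping.Standard, 1000)
# 600

Checkout.shipping_cost(Shipping.Express, 1000)
# 1750
```

Which strategy module gets passed in can come from config, a database column, or a function argument; the calling code stays the same.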


Nice, I think I’d probably do the same and just define the functions in one module.

Just one more question: do you guys usually declare these behaviours all in one folder? For example:

  • behaviours
    • context
      • module.ex

I tend to prefer

.
└── context
    ├── caller
    │   └── behaviour.ex
    └── caller.ex

if the Context.Caller.Behaviour module (or main Context.Caller module that invokes the behaviour most often) is too large to want to be inlined in the caller.ex file.

But I do co-locate them in the same file often, as well, with the behaviour at the top of the module, as you are not required to implement a single module per file in Elixir. This only becomes a problem if two co-located modules in the file need to call each other in certain ways, which is not normally an issue with behaviours, but can be with protocol impls.

As a completely shallow example, my code often looks like:

# zoo/animal.ex

defmodule Zoo.Animal.Behaviour do
  @moduledoc false
  # A callback needs a return type to compile; term() accepts anything.
  @callback make_noise() :: term()
end

defmodule Zoo.Animal do
  def listen(animal_modules) do
    for animal_module <- animal_modules do
      animal_module.make_noise()
    end
  end
end

Then anyone can define an animal module with any name, whether or not it belongs in or is currently in our zoo, implement make_noise, and pass it into Zoo.Animal.listen/1.

I usually go for *Api or *Contract suffix for behaviors. Though of course it depends, sometimes a plain unassuming name is better.

Protocols are there in Elixir mostly kind of because they have to be there IMO.

As @sodapopcan showed you, it’s much more readable to utilize pattern matching in the function that must accept several different shapes of input data.

You need protocols when the code using the protocol doesn’t/can’t know all possible implementations in advance.

Sure. I’ve very rarely seen that scenario in practice though.

Exactly, I didn’t even know about protocols in detail until lately. I never needed them and most probably will never need them for anything.


Do you encode JSON with Jason? You’re using protocols. Ever interpolated a decimal into a string? Protocols. Ecto queryables? Protocols. Phoenix params, etc…

I agree that they shouldn’t be people’s first choice most of the time, but they do play a critical role in the ecosystem.

This is true without a doubt, but I don’t think they fit naturally in the ecosystem for some reason, maybe I am missing something.


There is no need to define protocols yourself if you are working exclusively in application code (maybe if you make heavy use of microservices, but I’m not even sure it’s a particularly good idea then… I really don’t know).

They do play a hugely useful role in the ecosystem, though. Take Phoenix.Param as a small example. Protocols provide a simple way for us to change how ~p acts on our schemas.

I am late to the party, but I still remember how I struggled to grok the difference, so I’ll add my 3 cents.

Superficially, those concepts are pretty similar: both allow you to define a set of functions that can be attached to a module/type. However, the devil lies in the details.

In programming, it is often true that the more restrictive a thing is, the more powerful it becomes, because it can provide more guarantees.

Protocols are the more restrictive and therefore more powerful of the two.

The restriction is that all the functions in the protocol need to start with the same data type.
The superpower is that you can define the implementation separately from the protocol definition and the type definition.
E.g. an authentication library can define an %Account{} struct, Jason defines Jason.Encoder, and you in your application can define an implementation of Jason.Encoder for the %Account{} struct without touching the source code of either of them.
This trick is heavily used to decouple stuff.
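Here is a sketch of that decoupling. To keep it self-contained it does not use Jason: JsonLib.Encoder and AuthLib.Account are hypothetical stand-ins for "a protocol owned by one library" and "a struct owned by another", and the hand-rolled JSON string does no escaping.

```elixir
# Stand-in for a protocol owned by a JSON library (hypothetical, not Jason's real API).
defprotocol JsonLib.Encoder do
  def encode(value)
end

# Stand-in for a struct owned by an authentication library (hypothetical).
defmodule AuthLib.Account do
  defstruct [:id, :email, :password_hash]
end

# This part lives in your own application and touches neither library.
defimpl JsonLib.Encoder, for: AuthLib.Account do
  def encode(%AuthLib.Account{id: id, email: email}) do
    # Deliberately leave out password_hash.
    ~s({"id":#{id},"email":"#{email}"})
  end
end

account = %AuthLib.Account{id: 1, email: "a@example.com", password_hash: "secret"}

IO.puts(JsonLib.Encoder.encode(account))
# prints: {"id":1,"email":"a@example.com"}
```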

Phoenix does not know anything about Ecto and can be used without it.

Ecto does not know anything about Phoenix and can be used without it.

However, Phoenix defines a couple of protocols like Phoenix.HTML.FormData that are implemented for Ecto.Changeset in a totally separate repository, phoenix_ecto. This makes Phoenix and Ecto work wonderfully together without coupling them.

Behaviours are the less powerful, but more generic of the two.

They are defined on modules instead of data types. We could imagine a Jason.Encoder behaviour defining a to_json function. We could implement it for a module that defines our struct, but we could not do it externally for the %Account{} struct from the authentication library. This is the power loss in comparison to protocols.

However, we can define functions in a behaviour however we want. We are not restricted to all functions taking the same data type as their first argument. This allows us to build machinery that does some common operations and differs in details. E.g. GenServer always does the same juggling with messages and timeouts, and the actual business logic that is interesting is defined in callbacks.
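A tiny GenServer shows the idea. Counter is a hypothetical module; GenServer, init/1 and handle_call/3 are the real behaviour and callbacks. Note that the two callbacks take completely different arguments, which a protocol would not allow.

```elixir
defmodule Counter do
  # GenServer is a behaviour: it owns the process loop, we only fill in callbacks.
  use GenServer

  # Client API
  def start_link(initial), do: GenServer.start_link(__MODULE__, initial)
  def increment(pid), do: GenServer.call(pid, :increment)

  # Callbacks invoked by the generic GenServer machinery
  @impl GenServer
  def init(initial), do: {:ok, initial}

  @impl GenServer
  def handle_call(:increment, _from, count) do
    new_count = count + 1
    # Reply with the new count and keep it as the new state.
    {:reply, new_count, new_count}
  end
end

{:ok, pid} = Counter.start_link(0)

Counter.increment(pid)
# 1

Counter.increment(pid)
# 2
```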

Behaviours are often used for explicitly mocking modules:

  1. Define a behaviour
  2. Make production code implement it
  3. Make mock code also implement it
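The three steps above can be sketched as follows. Weather.Client, Weather.HttpClient, Weather.MockClient and Forecast are hypothetical names, and the production implementation is stubbed out because the real HTTP call is beside the point here.

```elixir
defmodule Weather.Client do
  # 1. Hypothetical behaviour describing what any weather client must provide.
  @callback current_temperature(city :: String.t()) :: {:ok, number()} | {:error, term()}
end

defmodule Weather.HttpClient do
  # 2. Production implementation; the real HTTP call is elided in this sketch.
  @behaviour Weather.Client

  @impl Weather.Client
  def current_temperature(_city), do: {:error, :not_implemented_in_sketch}
end

defmodule Weather.MockClient do
  # 3. Mock implementation returning a canned value for tests.
  @behaviour Weather.Client

  @impl Weather.Client
  def current_temperature(_city), do: {:ok, 21}
end

defmodule Forecast do
  # The client module is injected; it defaults to the production implementation.
  def describe(city, client \\ Weather.HttpClient) do
    case client.current_temperature(city) do
      {:ok, temp} -> "#{city}: #{temp} degrees"
      {:error, reason} -> "#{city}: unavailable (#{inspect(reason)})"
    end
  end
end

Forecast.describe("Lisbon", Weather.MockClient)
# "Lisbon: 21 degrees"
```

Passing the module as an argument is just one way to inject it; reading it from application config is another common choice.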

Another example could be a generic HTTP client, that performs calls, retries and handles failures in a common way, but you can provide different request data and parsing responses in callback modules.
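A rough sketch of such a client is below. ApiCall and GetUserCount are hypothetical names, and the transport argument is an assumed one-argument function standing in for a real HTTP library, so the example stays runnable without dependencies.

```elixir
defmodule ApiCall do
  # Each callback module supplies the request data and the response parsing.
  @callback request() :: map()
  @callback parse(body :: String.t()) :: term()

  # Generic logic shared by all calls: perform the request, retry on failure, then parse.
  # `transport` is a hypothetical stand-in for a real HTTP library call.
  def run(module, transport, retries \\ 2) do
    case transport.(module.request()) do
      {:ok, body} -> {:ok, module.parse(body)}
      {:error, _reason} when retries > 0 -> run(module, transport, retries - 1)
      {:error, reason} -> {:error, reason}
    end
  end
end

defmodule GetUserCount do
  @behaviour ApiCall

  @impl ApiCall
  def request, do: %{method: :get, path: "/users/count"}

  @impl ApiCall
  def parse(body), do: String.to_integer(body)
end

ApiCall.run(GetUserCount, fn _request -> {:ok, "42"} end)
# {:ok, 42}

ApiCall.run(GetUserCount, fn _request -> {:error, :timeout} end)
# {:error, :timeout}, after the retries are used up
```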

Summary

Use protocols when:

  • you have a set of functions working on a particular data type
  • you want to allow adding more data types

Use behaviours when:

  • you have a set of functions (this time they don’t have to have anything in common)
  • you have some generic logic that calls those functions.