A matter of style in polymorphism: what are the best practices to obtain (safe) polymorphism?

MarcoTrevisiol · November 8, 2024, 11:39pm

I would like a piece of Elixir code to do something like the following:

defmodule InfoManager do
  def initial_state(queried_info) do
    queried_info
    |> Stream.map(fn i -> {i, atom_to_module(i).initial_value()} end)
    |> Enum.into(%{})
  end

  def query(state, info) do
    info
    |> Stream.map(fn i -> {i, Map.get(state, i)} end)
    |> Enum.into(%{})
  end

  def receive(state, news) do
    state
    |> Stream.map(&update_value_in_state(&1, news))
    |> Enum.into(%{})
  end

  defp update_value_in_state({key, value}, news),
    do: {key, atom_to_module(key).updated_value(value, news)}

  defp atom_to_module(atom) do
    atom
    |> Atom.to_string()
    |> (fn s -> "Elixir.Info." <> String.capitalize(s) end).()
    |> String.to_atom()
  end
end

so that I can define modules like this

defmodule Info.Day_number do
  def initial_value, do: 0
  def updated_value(value, _news), do: value + 1
end

defmodule Info.News do
  def initial_value, do: nil
  def updated_value(_value, news), do: news
end

so that I can use InfoManager like this

state = InfoManager.initial_state([:day_number, :news])
state = InfoManager.receive(state, "irrelevant_news")
info = InfoManager.query(state, [:day_number])
assert info == %{day_number: 1}

I am not satisfied with my code right now. In particular, I am not convinced by the hack inside atom_to_module. For one, it feels like a waste of resources (think about calling InfoManager.receive/2 thousands of times), but it also does not look like an idiomatic way to pass a module to a function.

What are the best practices to obtain (safe) polymorphism in situations like this? Should I use something like behaviours? Should I keep some more state around (for example, converting all atoms to the correct modules on the first call and keeping that mapping around)?
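For reference, the behaviour option mentioned in the question could look roughly like the sketch below. It assumes a hypothetical behaviour module named Info and passes the implementing module itself instead of deriving it from an atom; none of these names are settled in the thread.

```elixir
defmodule Info do
  # Hypothetical behaviour: the contract every Info module agrees to follow.
  @callback initial_value() :: term()
  @callback updated_value(value :: term(), news :: term()) :: term()
end

defmodule Info.DayNumber do
  @behaviour Info

  # @impl lets the compiler warn if these stop matching the callbacks above.
  @impl true
  def initial_value, do: 0

  @impl true
  def updated_value(value, _news), do: value + 1
end

# Callers hand over the module directly, so no string-to-atom step is needed.
initial = Enum.map([Info.DayNumber], fn mod -> {mod, mod.initial_value()} end)
```

A behaviour only gives compile-time checks on the implementing modules; the dispatch is still an ordinary call on a module held in a variable.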

mpope · November 9, 2024, 12:11am

I’ve worked on systems with hundreds of modules that used run-time polymorphism, resolving target modules with similar string-to-atom manipulation. We eventually settled on building a persistent_term cache at startup of the form

%{{Interface, SpecificImpl} => Interface.SpecificImpl}

and abandoning strings pretty much altogether in favor of atoms and tuples.
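A minimal sketch of such a cache, assuming a hypothetical ModuleCache module and a hand-written list of implementations (neither comes from the system described here). Only :persistent_term.put/2 and :persistent_term.get/1 are real Erlang calls; the rest is illustrative.

```elixir
defmodule ModuleCache do
  # Hypothetical helper: the key under which the lookup map is stored.
  @key {__MODULE__, :impls}

  # `impls` is a list of {interface, impl_name, module} tuples.
  # Intended to be called once at startup.
  def load(impls) do
    table =
      Map.new(impls, fn {interface, name, module} ->
        {{interface, name}, module}
      end)

    :persistent_term.put(@key, table)
  end

  # Raises KeyError if the {interface, name} pair was never registered.
  def resolve(interface, name) do
    @key
    |> :persistent_term.get()
    |> Map.fetch!({interface, name})
  end
end

ModuleCache.load([
  {Info, :day_number, Info.DayNumber},
  {Info, :news, Info.News}
])

resolved = ModuleCache.resolve(Info, :day_number)
```

persistent_term is optimized for reads, while updating an existing key is expensive, so a write-once-at-startup table like this is the kind of use it suits.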

We had to resolve these modules extremely often. Using the perf support in BeamAsm, the Erlang JIT (see the erts v15.1.2 documentation), on Linux, we settled on this being both good enough and better than string-to-atom construction. I don’t remember the numbers off the top of my head, as this was a few years ago.

For your use case I would highly recommend running perf on your implementation under load to find out whether it’s a bottleneck before coming up with an alternative solution. We only realized this through measuring, and I believe we saw the problem show up in eprof/fprof as well.

al2o3cr · November 9, 2024, 7:55pm

Here’s a slightly different implementation that uses a protocol for polymorphic dispatch:

defprotocol InfoProtocol do
  def key(data)
  def initial_value(data)
  def current_value(data)
  def updated_value(data, input)
end

defmodule Info.DayNumber do
  defstruct [:value]

  defimpl InfoProtocol do
    def key(_), do: :day_number
    def initial_value(data), do: %{data | value: 0}
    def current_value(data), do: data.value
    def updated_value(data, _input), do: %{data | value: data.value + 1}
  end
end

defmodule Info.News do
  defstruct [:value]

  defimpl InfoProtocol do
    def key(_), do: :news
    def initial_value(data), do: %{data | value: nil}
    def current_value(data), do: data.value
    def updated_value(data, input), do: %{data | value: input}
  end
end

defmodule InfoManager do
  def initial_state(queried_info) do
    Map.new(
      queried_info,
      fn mod ->
        empty = struct(mod)
        {InfoProtocol.key(empty), InfoProtocol.initial_value(empty)}
      end
    )
  end

  def query(state, info) do
    state
    |> Map.take(info)
    |> Map.new(fn {k, v} ->
      {k, InfoProtocol.current_value(v)}
    end)
  end

  def receive(state, news) do
    Map.new(state, fn {k, v} ->
      {k, InfoProtocol.updated_value(v, news)}
    end)
  end
end

Usage:

state = InfoManager.initial_state([Info.DayNumber, Info.News])
state = InfoManager.receive(state, "irrelevant_news")
info = InfoManager.query(state, [:day_number])

The only API change is that InfoManager.initial_state expects a list of atoms that can be passed to struct, rather than trying to inflect the appropriate name from a lowercased/underscored atom.

I also added current_value, so InfoManager doesn’t need to know anything about the structs it manipulates besides “they implement InfoProtocol”.

kamaroly · November 10, 2024, 5:32am

@al2o3cr do you know why they resorted to custom polymorphism instead of using the built-in protocols?

kamaroly · November 10, 2024, 5:33am

Please ignore my question.

MarcoTrevisiol · November 15, 2024, 12:36pm

Thank you for each of your suggestions; I’ll keep them in mind next time I need something similar.

@al2o3cr I like your solution, but I am a little uncomfortable with the amount of boilerplate required to write an individual Info. I think I could write a macro to generate it, but I think it is too early in the project to use macros.

@mpope In the end I chose something very similar to your solution, the only difference being that I keep the cache inside the “state” variable, as I think that is the simplest approach. Right now I am not concerned with performance; I was just wondering what the idiomatic way to express this kind of computation is.
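Keeping the cache inside the state could look something like the sketch below. This is a guess at the shape rather than the code actually chosen: CachedInfoManager, receive_news and the %{modules: ..., values: ...} layout are all hypothetical names.

```elixir
defmodule Info.DayNumber do
  def initial_value, do: 0
  def updated_value(value, _news), do: value + 1
end

defmodule CachedInfoManager do
  # Assumed state shape: %{modules: %{key => module}, values: %{key => value}}.
  # The caller supplies the key-to-module pairs once, so nothing is ever
  # rebuilt from strings afterwards.
  def initial_state(modules) do
    modules = Map.new(modules)
    values = Map.new(modules, fn {key, mod} -> {key, mod.initial_value()} end)
    %{modules: modules, values: values}
  end

  def query(state, keys), do: Map.take(state.values, keys)

  def receive_news(state, news) do
    values =
      Map.new(state.values, fn {key, value} ->
        # Look the module up in the cached mapping kept in the state.
        mod = Map.fetch!(state.modules, key)
        {key, mod.updated_value(value, news)}
      end)

    %{state | values: values}
  end
end

state = CachedInfoManager.initial_state(day_number: Info.DayNumber)
state = CachedInfoManager.receive_news(state, "irrelevant_news")
info = CachedInfoManager.query(state, [:day_number])
```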

Thank you all for the precious suggestions, as my Elixir is still very basic.

mpope · November 15, 2024, 3:28pm

Nice. One thing I forgot to mention is that we auto-generated this map at system startup by converting all atoms on the system to strings and pulling out the ones that matched the Elixir.OurInterface module prefix. That could be useful for you, so that you don’t have to add each module manually.
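A rough sketch of that kind of discovery, with two assumptions that differ from the description above: it scans the modules currently loaded in the VM via :code.all_loaded/0 rather than every atom, and it uses a hypothetical InfoRegistry module with the Elixir.Info. prefix from this thread.

```elixir
defmodule Info.DayNumber do
  def initial_value, do: 0
  def updated_value(value, _news), do: value + 1
end

defmodule Info.News do
  def initial_value, do: nil
  def updated_value(_value, news), do: news
end

defmodule InfoRegistry do
  # Hypothetical helper, meant to run once at startup.
  @prefix "Elixir.Info."

  # Returns %{key => module} for every loaded module under the prefix that
  # exports initial_value/0. Modules that are not loaded yet are not seen.
  def discover do
    for {module, _loaded} <- :code.all_loaded(),
        name = Atom.to_string(module),
        String.starts_with?(name, @prefix),
        function_exported?(module, :initial_value, 0),
        into: %{} do
      {key_for(name), module}
    end
  end

  # "Elixir.Info.DayNumber" becomes :day_number
  defp key_for(name) do
    name
    |> String.replace_prefix(@prefix, "")
    |> Macro.underscore()
    |> String.to_atom()
  end
end

modules = InfoRegistry.discover()
```

Because only loaded modules are visible this way, a lazily loading environment may miss some; reading the module list of your own application is another option worth checking.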

windexoriginal · November 15, 2024, 7:54pm

There are two solutions to protocol boilerplate:

Fallback to Any (@fallback_to_any true): allows you to provide a default implementation of your protocol for any type.
Derive (@derive): like fallback to Any, but allows you to create cookie-cutter implementations based on the module implementing the protocol and the arguments provided to the derive call.
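As a sketch of the derive route applied to the protocol posted earlier in this thread: the Any implementation below is an assumption about what a sensible default might be (store the latest input, take the key from the struct name), not something anyone here specified.

```elixir
defprotocol InfoProtocol do
  def key(data)
  def initial_value(data)
  def current_value(data)
  def updated_value(data, input)
end

# Assumed defaults, shared by every struct that derives the protocol.
defimpl InfoProtocol, for: Any do
  # Info.News becomes :news, based on the last part of the struct's module name.
  def key(%module{}) do
    module
    |> Module.split()
    |> List.last()
    |> Macro.underscore()
    |> String.to_atom()
  end

  def initial_value(data), do: %{data | value: nil}
  def current_value(data), do: data.value
  def updated_value(data, input), do: %{data | value: input}
end

defmodule Info.News do
  # One line instead of a full defimpl block; must come before defstruct.
  @derive InfoProtocol
  defstruct [:value]
end

news = InfoProtocol.updated_value(%Info.News{}, "irrelevant_news")
```

Structs that need different behaviour, such as a counter, can still provide their own defimpl. Setting @fallback_to_any true in the protocol would instead apply the Any implementation to every type without an explicit opt-in.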

My advice would be to stick to these two methods because they can be understood just by reading your code and referring to the language documentation,