Zero-cost abstraction for NewTypes in Elixir

Fl4m3Ph03n1x · March 17, 2022, 9:06am

Background

Recently I have discovered this notion of “zero cost type wrappers”. Basically what this means is that you can create a new type by wrapping a primitive type (and the cost of doing this is low to non-existent). This new type would serve as an additional layer of abstraction and prevent certain categories of bugs at compile time.

For example, let’s assume we have this function (assume we have an Artist struct):

@spec new(artist :: String.t, country :: String.t, genre :: String.t) :: Artist.t
def new(n, c, g) do
  %Artist{
    name: n, country: c, genre: g
  }
end

Now obviously I added the specs here for help. But you will notice that everything is String.t. This basically means I can incorrectly invoke this function:

MyModule.new("U.S.", "Metallica", "Heavy Metal") # name and country are swapped

The compiler would not complain.

NewType abstraction

To solve this issue, some people came up with this notion of wrapping primitive types into an abstraction. If you are from Scala you may know this as “Zero-cost abstraction for NewTypes”, if you are from Rust you may know it as the NewType Pattern and so on (this is a feature present in many languages these days).

scala code

opaque type Location = String
object Location{
  def apply(value: String): Location = value
  extension(a: Location) def name: String = a
}

This would create a new type called Location that wraps the String primitive type.

In Elixir, our function’s signature would now be:

@spec new(artist :: String.t, country :: Location.t, genre :: String.t) :: Artist.t

(you can also do the same for genre)

Elixir NewType wrappers?

Now, using the power of typespecs I could do something like:

@type location :: String.t()

And use it in my specs. But this would serve merely as documentation and would prevent no types of errors whatsoever.

The closest thing that comes to my mind, would be to define a struct:

defmodule Location do
  defstruct [:name]

  @type t :: %__MODULE__{name: String.t()}

  @spec new(name :: String.t()) :: __MODULE__.t()
  def new(name), do: %__MODULE__{name: name}
end

Ignoring the boilerplate code (we can just create a macro for that!) I think this is the closest I can get to having something like the NewType abstraction.

This would allow us to invoke the function like this:

MyModule.new("Metallica", Location.new("U.S."), Genre.new("Heavy Metal"))

We can’t swap parameters and have thus eliminated a category of errors. Furthermore, we did this at compile time (assuming Dialyzer or a similar tool checks the specs).
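To make the "just create a macro" remark concrete, here is a minimal sketch of what such a macro could look like. The `NewType` module, the `:wraps` option and the `:value` field are names I made up for illustration; they do not come from any library. The sketch only removes the boilerplate, it does not make the wrapper any cheaper at runtime.

```elixir
defmodule NewType do
  # Hypothetical helper: `use NewType, wraps: some_type` turns the calling
  # module into a single-field struct wrapper with new/1 and get/1.
  defmacro __using__(opts) do
    inner = Keyword.fetch!(opts, :wraps)

    quote do
      @enforce_keys [:value]
      defstruct [:value]

      @type t :: %__MODULE__{value: unquote(inner)}

      @spec new(unquote(inner)) :: t()
      def new(value), do: %__MODULE__{value: value}

      @spec get(t()) :: unquote(inner)
      def get(%__MODULE__{value: value}), do: value
    end
  end
end

defmodule Location do
  use NewType, wraps: String.t()
end

defmodule Genre do
  use NewType, wraps: String.t()
end
```

With this in place, `Location.new("U.S.")` and `Genre.new("U.S.")` are different structs, so a spec that asks for `Location.t` gives Dialyzer something to distinguish them by.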

Would it be zero-cost? I don’t think so, since I am replacing a String.t with a map that has 1 key. The overhead would probably be minimal, but I don’t think I could call it zero cost.

Questions

How would you implement this abstraction in Elixir?
Are there any optimizations one could do here?
Is it possible to have a compile time check that prevents this category of errors using typespecs only? (I don’t think so, but please feel free to prove me wrong)

csadewa · March 17, 2022, 9:17am

Hmmm, zero runtime cost means the work is somehow done at compile time? I wonder if it’s possible to do this via a macro, which runs at compile time. But I think in that case the obvious limitation would be that the check only works when the value is known at compile time (static).

LostKobrakai · March 17, 2022, 9:27am

While I can see the usefulness of the general pattern one has to acknowledge that elixir is not a statically typed language. The compiler only has limited knowledge around the types of data at compile time (especially around everything message passing). So the remaining options are runtime checks.

As you noted things won’t be “zero cost” at runtime. The smallest way to add information to a piece of data would be a number per type or, on the BEAM, an atom, which at runtime is also basically a number. Erlang usually uses records for that: {:user, "someone"}. Tuples afaik come with very little overhead in actual memory over just the two values themselves [1]. In Elixir we usually don’t use tuples/records that much, we use maps/structs. Luckily small maps (up to 32 keys) are stored in (again afaik) a similar memory layout to tuples. Still a bit more overhead, but less than for large maps.

So those are the options to look at imo for deciding if the runtime hit is worthwhile. Generally I feel like structs are a good way to “type” data, but I wouldn’t do it for every scalar floating around in your system, but rather things which are reasonable entities or values in your system. E.g. Location in an event booking system makes sense. Wrapping every city string probably not.
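If you want to check the overhead on your own system rather than take my word for it, a rough sketch is to compare term sizes with `:erts_debug.flat_size/1`, which reports the size of a term in machine words. `LocationStruct` is a made-up module for this comparison, and the exact numbers depend on the OTP version and word size, so I am not quoting any.

```elixir
defmodule LocationStruct do
  # Hypothetical single-field struct, mirroring the Location example.
  defstruct [:name]
end

name = "U.S."
tagged = {:location, name}
wrapped = struct(LocationStruct, name: name)

# Sizes are in machine words and include the wrapped binary itself.
IO.inspect(:erts_debug.flat_size(name), label: "bare string")
IO.inspect(:erts_debug.flat_size(tagged), label: "tagged tuple")
IO.inspect(:erts_debug.flat_size(wrapped), label: "struct")
```

I would expect the tagged tuple to come out smaller than the struct, since the struct also carries its keys and the `__struct__` entry.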

[1] Erlang -- Advanced

Fl4m3Ph03n1x · March 17, 2022, 9:57am

This is very interesting. But if I were to use a tuple, my signature would have to be:

@spec new(artist :: String.t, country :: {:name, String.t}, ... ) :: Artist.t

Instead of:

@spec new(artist :: String.t, country :: Location.t, ...) :: Artist.t

Right?

I would also need to extract the value via elem/2.

An idea worth exploring though, thanks !

LostKobrakai · March 17, 2022, 10:06am

defmodule Location do
  @type t :: {:location, String.t}
end

But yes you’d have to unwrap the value - just like with a struct. This is runtime data we’re dealing with. You cannot implicitly wrap a string to be tagged “a location” and the runtime would infer that tag from the plain string received. There’s things you can do in a statically typed language you simply cannot do if you don’t have a statically typed language.

Exadra37 · March 17, 2022, 12:09pm

I will try to help, but after so many months out of Elixir my understanding may be cloudy…

I kind of tried to achieve this on my own in the past, but then resorted to using the Domo library.

I also know of the Typed Struct library that I have not tried yet.

Can any of these libs help you achieve what you want?

Fl4m3Ph03n1x · March 17, 2022, 4:15pm

Thank you for trying!

When I want structs about something, I usually use TypedStruct. Some people I know use Embedded Ecto schemas.

However, here the purpose is different. Even though I am using a struct, my objective is not to make “using structs easier” (like it is with TypedStruct). My objective here is to simply wrap a primitive value into an abstraction that allows dialyzer (or gradient) to complain.

If anything, structs are an implementation detail that I would hide under the hood of a macro.

mat-hek · March 17, 2022, 4:31pm

I usually use a single argument that’s a keyword list, map or struct in such cases, like

@spec new(artist: String.t, country: String.t, genre: String.t) :: Artist.t

or

@spec new(%{artist: String.t, country: String.t, genre: String.t}) :: Artist.t

or

@spec new(%ArtistConfig{artist: String.t, country: String.t, genre: String.t}) :: Artist.t

Though it’s more verbose, I find it more readable, and Dialyxir is theoretically able to find bugs there.
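A minimal sketch of the keyword-list variant, assuming an `Artist` struct with `:name`, `:country` and `:genre` fields as in the original post. Using `Keyword.fetch!/2` is my own choice here so that a missing key fails loudly at runtime; the keyword spec alone does not guarantee that every key is present.

```elixir
defmodule Artist do
  @enforce_keys [:name, :country, :genre]
  defstruct [:name, :country, :genre]

  @type t :: %__MODULE__{name: String.t(), country: String.t(), genre: String.t()}

  @spec new(artist: String.t(), country: String.t(), genre: String.t()) :: t()
  def new(opts) do
    %__MODULE__{
      # Keyword.fetch!/2 raises KeyError when a key is missing.
      name: Keyword.fetch!(opts, :artist),
      country: Keyword.fetch!(opts, :country),
      genre: Keyword.fetch!(opts, :genre)
    }
  end
end

# Argument order no longer matters because every value is labelled.
artist = Artist.new(country: "U.S.", artist: "Metallica", genre: "Heavy Metal")
IO.inspect(artist)
```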

Exadra37 · March 17, 2022, 4:46pm

So, what I use currently to avoid bad data popping in at runtime is this approach:

defmodule TypeIt.Progress do

  use Domo

  @all_states %{
    backlog: "Backlog",
    todo: "Todo",
    doing: "Doing",
    pending: "Pending",
    done: "Done",
    archived: "Archived",
  }

  @states Map.keys(@all_states)

  typedstruct do
    field :state, :backlog | :todo | :doing | :pending | :done | :archived
    field :title, String.t()
    field :since, NaiveDateTime.t()
  end

  def default(), do: new_for!(:todo)

  def next_state(:backlog), do: :todo
  def next_state(:todo), do: :done
  def next_state(:done), do: :todo

  def new_for!(state), do: new!(state: state, title: @all_states[state], since: NaiveDateTime.utc_now())
  def new_for!(state, since: since), do: new!(state: state, title: @all_states[state], since: since)
  def new_for!(state, title: title), do: new!(state: state, title: title, since: NaiveDateTime.utc_now())

  def states() do
    @states
  end

  def all() do
    @all_states
  end
end

From what I understand, you want to make this possible with compile-time checks only. As far as I know that’s not possible on the BEAM, but I really hope you find a solution that works purely at compile time. Let me know when you find it so that I can help you test it.

The Domo library I use here adds type specs for me when the code is compiled, to help Dialyzer catch as much as possible. The rest I have to code myself so that it is checked at runtime, with the caveat that anyone using the struct can always bypass what I implemented by not calling the provided functions to create and manipulate the struct.

al2o3cr · March 17, 2022, 5:40pm

Another approach would be to use a 2-tuple with the first element denoting the “newtype”:

defmodule Location do
  def new(name), do: {:location, name}
end

(resemblance to Erlang records not entirely accidental)

Then the callsite looks the same as in the struct case:

MyModule.new("Metallica", Location.new("U.S."), Genre.new("Heavy Metal"))

But the implementation is a little different:

@spec new(artist :: String.t, country :: Location.t, genre :: Genre.t) :: Artist.t
def new(n, {:location, c}, {:genre, g}) do
  %Artist{
    name: n, country: c, genre: g
  }
end

An additional thought: that signature for new/3 looks a lot like a keyword list without the list-ness.

Named arguments wouldn’t prevent mis-configuration quite as well as types, but would produce a moderately strong error signal, since writing artist: params[:country] looks weird.

Fl4m3Ph03n1x · March 18, 2022, 9:00am

Mixing your suggestion with @LostKobrakai’s suggestion, a possible implementation would be:

defmodule Location do
  @type t :: {:location, String.t}

  def new(name), do: {:location, name}

  def get({:location, name}), do: name
end

User code:

loc = Location.new("U.S.")

# Instead of Scala's `loc.name` we would do `Location.get(loc)`
location_name = Location.get(loc)

In MyModule:

@spec new(artist_name :: String.t, country :: Location.t, genre :: Genre.t) :: Artist.t
def new(artist_name, country, genre) do
  %Artist{
    name: artist_name, country: Location.get(country), genre: Genre.get(genre)
  }
end

MyModule.new("Metallica", Location.new("U.S."), Genre.new("Heavy Metal"))

I can honestly see both options working.
In both cases, we would get a compiler warning (via Dialyzer or Gradient) for calling the function with parameters swapped.

LostKobrakai · March 18, 2022, 9:19am

You can look at Record, which would remove a bunch of the boilerplate and give you a common API.

Fl4m3Ph03n1x · March 18, 2022, 10:35am

I checked out Record and tried to use it, but unfortunately it has one fundamental flaw for this specific use case:

It requires every field to have a default value.

So for example:

defmodule Genre do
  require Record

  Record.defrecord(:genre, :name)

  @type t :: record(name: :hard_rock | :heavy_metal | :pop)
end

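Won’t compile as written: the fields have to be passed as a list, and once they are (for example Record.defrecord(:genre, [:name])), :name silently gets nil as its default.
You could argue “Just use nil as a default”, but I really don’t want that to be possible. In this case, for example, a genre can be one of three things, and nil is not one of them.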

However because something like this is possible:

import Genre

# To create records
record = genre()                 # this should not be possible
record = genre(name: :hard_rock) #=> {:genre, :hard_rock}

The idea falls apart. In contrast, with the previous approaches, dialyzer would pick up such cases and report them as incorrect.

This is unfortunate, as this is almost the perfect solution for the NewType abstraction I am looking for.
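One possible way around this, as a sketch only: keep the record macros private with `Record.defrecordp/2` and expose a constructor that refuses anything outside the allowed set. The `new/1` and `name/1` functions and the `name` type are my own additions for illustration; this moves the check for a missing name to runtime and does not by itself give Dialyzer anything extra to work with.

```elixir
defmodule Genre do
  require Record

  # defrecordp keeps the genre/0, genre/1 and genre/2 macros private,
  # so code outside this module cannot build a record with a nil name.
  Record.defrecordp(:genre, name: nil)

  @type name :: :hard_rock | :heavy_metal | :pop
  @type t :: record(:genre, name: name())

  @spec new(name()) :: t()
  def new(name) when name in [:hard_rock, :heavy_metal, :pop] do
    genre(name: name)
  end

  @spec name(t()) :: name()
  def name(genre(name: name)), do: name
end
```

Callers would then write `Genre.new(:heavy_metal)` and read the value back with `Genre.name/1`; `Genre.new(nil)` fails with a FunctionClauseError instead of quietly producing a record.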

garrettmichaelgeorge · March 18, 2022, 4:23pm

This is very interesting. Is a zero-cost NewType abstraction similar to a value object in OOP?

ValueObject (Martin Fowler)
Value object - Wikipedia

aziz · March 20, 2022, 6:59pm

I thought @opaque types were intended for this purpose. So I tried the following example, but unfortunately it doesn’t seem to work: neither Dialyzer nor ElixirLS reports a warning:

  @opaque artist :: binary
  @opaque title :: binary

  @spec artist!(binary) :: artist
  def artist!(a), do: a
  @spec title!(binary) :: title
  def title!(t), do: t

  @spec create(artist, title) :: %{name: artist, title: title}
  def create(name, title), do: %{name: name, title: title}

  def test() do
    # Wrong order of arguments, but no warning/error. :(
    create(title!("Title"), artist!("Artist"))
  end

03juan · March 20, 2022, 8:33pm

There was a discussion recently about changing the way Dialyzer matches on specs. Perhaps it might help you here?

NobbZ · March 21, 2022, 8:39am

Of course not. Everything happens in the same module, and the module that defines an @opaque type has access to its internals, and is therefore allowed to use any binary without explicitly “converting” it.

Fl4m3Ph03n1x · March 21, 2022, 9:20am

No, NewType is more like a new primitive type, like Strings or Integers. Value objects are a different concept not related to algebraic data types.

Interesting read though!

Fl4m3Ph03n1x · March 21, 2022, 9:23am

That discussion is not about “changing dialyzer”, but more about finding flags to use in order to detect specific error cases.

This discussion is geared more towards finding a cheap way to define a new type in Elixir. Since it is a new type, Dialyzer’s default algorithm would always be able to find incorrect invocations, without the need for the underspecs or overspecs flags.

tomekowal · April 3, 2022, 7:42am

I’d go with @aziz’s solution. I usually define the API for other modules to consume, so it doesn’t hurt that much that the solution does not work in the same module. It works in other modules:

defmodule Title do
  @opaque t :: binary
  @spec new(binary) :: t()
  def new(t), do: t
end

defmodule Artist do
  @opaque t :: binary
  @spec new(binary) :: t()
  def new(a), do: a
end

defmodule Song do
  @opaque t :: %{artist: Artist.t(), title: Title.t()}
  @spec create(Artist.t, Title.t) :: t()
  def create(artist, title), do: %{artist: artist, title: title}
end

defmodule Test do
  alias Title
  alias Artist
  alias Song

  def test() do
    title = Title.new("Title")
    artist = Artist.new("Artist")

    # Wrong order of arguments causes a Dialyzer error
    Song.create(title, artist)
  end
end

mix dialyzer produces

lib/zero_cost.ex:24:no_return
Function test/0 has no local return.
________________________________________________________________________________
lib/zero_cost.ex:29:call_without_opaque
Function call without opaqueness type mismatch.

Call does not have expected opaque terms in the 1st and 2nd position.

Song.create(_title :: Title.t(), _artist :: Artist.t())

________________________________________________________________________________
done (warnings were emitted)
Halting VM with exit status 2

Rafał Studnicki used this idea in one of his projects: Rafal Studnicki - The Alchemist's Code: Bringing More Value with Less Magic | Code Elixir LDN 19 - YouTube He goes even further and with those types

However, there is some boilerplate involved, and with Dialyzer’s cryptic errors I don’t see this solution getting too much traction.