Will the new type system allow defining Sum types?

matthias_toepp · March 1, 2024, 9:11pm

I’m new to Elixir…having looked into languages like Haskell, Elm, and Gleam, I’m wondering if Elixir’s new type system will permit sum types to be declared and used with exhaustiveness checking? (Presumably, new syntax would be required for this.)

dimitarvp · March 1, 2024, 9:32pm

Oh, I would very much like to know as well.

al2o3cr · March 3, 2024, 12:35am

Why? I’d expect they’d be written in the BEAM “tuple with a leading atom” idiom. A complex example, from gen_statem:

action() =
    postpone |
    {postpone, Postpone :: postpone()} |
    {next_event,
     EventType :: event_type(),
     EventContent :: event_content()} |
    {change_callback_module, NewModule :: module()} |
    {push_callback_module, NewModule :: module()} |
    pop_callback_module |
    enter_action()

Exhaustiveness checking is mostly a difference of interpretation; if Dialyzer sees code like this:

case some_function_that_returns_action() do
  {:postpone, p_value} -> ...
  :pop_callback_module -> ...
end

it will assume that some_function_that_returns_action can’t return the other variants of action and try to derive a contradiction. It will also complain about dead code if subsequent analysis proves some_function_that_returns_action never returns :pop_callback_module.

billylanchantin · March 3, 2024, 2:34am

I’m wondering if Elixir’s new type system will permit sum types to be declared…

I don’t believe so. A sum type is a concept from algebraic type systems. Elixir’s type system will use set theoretic types. Briefly,

Algebraic types have:

Sum types
Product types

Set theoretic types have:

Intersection types
Union types
Negation types

In both systems, you build more complicated types from the respective primitives. But although there are similarities, the way you compose the types are different.

However,

… and used with exhaustiveness checking?

From the technical paper:

https://arxiv.org/pdf/2306.06391.pdf (p. 11)

Exhaustivity Checking Type analysis makes it possible to check whether clauses of
a function definition, or patterns in a case expression, are exhaustive, that is, if they
match every possible input value. For instance, consider the following code:
$ type result() =
%{output: :ok, socket: socket()} or
%{output: :error, message: :timeout or {:delay, integer()}}

$ result() -> string()
def handle(r) when r.output == :ok, do: "Msg received"
def handle(r) when r.message == :timeout, do: "Timeout"
We define the type result() as the union of two record types: the first maps the atom :output to the (atom) singleton type :ok and the atom :socket to the type socket(); the second maps :output to :error and maps :message to a union type formed by an atom and a tuple. Next consider the definition of handle: values of type %{output: error, message: {:delay, integer()}} are going to escape every pattern used by handle, triggering a type warning:
Type warning:
| def handle(r) do
      ^^^^^^^^^
    this function definition is not exhaustive.
    there is no implementation for values of type:
      %{output: :error, message: {:delay, integer()}}
Note that the type checker is able to compute the exact type whose implementation is missing, which enables fast refactoring since, as the type of result() or the implementation of handle are modified, the type checker will issue precise new warnings to point out the places where code changes are required.

So although there are no sum types, I believe there will be exhaustiveness checking.

Disclaimer: I have no inside knowledge! I’m just following along as best I can.

matthias_toepp · March 3, 2024, 1:14pm

@dimitarvp @billylanchantin
Thanks for helping me to get a better understanding of this.

stevensonmt · March 3, 2024, 3:35pm

Theoretical underpinnings aside, is there any practical difference between an algebraic sum type and a set theoretic union type? Taking the example

$ type result() =
%{output: :ok, socket: socket()} or
%{output: :error, message: :timeout or {:delay, integer()}}

and translating to something using ADTs like Rust (please excuse likely errors in this example as I haven’t written Rust in a long time and I’m a bit, errr, rusty):

struct Socket {
    socket: String
}

enum Message {
    Timeout,
    Delay(i32)
}

enum MyResult {
    Ok(Socket),
    Err(Message)
}

fn handle(result: MyResult) -> String {
    match result {
        Ok(socket) => socket.socket,
        Err(msg) => helper_fn_to_make_msg_a_string(msg)
    }
}

billylanchantin · March 3, 2024, 10:51pm

Theoretical underpinnings aside, is there any practical difference between an algebraic sum type and a set theoretic union type?

This is a great question and I’m not qualified to answer it fully. I think once us Elixir developers get their hands on the upcoming typing tools, the community will be clamoring articles with titles like these:

Teach me Set theoretic types like I know the basics of Algebraic types
Why did Elixir get Set theoretic types instead of Algebraic types?

But I’ll take a stab. Here is one example of how sum types differ from union types: as I understand it, sum types are “tagged” while unions are “untagged”. Take this example:

$type matrix_pills() = :blue or :red
$type light_sabers() = :blue or :green
$type scifi_colors() = matrix_pills() or light_sabers()

Here, :blue is automatically a member of scifi_colors(). There’s no notion of “which” :blue you mean since the union is untagged (it doesn’t matter where it came from).

Contrast that with enum from Rust. If you start with two enums:

enum MatrixPills { Blue, Red }
enum LightSabers { Blue, Green }

There’s no way to combine them such that you get a flat collection of the inner types. You’re stuck with doing things like:

enum ScifiColors1 { MatrixPills, LightSabers }
enum ScifiColors2 { Blue, Red, Green }

With ScifiColors1, Blue is not a member since you need to specify MatrixPills(Blue) or LightSabers(Blue) (i.e. you have to “tag” Blue). With ScifiColors2, Blue is a member but the definition makes no reference to the original enums. So you didn’t really combine them like we were able to with union types.

(This doesn’t mean tagged unions are bad! They’re just different.)

I highly recommend José’s talk from ElixirConf if you want to learn more:

ElixirConf 2023 - José Valim - The foundations of the Elixir type system

antoine-duchenet · March 3, 2024, 10:54pm

From my point of view, since the 3 members of your union are disjoint / clearly discriminated, I don’t see any practical difference with a sum type (in fact they look to me like tagged union members).

It would however be trickier with shape-compatible structs, should the struct name behave as a discriminant if the inner data has the same shape ?

stevensonmt · March 4, 2024, 1:41am

What a great explanation of the differences. Thanks for taking the time!

@antoine-duchenet – since Elixir structs are maps with a private __struct__ field I would assume the struct name could be matched on or not just as any other field.

$ type result() =
%{output: :ok, socket: socket()} or
%{output: :error, message: :timeout or {:delay, integer()}}

could be satisfied by any struct that matches on the specified fields.

iex(1)> defmodule Result do 
    defstruct output: :ok, socket: %{}
end

(2)> s = %Result{}
%Result{output: :ok, socket: %{}}

(3) m = %{output: :ok, socket: %{just_a: :map}}
%{output: :ok, socket: %{just_a: :map}}

(4) fun = fn %{output: :ok} = a -> a.socket
    a -> a
end

(5) fun.(s)
%{}

(6) fun.(m)
%{just_a: :map}

(7) fun.(%{output: :error, message: :timeout})
%{output: :error, message: :timeout}

So I would expect that the __struct__ field would get ignored unless the definition of the type specifically matches on that field.

antoine-duchenet · March 4, 2024, 9:11am

Yes indeed !

My question was more about the behavior desired by the community than the expected behavior with the “current” implementation, expecially concerning exhaustiveness checking.

With something like that :

$ type result() = %Dog{name: string()} or %Cat{name: string()}

$ result() -> string()
def handle(%{name: name}), do: "Hello  #{name}"

Which could be considered as syntaxic sugar to write this :

$ type result() = %{__struct__: :Dog, name: string()} or %{__struct__: :Cat, name: string()}

$ result() -> string()
def handle(%{name: name}), do: "Hello  #{name}"

If the output type behaves like an union type (the way you assume it would), simplified to something like type result() = %{__struct__: :Dog or :Cat, name: string()} , I’d expect the handle/1 function not to raise a “not exhautive” warning because its matching (seen as a set) “contains” every possible value of the result type.
To raise a warning, we would have to define def handle(%Dog{name: name}), do: "Hello #{name}" because this definition does not handle the %Cat{} case.
In an hypotetical world where the __struct__ field receives some special treatment to behave like the discriminant of the sum (a.k.a. the tag of the tagged union), we could imagine the handle/1 function to raise a “not exhautive” warning because its matching (seen as a set) is fully disjoint with the result type (it could be seen as if it meant def handle(%{__struct__: nil, name: name})). This behavior would be closer to the enum wrappers often used in Rust like Ok/Err.
In such a world, we would have to define def handle(%Dog{name: name}) and def handle(%Cat{name: name}).

I’m personally fine with 1. since it looks more Elixirish and flexible to me but 2. may have some benefits.

stevensonmt · March 4, 2024, 2:14pm

The way you have defined the result type does match on __struct__ though, so I would expect in that situation for option #2 to be enforced because while all Cats and all Dogs contain names not every map that contains a name will be a Dog or a Cat and therefore not sufficient to satisfy the type result.

billylanchantin · March 4, 2024, 3:19pm

I believe handle is exhaustive in that case. Both Cat and Dog have the :name key, so there’s no case which isn’t covered.

However, I think this is an example of weak arrows vs. strong arrows. This implementation of handle is weakly arrowed because accepts arguments from the negation of its domain – i.e. it accepts something that isn’t of type result. To make it strongly arrowed, you’d need to restrict the input.

You can read more about strong arrows here:

Strong arrows: a new approach to gradual typing - The Elixir programming language

P.S. Does anyone have that link to the online playground where you can experiment with the new type syntax? I was gonna try the example there but I couldn’t find the link.

antoine-duchenet · March 6, 2024, 11:15pm

IMHO what you describe is only one of the ways to think about it, which indeed seems to be closer to the #2 : considering __struct__ as the tag and that a Map does not match any tag but only the “empty” tag (which makes the %{...} matching disjoint/incompatible with the %Dog{...} type, regardless of the inner matched fields).

Instead, like @billylanchantin said, we can ask ourselves if every possible combination of the result type will match any of the handle/1 clauses (which seems to me like a pretty intuitive meaning of “exhaustive”) based only on the matched fields such as name, regardless of the __struct__ since we are reasoning in an “untagged” way !

The point of my message was precisely to highlight the fact that there is multiple ways (not limited to those 2) to think about it, nothing more !

PS: the tagged (sum) vs. untagged (union) subject is not so far from the “nominative type system” vs. “structural type system” one in some ways.

stevensonmt · March 7, 2024, 2:00pm

My assumption is that the type signature is proscriptive and must be satisfied at compile time. If the pattern in the function head would allow anything other than the specified type in the type signature then the compiler should raise a warning. What you’re suggesting would allow for runtime violations of the type signature because any map with a :name field would be valid at runtime. Maybe this is the difference between a “static” and “gradual” type system. I’m certainly not well-versed in the theoretical underpinnings to understand the nuances there.

LostKobrakai · March 7, 2024, 2:06pm

A gradual typesystem cannot prevent runtime errors. It can only make them less likely by checking the inputs of known shape against the expected type, while leaving the unknown shapes unchecked.

stevensonmt · March 7, 2024, 2:11pm

Thanks for clarifying my misunderstanding. In that case I see the point @antoine-duchenet was making more clearly. That does make me question the value of the type system in the long run, though.

LostKobrakai · March 7, 2024, 2:19pm

It’s gradual. The more information you provide to the typesystem the better it is at doing its job. There are certainly places where the typesystem can infer if a plain map has a key of :name statically – e.g. you might have function(%{name: "abc"}) as literal code in the codebase. But it certainly cannot do so for every case. For example user input will come into a system in some place and that’s by definition not statically validatable, there’s also thinks like dynamic dispatch, ….

dimitarvp · March 7, 2024, 2:57pm

Sure but (a) that’s why we have Ecto.Changeset and various other validation facilities that do runtime enforcement and (b) nobody expects dynamic dispatch to be statically checked / enforced.

The value I would derive from a gradual type system would be compiler warnings / errors like these: “The function is spec’d to return {:ok, binary()} | {:error, term()} but there is a clause inside that also returns :error”.

That would be quite enough for many people, myself included – i.e. pattern-matching exhaustiveness checks where we need them (f.ex. when parsing external data).