I’m new to Elixir…having looked into languages like Haskell, Elm, and Gleam, I’m wondering if Elixir’s new type system will permit sum types to be declared and used with exhaustiveness checking? (Presumably, new syntax would be required for this.)
Oh, I would very much like to know as well.
Why? I’d expect they’d be written in the BEAM “tuple with a leading atom” idiom. A complex example, from `gen_statem`:

```erlang
action() =
    postpone |
    {postpone, Postpone :: postpone()} |
    {next_event,
     EventType :: event_type(),
     EventContent :: event_content()} |
    {change_callback_module, NewModule :: module()} |
    {push_callback_module, NewModule :: module()} |
    pop_callback_module |
    enter_action()
```
Exhaustiveness checking is mostly a difference of interpretation; if Dialyzer sees code like this:
```elixir
case some_function_that_returns_action() do
  {:postpone, p_value} -> ...
  :pop_callback_module -> ...
end
```
it will assume that `some_function_that_returns_action` can’t return the other variants of `action` and try to derive a contradiction. It will also complain about dead code if subsequent analysis proves `some_function_that_returns_action` never returns `:pop_callback_module`.
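In today’s Elixir you can see the flip side of this at runtime: a `case` that covers only some variants raises when another variant arrives. A minimal sketch (the action values are stand-ins, not wired to a real `gen_statem`):

```elixir
# A case that handles only two of the action() variants.
handle = fn action ->
  case action do
    {:postpone, p_value} -> {:postponed, p_value}
    :pop_callback_module -> :popped
  end
end

handle.({:postpone, :event})
#=> {:postponed, :event}

# Any other variant escapes both clauses and raises at runtime:
# handle.(:postpone)  #=> ** (CaseClauseError) no case clause matching: :postpone
```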
I’m wondering if Elixir’s new type system will permit sum types to be declared…
I don’t believe so. A sum type is a concept from algebraic type systems. Elixir’s type system will use set theoretic types. Briefly,
Algebraic types have:
- Sum types
- Product types
Set theoretic types have:
- Intersection types
- Union types
- Negation types
In both systems, you build more complicated types from the respective primitives. But although there are similarities, the way you compose the types is different.
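As a concrete point of reference, today’s typespecs already compose unions with `|`, while the “product” half of the algebraic picture is played by tuples and maps. A minimal sketch (the `shape` type is my own toy example, not from the new type system):

```elixir
defmodule Shapes do
  # Union of variants via `|`; each variant is a tuple, i.e. a "product"
  # in the BEAM's "tuple with a leading atom" idiom.
  @type shape ::
          {:circle, number()}
          | {:rect, number(), number()}

  @spec area(shape()) :: number()
  def area({:circle, r}), do: 3.14159 * r * r
  def area({:rect, w, h}), do: w * h
end

Shapes.area({:rect, 2, 3})
#=> 6
```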
However,
… and used with exhaustiveness checking?
From the technical paper:
Exhaustivity Checking. Type analysis makes it possible to check whether clauses of a function definition, or patterns in a case expression, are exhaustive, that is, if they match every possible input value. For instance, consider the following code:

```
$ type result() = %{output: :ok, socket: socket()} or
                  %{output: :error, message: :timeout or {:delay, integer()}}

$ result() -> string()
def handle(r) when r.output == :ok, do: "Msg received"
def handle(r) when r.message == :timeout, do: "Timeout"
```
We define the type `result()` as the union of two record types: the first maps the atom `:output` to the (atom) singleton type `:ok` and the atom `:socket` to the type `socket()`; the second maps `:output` to `:error` and maps `:message` to a union type formed by an atom and a tuple. Next consider the definition of `handle`: values of type `%{output: :error, message: {:delay, integer()}}` are going to escape every pattern used by `handle`, triggering a type warning:

```
Type warning:
    | def handle(r) do
      ^^^^^^^^^
    this function definition is not exhaustive.
    there is no implementation for values of type:
        %{output: :error, message: {:delay, integer()}}
```
Note that the type checker is able to compute the exact type whose implementation is missing, which enables fast refactoring: as the type of `result()` or the implementation of `handle` are modified, the type checker will issue precise new warnings to point out the places where code changes are required.
So although there are no sum types, I believe there will be exhaustiveness checking.
Disclaimer: I have no inside knowledge! I’m just following along as best I can.
@dimitarvp @billylanchantin
Thanks for helping me to get a better understanding of this.
Theoretical underpinnings aside, is there any practical difference between an algebraic sum type and a set theoretic union type? Taking the example
```
$ type result() =
    %{output: :ok, socket: socket()} or
    %{output: :error, message: :timeout or {:delay, integer()}}
```
and translating to something using ADTs like Rust (please excuse likely errors in this example as I haven’t written Rust in a long time and I’m a bit, errr, rusty):
```rust
struct Socket {
    socket: String,
}

enum Message {
    Timeout,
    Delay(i32),
}

enum MyResult {
    Ok(Socket),
    Err(Message),
}

// One possible body for the helper referenced below.
fn helper_fn_to_make_msg_a_string(msg: Message) -> String {
    match msg {
        Message::Timeout => "timeout".to_string(),
        Message::Delay(n) => format!("delay of {n}ms"),
    }
}

fn handle(result: MyResult) -> String {
    match result {
        // The variants need qualifying so they aren't confused with
        // the prelude's Result::Ok/Err.
        MyResult::Ok(socket) => socket.socket,
        MyResult::Err(msg) => helper_fn_to_make_msg_a_string(msg),
    }
}
```
Theoretical underpinnings aside, is there any practical difference between an algebraic sum type and a set theoretic union type?
This is a great question and I’m not qualified to answer it fully. I think once we Elixir developers get our hands on the upcoming typing tools, the community will be clamoring for articles with titles like these:
- Teach me Set theoretic types like I know the basics of Algebraic types
- Why did Elixir get Set theoretic types instead of Algebraic types?
But I’ll take a stab. Here is one example of how sum types differ from union types: as I understand it, sum types are “tagged” while unions are “untagged”. Take this example:
```
$ type matrix_pills() = :blue or :red
$ type light_sabers() = :blue or :green
$ type scifi_colors() = matrix_pills() or light_sabers()
```
Here, `:blue` is automatically a member of `scifi_colors()`. There’s no notion of “which” `:blue` you mean since the union is untagged (it doesn’t matter where it came from).
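You can mimic that untagged behavior with ordinary runtime sets, which is roughly the mental model behind set-theoretic unions (an analogy only, not the actual type checker):

```elixir
matrix_pills = MapSet.new([:blue, :red])
light_sabers = MapSet.new([:blue, :green])
scifi_colors = MapSet.union(matrix_pills, light_sabers)

MapSet.member?(scifi_colors, :blue)
#=> true
MapSet.size(scifi_colors)
#=> 3  (the two :blue members collapse into one; nothing records where it came from)
```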
Contrast that with `enum` from Rust. If you start with two `enum`s:
```rust
enum MatrixPills { Blue, Red }
enum LightSabers { Blue, Green }
```
There’s no way to combine them such that you get a flat collection of the inner types. You’re stuck with doing things like:
```rust
enum ScifiColors1 { MatrixPills(MatrixPills), LightSabers(LightSabers) }
enum ScifiColors2 { Blue, Red, Green }
```
With `ScifiColors1`, `Blue` is not a member since you need to specify `MatrixPills(Blue)` or `LightSabers(Blue)` (i.e. you have to “tag” `Blue`). With `ScifiColors2`, `Blue` is a member but the definition makes no reference to the original `enum`s. So you didn’t really combine them like we were able to with union types.
(This doesn’t mean tagged unions are bad! They’re just different.)
I highly recommend José’s talk from ElixirConf if you want to learn more:
From my point of view, since the 3 members of your union are disjoint / clearly discriminated, I don’t see any practical difference from a sum type (in fact they look to me like tagged union members).

It would however be trickier with shape-compatible structs: should the struct name behave as a discriminant if the inner data has the same shape?
What a great explanation of the differences. Thanks for taking the time!
@antoine-duchenet – since Elixir structs are maps with a private `__struct__` field, I would assume the struct name could be matched on (or not) just like any other field.
```
$ type result() =
    %{output: :ok, socket: socket()} or
    %{output: :error, message: :timeout or {:delay, integer()}}
```

could be satisfied by any struct that has the specified fields.
```elixir
iex(1)> defmodule Result do
...(1)>   defstruct output: :ok, socket: %{}
...(1)> end
iex(2)> s = %Result{}
%Result{output: :ok, socket: %{}}
iex(3)> m = %{output: :ok, socket: %{just_a: :map}}
%{output: :ok, socket: %{just_a: :map}}
iex(4)> fun = fn
...(4)>   %{output: :ok} = a -> a.socket
...(4)>   a -> a
...(4)> end
iex(5)> fun.(s)
%{}
iex(6)> fun.(m)
%{just_a: :map}
iex(7)> fun.(%{output: :error, message: :timeout})
%{output: :error, message: :timeout}
```
So I would expect that the `__struct__` field would get ignored unless the definition of the type specifically matches on that field.
Yes indeed!

My question was more about the behavior desired by the community than the expected behavior with the “current” implementation, especially concerning exhaustiveness checking.

With something like this:

```
$ type result() = %Dog{name: string()} or %Cat{name: string()}

$ result() -> string()
def handle(%{name: name}), do: "Hello #{name}"
```

which could be considered syntactic sugar for this:

```
$ type result() = %{__struct__: :Dog, name: string()} or %{__struct__: :Cat, name: string()}

$ result() -> string()
def handle(%{name: name}), do: "Hello #{name}"
```
1. If the resulting type behaves like a union type (the way you assume it would), simplified to something like `type result() = %{__struct__: :Dog or :Cat, name: string()}`, I’d expect the `handle/1` function not to raise a “not exhaustive” warning, because its matching (seen as a set) “contains” every possible value of the `result` type. To raise a warning, we would have to define `def handle(%Dog{name: name}), do: "Hello #{name}"`, because this definition does not handle the `%Cat{}` case.
2. In a hypothetical world where the `__struct__` field receives some special treatment to behave like the discriminant of the sum (a.k.a. the tag of the tagged union), we could imagine the `handle/1` function raising a “not exhaustive” warning, because its matching (seen as a set) is fully disjoint from the `result` type (it could be seen as if it meant `def handle(%{__struct__: nil, name: name})`). This behavior would be closer to the enum wrappers often used in Rust like `Ok`/`Err`. In such a world, we would have to define both `def handle(%Dog{name: name})` and `def handle(%Cat{name: name})`.

I’m personally fine with 1. since it looks more Elixirish and flexible to me, but 2. may have some benefits.
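For what it’s worth, today’s pattern matching already behaves like interpretation 1: a bare map pattern matches structs too, because a struct is just a map carrying a `:__struct__` key. A quick sketch:

```elixir
defmodule Dog do
  defstruct [:name]
end

defmodule Cat do
  defstruct [:name]
end

# A bare map pattern matches both structs (interpretation 1).
handle = fn %{name: name} -> "Hello #{name}" end

handle.(%Dog{name: "Rex"})
#=> "Hello Rex"
handle.(%Cat{name: "Felix"})
#=> "Hello Felix"
```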
The way you have defined the result type does match on `__struct__` though, so I would expect option #2 to be enforced in that situation: while all `Cat`s and all `Dog`s contain names, not every map that contains a `name` will be a `Dog` or a `Cat`, so a bare map isn’t sufficient to satisfy the type `result`.
I believe `handle` is exhaustive in that case. Both `Cat` and `Dog` have the `:name` key, so there’s no case which isn’t covered.

However, I think this is an example of weak arrows vs. strong arrows. This implementation of `handle` is weakly arrowed because it accepts arguments from the negation of its domain – i.e. it accepts something that isn’t of type `result`. To make it strongly arrowed, you’d need to restrict the input.
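Restricting the input can be sketched today with more specific patterns: the looser function below accepts maps outside `result()`, while the stricter one rejects them with a `FunctionClauseError`. (The names `weak`/`strict` are mine; this only illustrates the idea, not actual strong-arrow checking.)

```elixir
# Accepts any map with an :output key – including values outside result().
weak = fn %{output: _} = r -> r end

# Only accepts the two shapes that result() describes.
strict = fn
  %{output: :ok, socket: _} = r -> r
  %{output: :error, message: _} = r -> r
end

weak.(%{output: :surprise})
#=> %{output: :surprise} – accepted even though it's not a result()
strict.(%{output: :ok, socket: %{}})
#=> %{output: :ok, socket: %{}}
# strict.(%{output: :surprise})  #=> ** (FunctionClauseError)
```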
You can read more about strong arrows here:
P.S. Does anyone have that link to the online playground where you can experiment with the new type syntax? I was gonna try the example there but I couldn’t find the link.
IMHO what you describe is only one of the ways to think about it, which indeed seems to be closer to #2: considering `__struct__` as the tag, and considering that a plain map does not match any tag but only the “empty” tag (which makes the `%{...}` matching disjoint/incompatible with the `%Dog{...}` type, regardless of the inner matched fields).

Instead, like @billylanchantin said, we can ask ourselves whether every possible combination of the `result` type will match any of the `handle/1` clauses (which seems to me like a pretty intuitive meaning of “exhaustive”) based only on the matched fields such as `name`, regardless of the `__struct__`, since we are reasoning in an “untagged” way!

The point of my message was precisely to highlight the fact that there are multiple ways (not limited to those two) to think about it, nothing more!

PS: the tagged (sum) vs. untagged (union) subject is not so far from the “nominative type system” vs. “structural type system” one in some ways.
My assumption is that the type signature is prescriptive and must be satisfied at compile time. If the pattern in the function head would allow anything other than the specified type in the type signature, then the compiler should raise a warning. What you’re suggesting would allow for runtime violations of the type signature, because any map with a `:name` field would be valid at runtime. Maybe this is the difference between a “static” and a “gradual” type system. I’m certainly not well-versed enough in the theoretical underpinnings to understand the nuances there.
A gradual type system cannot prevent runtime errors. It can only make them less likely by checking the inputs of known shape against the expected type, while leaving the unknown shapes unchecked.
Thanks for clarifying my misunderstanding. In that case I see the point @antoine-duchenet was making more clearly. That does make me question the value of the type system in the long run, though.
It’s gradual. The more information you provide to the type system, the better it is at doing its job. There are certainly places where the type system can infer statically that a plain map has a key of `:name` – e.g. you might have `function(%{name: "abc"})` as literal code in the codebase. But it certainly cannot do so for every case. For example, user input will come into a system at some point, and that’s by definition not statically validatable; there are also things like dynamic dispatch, ….
Sure, but (a) that’s why we have `Ecto.Changeset` and various other validation facilities that do runtime enforcement, and (b) nobody expects dynamic dispatch to be statically checked / enforced.

The value I would derive from a gradual type system would be compiler warnings / errors like these: “The function is spec’d to return `{:ok, binary()} | {:error, term()}` but there is a clause inside that also returns `:error`”.
That would be quite enough for many people, myself included – i.e. pattern-matching exhaustiveness checks where we need them (f.ex. when parsing external data).
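For illustration, here’s the shape of bug such a warning would catch, sketched with today’s `@spec` (which only Dialyzer, not the compiler, checks; the `Parser` module is my own example):

```elixir
defmodule Parser do
  @spec parse(binary()) :: {:ok, binary()} | {:error, term()}
  def parse(""), do: :error          # oops: bare :error violates the spec
  def parse(data), do: {:ok, data}
end

Parser.parse("hi")
#=> {:ok, "hi"}
Parser.parse("")
#=> :error – compiles and runs fine today; a checked spec would flag this clause
```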