Full static type inference of set theoretic types

D4no0 · September 15, 2023, 11:14am

The talk that we were all waiting for on the future of elixir type system is out:

The talk is amazing, it covers all the concerns we had about the verbosity, syntax and interaction with dynamic code.

The decision was made to make a gradual type check system, however there is one sentence that still bothers me: full static type inferrence on set theoretic types is very expensive. It would be interesting to get feedback on this from someone who understands better what happens under the hood, because for me this not entirely clear why that would be the case and what are the actual limitations of the current implementation involved in this.

We’ve already seen this being implemented in languages like Elm and Roc (this language is not complete yet, however it claims full static type inference and has something that resembles set theoretic types).

The question is why this wouldn’t be possible to be implemented in elixir?

michallepicki · September 15, 2023, 12:03pm

Roc does not have set theoretic types, it is based around row polymorphism and polymorphic variants. Its types are less expressive and the typechecker rejects valid programs more often. Here’s a short example (I haven’t tried it on recent Roc versions, maybe they improved type narrowing since I tried it last year):

» foo = \a ->
…   when a is
…     A x -> B x
…     y -> y
…
… bar =
…   when foo C is
…     B _ -> "ok"
…     C -> "ok"
… bar

── UNSAFE PATTERN ──────────────────────────────────────────────────────────────

This when does not cover all the possibilities:

10│>        when foo C is
11│>          B _ -> "ok"
12│>          C -> "ok"

Other possibilities include:

    A _
    _

I would have to crash if I saw one of those! Add branches for them!

Roc and Elm are languages designed together with a static type system in mind. Elixir is an existing dynamic language and their type systems are not a good fit to typecheck all existing Elixir programs. Someone might want to create an Elixir-like language with a similar type system, and maybe even using the same syntax you could build an Elm-like or Roc-like typechecker for a subset of Elixir programs, but it would be much more restrictive compared to the Elixir that can be written today.

NduatiK · September 15, 2023, 12:23pm

I’m not sure why it’s expensive, but I think that several passes over the code are required to make type decisions.

If the negate function can accept different types (bool or integer), for each call of negate, the type system needs to consider two possibilities. I imagine this can lead to combinatorial explosion. If your system has several functions like negate interacting with each other, it gets harder and harder.

In static languages like Elm, each function has one definition and you can stop as soon as you detect that there is a type mismatch. In Elixir, you might have to consider multiple alternatives first.

(Not an expert😅)

D4no0 · September 15, 2023, 12:30pm

Ah, this is true, so basically from what I understand the limitation compared to elixir is the following: Roc/Elm functions can infer only one final type, witch in turn makes inferring faster and contained. By introducing set theoretic types, we basically introduce a new dimension to this equation and a lot of complexity to compute.

So basically while this now would be painfully slow, it would be possible in theory, granting that someone does the research work to prove this.

adw632 · September 15, 2023, 12:43pm

I’m no expert on set theoretic types either, but the reason will will be that it is a complex constraint solving problem.

I can imagine with multiple versions of functions it is very similar to satisfying package version constraints in a packaging system like is done with the PubGrub algorithm

Here is a nice article explaining how solving multiple versions is not easy, in fact these complex problems are known as NP-Hard and that means they are generally computationaly infeasible to solve in all cases.

sodapopcan · September 15, 2023, 2:28pm

That talk couldn’t have gone better for my concerns around the type system and I’m actually excited about it now

D4no0 · September 15, 2023, 2:33pm

The only thing left is to hope that it will be possible to implement it and have a good performance at the same time.

sodapopcan · September 15, 2023, 2:34pm

Regarding type inference, I’m not even an intermediate let alone an expert so not sure if it’s related, but I know part of what lets OCaml achieve crazy fast compilation times while also being fully statically typed without the need to specify types is that is doesn’t do operator overloading. To add ints you use + and to add floats you use +.. If you want to add both ints and floats you must explicitly cast one of them. Elixir sort of has this. + obviously works on both ints and floats but at least it doesn’t work on strings and lists. I know you asked explicitly in relation to set theoretic types so maybe that is unrelated.

D4no0 · September 15, 2023, 2:35pm

So you want to say that OCaml doesn’t support polymorphic functions?

sodapopcan · September 15, 2023, 2:36pm

Ya, that is a concern. I wrote more of an answer then figured I didn’t wanna go off like I usually do I deleted most of it I’m excited about it provided it all works out! I’m very happy struct typing is coming first because that is all I ever cared to get. I love the inference stuff. I need to re-watch the strong arrow stuff to make sure I fully grok it but I’m happy with how I understand it.

sodapopcan · September 15, 2023, 2:38pm

I believe it does so ya, you’re right, I have no idea how it deals with those. I was just talking more the primitives. I shoulda probably kept my mouth shut

sodapopcan · September 15, 2023, 2:45pm

Wait, the strong arrow stuff is basically what I was just talking about… oh boy. I think I’m out for the day But ya, that is crazy exciting.

billylanchantin · September 15, 2023, 2:53pm

Great talk! I keep meaning to read that paper…

One thing I was wondering: will strong arrows have a syntax? I (think I) get that they differentiate these two cases:

$ integer() -> integer()
def negate_weak(int), do: -int

$ integer() -> integer()
def negate_strong(int) when is_integer(int), do: -int

negate_weak fails to be “strongly arrowed” because it accepts arguments from the negation of its domain. E.g. negate(2.0) works just fine even though 2.0 isn’t of type integer, but negate_strong(2.0) would fail because of the guard.

But is this property implicit? Like if I annotate a function with a type, is its strong-arrowedness known only to the compiler? Or can I opt in/out of it with syntax, e.g.:

$ integer() -> integer()  # $ for weak arrow
def negate_weak(int), do: -int

$$ integer() -> integer() # $$ for strong arrow
def negate_strong(int) when is_integer(int), do: -int

D4no0 · September 15, 2023, 2:57pm

I think it is implicit and is kind of a distinction between runtime constructs guarantees (like guards) and the new spec.

I think the whole reasoning behind this system is the fact that spec errors will not stop execution of program, in the same way dialyzer works, so if the guards were not to be integrated with the spec system, there would be invariant at runtime witch defeats the whole propose of type-checking.

LostKobrakai · September 15, 2023, 3:17pm

No, not a syntax on the type notation. By my understanding what makes something a strong arrow function is what it does/is expected to do at runtime based on the used guards on the function. So you can opt in/out by adding/removing guards.

zachallaun · September 15, 2023, 10:23pm

It’s obviously too early to know, but my guess is that, when function type annotations are supported, there will be an opt-in way get a warning when you write an explicit type annotation that doesn’t result in a strong arrow.

JEG2 · September 15, 2023, 10:49pm

Strong arrows sound like a great example of making contagion work for you.

adw632 · September 16, 2023, 12:01am

I initially thought the same but there is very good reason to not specify annotions for return types.

The Elixir types paper on page 17 suggests that return type annotations should leave it the compiler function body proofs and guards so that we benefit from gradual typing.

Therefore, as a good programming practice it is better to leave the system to deduce the return types of all functions whenever gradual typing is used, by systematically using the underscore _ for return types, since an explicit return type may hinder the propagation of dynamic().
It’s the guards that make it a strong arrow that the Elixir compiler will use to guarantee the return type is within the domain

If you read page 17 it explains how the type system can do a better job by permitting dynamic to flow but using intersections of types. In Jose’s talk he mentioned his t-shirt colour to help explain the concept of “gray and any colour”.

I am optimistic that Elixir will get the balance right and look forward to it landing in the compiler, and also the arrival of types on structs.

zachallaun · September 16, 2023, 2:19am

I don’t think these things are at odds: the return type can be _ and the type can still be a strong-arrow.

josevalim · September 16, 2023, 8:06am

Generally speaking, the more expressive a type system, the harder or more expensive it is to have full inference. If a language chooses to have performant type inference for all of its valid programs, then it often ends up with a type system that rejects more programs. A classic example in Haskell is (\x -> x x) (\y -> y) which is not valid because inference is undecidable (although iirc you can write that with some extensions and explicit type annotations).

I chose to accept more programs instead of full-blown type inference early on in our journey for three reasons:

We must avoid breaking changes (i.e. rejecting valid programs today)
If you don’t want to write the types, you can already not write the types today (and strong arrows/type inference from patterns and guards do their best to find errors)
Users of languages with type inference often recommend writing the types anyways (at least for public functions, private functions can make use of other techniques such as inlining to find errors)

I believe there will be a paper at POPL 2023 on type inference of set-theoretic types which will go into more details. However, it will be too expensive to make it part of the compiler. The types it infers though are incredibly precise. For example, imagine you have a list of integers and strings, and you want to get only integers:

 Enum.filter(list_of_ints_and_strs, &is_integer/1) #=> [1, 23, 56, ...]

In most programming languages, filter has the type: [a], (a -> boolean()) -> [a], which means both incoming and outgoing list still has the type [integer() or string()], even if you keep only integers. However, with set theoretic types, we could get (and infer) the type:

[a or b], (a -> true) and (b -> false) -> [a]

Which in our example it would instantiate:

[integer() or binary()], (integer() -> true) and (binary() -> false) -> [integer()]

And therefore that the type system knows you removed all strings from the list. Being this precise has a high cost during inference (and we want to be precise!). At best, we could be able to use it as an explicit command like “what is the signature for this function” but not as part of compilation.

One last note: a type system with unions and intersections do not necessarily make it a set-theoretic type system. To be set-theoretic the foundation of the type system must all be based on set semantics. You can implement set operations in other ways.