Full static type inference of set theoretic types

hst337 · September 17, 2023, 6:41pm

I’ve expected this response. You can translate this example to Enum.random(for x <- 1..n, y <- 1..n, when x != y, do: {Tuple.duplicate(nil, x), Tuple.duplicate(nil, y)}). And the false positive to 1 / (tuple_size(x) - tuple_size(y))

josevalim · September 17, 2023, 6:52pm

I don’t see how that changes anything. If we only have integer(), we won’t get any compile-time warning on 1/integer(). There is no false positive here.

Tuple.duplicate/2 with a dynamic value can also only emit tuple(), we cannot assert on the shape. If you have specific shapes, then you need specific tuple instantiations. In the same way someone should not expect String.to_atom(expr) to return a particular atom instance at compile-time.

Edit: in any case, I am not saying we won’t have a false positive ever. What I am saying we will make decisions in order to guarantee the type system is performant and minimize false positives on common Elixir idioms.

hst337 · September 17, 2023, 9:32pm

They apply, but only partially. As the paper states, they use BDD of DNF (which is essentially a lookup tree for disjoint intersections) with lazy unions, and this structure has problems where non-union operations are applied to the set during the inference and subtype checking after a series of union operations.

Rich_Morin · September 19, 2023, 4:40am

I would like to see some discussion of (current and/or impending) “idiomatic Elixir code” practices in the context of this work. For example, I’d think that writing lots of short functions (e.g., as recommended by @pragdave) would make things easier for the type inference code.

-r

josevalim · September 19, 2023, 6:14am

It is literally writing patterns and guards. It can be in function heads, case, etc. The size of the function does not matter.

Rich_Morin · September 19, 2023, 6:26am

Erm, imagine two implementations of some functionality. One uses a long, complex function chock full of flow control constructs (e.g., case, cond, if). The other has been decomposed into a set of small, tightly-focused functions which rely heavily on patterns and guards. Even if the compiler (etc) has no problem with the first approach, a coder might…

-r

P.S. Once again; great work!

josevalim · September 19, 2023, 6:28am

To clarify, you asked in context of the work, so my reply focused only on that. In context of the type inference in my earlier quote, the size does not matter.

DEvil0000 · September 22, 2023, 1:05pm

How about a Number type so int and float can get expressed in one simple type.
This may be useful for something like your negate/1 example.
$ Number → Number

How about a template like T type so one could write “same in as out”
$ T → T

sabiwara · September 22, 2023, 3:31pm

A possible syntax for this is covered in the recently published blog post:

$ a and number() -> a and number()
def identity(arg), do: arg
Or course, we can provide syntax sugar for those constraints:
$ a -> a when a: number()
def identity(arg), do: arg

adw632 · September 26, 2023, 7:26am

That’s a really nice explanation in the blog post.

I understand that no annotations are planned for the initial implementation of gradual types, however this did jump out as not ideal if annotations were supported:

$ a -> a when a: number()
def identity(arg) when is_number(arg), do: arg

Using both the guard and the annotation seems quite redundant, one of the reasons I don’t enjoy writing type specs and repeating myself.

Has there been any thought on how annotations and guards can replace one OR the other? (i.e. NOT both)

If for arguments sake we have the following annotation with no guard:

$ a -> a when a: number()
def identity(arg) do: arg

Could we expect the compiler to add the guard when is_number(arg) automatically?

Or turning this around taking annotations off the table completely where the complier respects guards and only the guards matter, would we still need another way to express return types?? Such as:

def identity(arg) returning number() when is_number(arg), do: arg

Or with different combinations of types:

def identity(arg)
      returning integer() when is_number(arg) or
      returning float() when is_float(arg) do: arg

def identity(arg)
      returning number() when is_number(arg) or when is_float(arg) do: arg

def somefunc(arg)
      returning ModuleA.t() when is_number(arg) or
      returning ModuleB.t() when is_float(arg) do: ....

Ideally multiple return clauses might be a signal to the developer to actually define specialized functions.

Just saying I would rather the compiler support directly expressing the return type as part of a function guard if the compiler cannot infer the return type, than requiring repeat yourself annotations. Ideally I hope this is not required in the fullness of time and the compiler doesn’t need annotations or returning clauses at all not discounting the need for types on structs.

LostKobrakai · September 26, 2023, 7:45am

I think a good chunk of the constraints applied here is that the typesystem is meant to be an optional piece on top of how people write elixir today – at best providing benefits to users without them needing to change anything/much to the code.

Elixir as it’s written today extensively uses multiple function heads to branch off functions based on input parameters. At least to me inline type annotation never made much sense in the context of multiple function heads. As an example there are commonly functions where you have converging codepaths of many possible inputs to a single output (a | b | c -> x). You’d duplicate the result type if you put it on the function head.

I prefer the duplication of external type annotations for inputs – which are commonly in one place and therefore easy to find/read – to duplication of return values, which are scattered over potentially many lines of code with the function heads of the actual implementation of the function.

josevalim · September 26, 2023, 7:53am

This is not an immediate goal but we may add some guards based on types in the future.

This doesn’t work because types are more expressive than guards. So associating the two will generate impedance because you will either have weak types (limited to guards semantics) or you will have fake guards (which says something is a list of structs but it is incapable of checking if they are indeed the same).

Also, I think the duplication is obvious (and annoying) only on extremely short functions as above. In the majority of functions, I doubt they would feel redundant. It is too early and we have too few examples to start optimizing for “duplication”.

Furthermore, Elixir already allows pattern matching, guards, and default arguments in function signatures. Adding more complexity to it is the wrong way to go imo.

adw632 · September 26, 2023, 8:17am

Thanks Jose.

So does this mean that the compiler may at some point consider the pattern matching clauses as the domain of the type? i.e. not just the base language types but constrained to the value domain of the pattern matching clauses as well?

wanton7 · September 26, 2023, 12:13pm

That sounds bit like Typed Strings in Bosque programming language where strings are typed with Regex.

josevalim · September 26, 2023, 1:00pm

This is something the type system supports already but it is yet to be decided if we will add it to Elixir. We can allow 0, 1, 2, etc as types, but doing so means we often have to “prove” more to the type system and that may also make it slower. It is something we can explore further down the road.

markevans · September 27, 2023, 9:54am

I was wondering about this too - would you explore e.g. with guards such as

when is_integer(x) and x > 10

the compiler inferring the set of “all integers above 10” (rather than just integer())?
I imagine it could be hard to make performant but would be pretty amazing if it worked.

Also (on a slightly unrelated note) out of interest why do all the types have parentheses?
I keep thinking

$  (integer -> integer) and (boolean -> boolean)

would feel a lot cleaner to me than

$  (integer() -> integer()) and (boolean() -> boolean())

and even more so when the signature gets more complex

fmn · January 29, 2024, 2:15pm

Hello Jose, getting back to what you wrote, couple of months ago.

I am curious if situation somehow evolved, clarified; how do you see it today, are you still considering it for inclusion into Elixir?
Just to make sure I got it right: is it about dependent types?

Thanks!
Filip

josevalim · January 29, 2024, 3:25pm

We are only going to know for sure once the type system is in place and already running in user code. In other words, it is a matter of a year or two, and not months.

No, it is about singleton types. I had the same confusion though. Singleton types are types that represent a single value, like 42, while dependent types are types that depend on values, for example, you can have the dependent type LengthList, where for every integer n, LengthList n is the type of lists of size n.