How to define the type of a Stream with a specific type of elements?

For a list of integers, I can simply write a typespec like this:

@type foo() :: [integer()] # or list(integer())

But what about a Stream of integers?

There is no Stream.t() or Stream.t(element_type) and Enum.t() does not allow any type information about the elements. There is Enumerable.t(), which is what Stream.resource/3’s return type is typed as, and it accepts a type argument, so Enumerable.t(integer()) is a valid way of defining an Enumerable containing integers. However, that still equates to term() so it doesn’t give any type guarantees about the elements returned from the Enumerable or Enum functions.

Also, I want it to be clear to users of my functions that the function returns a Stream-like object where it is advisable to use Stream functions instead of Enum functions. Consider the following example:

defmodule MyApp.SomeStruct do
  @type t() :: %__MODULE__{foo: integer(), bar: String.t()}
  defstruct foo: 0, bar: ""

  @doc """
  Stream a bunch of SomeStruct objects from a JSON lines file.
  """
  @spec stream!(String.t()) :: Enumerable.t(t())
  def stream!(filename) do
    filename |> File.stream!() |> Stream.map(fn line -> line |> JSON.decode!() |> decode!() end)
  end

  @spec decode!(map()) :: t()
  def decode!(data) do
    %MyApp.SomeStruct{foo: data["foo"], bar: data["bar"]} # contrived example
  end
end

Currently, I use Enumerable.t(t()) instead of [t()] to indicate that we’re not returning a list, but more generically we’re returning an Enumerable. But actually, I want to say “hey, this is a Stream, I recommend using Stream functions and only using an Enum function when you actually want to start consuming the items from this stream. Oh, and the items are of type t().”

Is there currently a way in Elixir to write such a type? Or are there plans to support such a feature in Elixir’s type system?

There cannot be any type guarantees for protocol implementations just like there cannot be type guarantees around behaviours. Dynamic dispatch cannot have correctness enforced at compile time for as long as valid implementations can be added at runtime – by adding additional modules.

Behaviour types all come down to module(), and eventually atom(), in typespecs, and protocols can be implemented for any term, so that’s what you ran into. Whether a provided atom actually references a module implementing an expected behaviour, or whether a given term has a related protocol implementation, cannot be determined at compile time.
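To make that concrete, here is a minimal sketch of how both kinds of dispatch look at runtime. It only uses `is_atom/1` and the `impl_for/1` function that every protocol defines; the variable names are made up for illustration.

```elixir
# A module name is just an atom, so a module() type says nothing about
# which callbacks that module actually exports.
module_is_atom = is_atom(String)

# Protocol dispatch is resolved from the runtime value. impl_for/1 returns
# the implementing module, or nil when there is no implementation.
list_impl = Enumerable.impl_for([1, 2, 3])
range_impl = Enumerable.impl_for(1..3)
stream_impl = Enumerable.impl_for(Stream.map(1..3, &(&1 * 2)))
integer_impl = Enumerable.impl_for(42)

IO.inspect({module_is_atom, list_impl, range_impl, stream_impl, integer_impl})
```

A list, a range and a stream each resolve to a different implementation module, while an integer resolves to nil. None of this is visible in a typespec.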

That’s imo a misunderstanding of the Enumerable protocol. There is no “stream”, just like there’s no “not a stream”. They’re all enumerables, potentially infinite in length. The difference between Stream and Enum is not in the source data; the difference is in the processing, i.e. how they work over enumerables. And the caller decides to either process lazily (e.g. Stream APIs) or process eagerly (e.g. Enum APIs).
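A small sketch of that distinction: the same enumerable is handed to both APIs, and only the processing differs. The variable names are illustrative.

```elixir
# The same source enumerable (a range) is used for both pipelines.
source = 1..1_000_000

# Eager: Enum.map walks the whole range and builds a full list first.
eager = source |> Enum.map(&(&1 * 2)) |> Enum.take(3)

# Lazy: Stream.map only records the work; Enum.take pulls three elements.
lazy = source |> Stream.map(&(&1 * 2)) |> Enum.take(3)

IO.inspect(eager) # [2, 4, 6]
IO.inspect(lazy)  # [2, 4, 6]
```

Both pipelines give the same result; the lazy one just never touches the remaining elements.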

Is that just currently impossible or totally impossible without massive refactors, you think?

Imo if Elixir aims to expand its set-theoretic type system, it would be very useful to be able to define a function that takes as argument a module that implements a specific behaviour. And then wherever you use that function, the type-checker could check whether the type of whatever is passed in as an argument to that function indeed resolves to a module that has the functions that are required of the behaviour.

Similarly for protocols, although the exact moment of protocol consolidation might be an issue, it would be very useful to be able to write e.g. @spec take(Enumerable.t(item), integer()) :: item when item: var or something similar, with function calls to this method being checked for whether the first argument is indeed of a type for which the Enumerable protocol was implemented.

It’s similar to how in object-oriented languages (but also Go) you can define a function that requires an argument implementing a specific interface. Adherence of an object to that interface gets checked either nominally (e.g. Java checks whether the object’s class is declared with implements YourInterface) or structurally (e.g. TypeScript checks whether the object has the properties, which could be functions, with the types that the interface prescribes). Go does a similar thing, but interfaces can only define functions there. I’m not sure, but I think Rust also does a similar thing for its traits system.

True, that makes sense, thanks for that distinction! Either way, they are both enumerables, it’s just a question of whether the caller wants to do lazy or eager processing and that’s completely up to them to choose. I guess I’ll just resort to educating my callers with mild suggestions in the docs that it could be useful to use Stream functions here.

Totally impossible given you can defmodule a new module at any time or replace an existing one – even at runtime. Most famously that’s how hot code updates work. The typesystem could make assumptions of no runtime created modules, but I wouldn’t expect that to happen.
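As a rough sketch of the problem, the following defines a module at runtime with `Module.create/3` and passes it to a function whose spec can only say `module()`. `Greeter`, `Caller` and `RuntimeGreeter` are hypothetical names made up for this example.

```elixir
defmodule Greeter do
  # Hypothetical behaviour used only for this illustration.
  @callback greet(String.t()) :: String.t()
end

defmodule Caller do
  # The typespec can only say module(); nothing ties the argument to Greeter.
  @spec run(module(), String.t()) :: String.t()
  def run(impl, name), do: impl.greet(name)
end

# Quoted module body for a module that does not exist yet.
contents =
  quote do
    @behaviour Greeter
    def greet(name), do: "hello " <> name
  end

# Create the module at runtime, long after Caller was compiled.
{:module, RuntimeGreeter, _bytecode, _result} =
  Module.create(RuntimeGreeter, contents, Macro.Env.location(__ENV__))

IO.puts(Caller.run(RuntimeGreeter, "world"))
```

When `Caller` is compiled, `RuntimeGreeter` does not exist, so a compile-time check of that call site has nothing to look at.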

IMO this highlights a big difference between “parameterized types” (the typespec system) and other systems like Rust’s generics.

Parameterized types are better thought of as compile-time functions that produce types. They always require fully-defined arguments and don’t support inference-like patterns.

For instance, you can’t write a signature like this:

# THIS DOES NOT WORK
@spec generic_filter(Enumerable.t(x), (x -> as_boolean(term()))) :: Enumerable.t(x)
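As written, that spec fails to compile because x is never declared. For comparison, here is a hedged sketch with a `when x: var` guard, which typespecs do accept. It assumes an Elixir version where `Enumerable.t/1` exists, and `GenericFilter` is a made-up module name. As noted earlier in the thread, `Enumerable.t(x)` still equates to term(), so I would treat this as documentation of intent only and not assume anything checks the relationship between the arguments.

```elixir
defmodule GenericFilter do
  # `when x: var` declares x as an unconstrained type variable.
  # This compiles, but gives no guarantee about the elements.
  @spec generic_filter(Enumerable.t(x), (x -> as_boolean(term()))) :: Enumerable.t(x)
        when x: var
  def generic_filter(enumerable, fun), do: Stream.filter(enumerable, fun)
end

evens = GenericFilter.generic_filter(1..10, &(rem(&1, 2) == 0))
IO.inspect(Enum.to_list(evens)) # [2, 4, 6, 8, 10]
```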

Making a Stream.t(element_type) would also have the complication that there’s no obvious place to put element_type in the resulting %Stream{}; it’s not necessarily the type of the contained enum since funs are applied, and it’s only related to the top element of funs.
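A quick way to see this, as a sketch only: the fields of `%Stream{}` are an implementation detail and may change between Elixir versions, so the field access below is for illustration and not something to rely on.

```elixir
# NOTE: reading stream.enum and stream.funs peeks at internals of %Stream{}.
stream = Stream.map(1..3, &Integer.to_string/1)

IO.inspect(stream.enum)          # the wrapped enumerable holds integers: 1..3
IO.inspect(length(stream.funs))  # one recorded transformation: 1
IO.inspect(Enum.to_list(stream)) # the elements that come out are strings
```

The wrapped enum contains integers, but the stream yields strings, so the element type is determined by the recorded funs and not by anything stored as data in the struct.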
