Stream module functions returning functions

Some Stream module functions (like cycle and iterate) return anonymous functions instead of %Stream{} structs. Why? Shouldn’t they return structs so that protocol dispatch works properly on streams? These function streams end up in the defimpl for: Function, not the one for: Stream.


From the docs:

Note the functions in this module are guaranteed to return enumerables.
Since enumerables can have different shapes (structs, anonymous functions,
and so on), the functions in this module may return any of those shapes
and this may change at any time. For example, a function that today
returns an anonymous function may return a struct in future releases.


Protocol dispatch works on any value for which the protocol is implemented, not just structs. The standard library includes an Enumerable protocol implementation for functions, so functions work just fine.
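To make that concrete, here is a minimal sketch that asks the Enumerable protocol which implementation it picks for an arity-2 function. The function `three_two_one` is a made-up example (not something returned by the Stream module); it satisfies the reduce contract simply by delegating to a list.

```elixir
# Hypothetical example: an arity-2 function that follows the Enumerable
# reduce contract by delegating to a list.
three_two_one = fn acc, reducer -> Enumerable.reduce([3, 2, 1], acc, reducer) end

# The protocol resolves an implementation for plain functions...
impl = Enumerable.impl_for(three_two_one)
IO.inspect(impl)

# ...so Enum and Stream functions accept the function directly.
doubled = three_two_one |> Stream.map(&(&1 * 2)) |> Enum.to_list()
IO.inspect(doubled)
```

Here `impl` is Enumerable.Function and `doubled` is [6, 4, 2], which is the dispatch path that function-shaped streams take.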


Hi! Maybe I wasn’t expressing my concerns properly, so let’s give it another try. Imagine you have a protocol and you want to implement it for streams:

defprotocol SomeProtocol do
  def handle(data)
end

defimpl SomeProtocol, for: Stream do
  def handle(_data) do
    IO.puts "Yay! This is a protocol!"
  end
end

defimpl SomeProtocol, for: Function do
  def handle(_data) do
    IO.puts "Ehh!  This is a function..."
  end
end

And now try to use it:

iex> SomeProtocol.handle Stream.cycle([3, 2, 1])
Ehh!  This is a function...
:ok

I understand why this is happening, but I believe this could be done better. The docs say that cycle “Creates a stream that cycles through the given enumerable, infinitely.” It returns something that is technically a function but is conceptually a stream. Functions like cycle should return Stream structs to make this work.

Is it possible to wrap a stream function into a Stream struct, and have the same result when consuming the wrapped stream? If yes, is there any reason (efficiency reasons maybe) why functions in the Stream module don’t always return Stream structs? Or maybe protocols are not intended to deal with streams?
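As a rough sketch of what such wrapping could look like in user code: the struct `LazyWrapper` below is hypothetical (it is not part of the standard library), and SomeProtocol is repeated only so the snippet stands alone. The wrapper holds any enumerable and delegates enumeration to it, so consuming the wrapper yields the same elements as consuming the original.

```elixir
# Same protocol as in the question, repeated so the snippet stands alone.
defprotocol SomeProtocol do
  def handle(data)
end

# Hypothetical wrapper: a struct that holds any enumerable.
defmodule LazyWrapper do
  defstruct [:enum]
end

# Delegate enumeration to the wrapped value.
defimpl Enumerable, for: LazyWrapper do
  def reduce(%LazyWrapper{enum: enum}, acc, fun), do: Enumerable.reduce(enum, acc, fun)

  # Returning {:error, __MODULE__} tells Enum to fall back to reduce/3.
  def count(_wrapper), do: {:error, __MODULE__}
  def member?(_wrapper, _value), do: {:error, __MODULE__}
  def slice(_wrapper), do: {:error, __MODULE__}
end

# A protocol implementation can now target the wrapper struct.
defimpl SomeProtocol, for: LazyWrapper do
  def handle(_data), do: :wrapped_stream
end

wrapped = %LazyWrapper{enum: Stream.cycle([3, 2, 1])}
first_five = Enum.take(wrapped, 5)
handled = SomeProtocol.handle(wrapped)
```

With this sketch, `first_five` is [3, 2, 1, 3, 2] and `handled` is :wrapped_stream. The catch is that only values you wrap yourself are covered; whatever the Stream module returns on its own still dispatches on its original shape.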

I guess the mismatch here is that the Stream API is not meant to return a “Stream” struct. The API returns enumerables, which just happen to generate values lazily. An arity-2 function is just as valid an enumerable as a Stream struct (or a plain list or map, though those are not lazy). I’d even lump Date.Range into that, as a stream of date values with its own Enumerable implementation. There’s no single datatype in Elixir for “being a stream”, because being a stream mostly means being a lazily evaluated enumerable.

The result is that you cannot really implement a protocol for “streams”, as streams are not a specific kind of data but a subset of the possible data implementing the Enumerable protocol. You’d likely be best off letting whatever you do handle any enumerable, and you’ll get stream support for free.
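A small sketch of that advice, assuming a hypothetical helper module `Squares`: the function never inspects the shape of its input, it only relies on the Enumerable protocol, so lists, ranges, and Stream module results all work.

```elixir
defmodule Squares do
  # Hypothetical helper: accepts any Enumerable and returns a lazy enumerable.
  def squares(enumerable), do: Stream.map(enumerable, &(&1 * &1))
end

from_list = [1, 2, 3] |> Squares.squares() |> Enum.take(2)
from_range = 1..1_000_000 |> Squares.squares() |> Enum.take(2)
from_cycle = Stream.cycle([3, 2, 1]) |> Squares.squares() |> Enum.take(4)

IO.inspect({from_list, from_range, from_cycle})
```

The first two results are [1, 4] and the last is [9, 4, 1, 9], with no special casing for the infinite input.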


Yes, got you. But still, I believe this could be done better.

It’s good advice to look at streams as just enumerables. However, we cannot implement a protocol for enumerables either, just as we cannot pattern match on enumerables in general, so the same problem arises.

Not all arity-2 functions are valid streams. For instance, +/2 is definitely not a stream. This makes it even harder to define functions that only accept streams or enums. We cannot pattern match on streams, we cannot write guards that return true for streams and only for streams, and we cannot implement protocols for streams, but I think this can be fixed.
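A quick sketch of the +/2 point. The variable names are made up for illustration, and `streamish` is a hand-written enumerable function rather than a Stream module result. Both values pass the same guard and resolve to the same Enumerable implementation; the difference only shows up when they are consumed.

```elixir
plus = &Kernel.+/2
# Hypothetical stream-like function: delegates the reduce contract to a list.
streamish = fn acc, reducer -> Enumerable.reduce([3, 2, 1], acc, reducer) end

# A guard cannot tell the two apart, and neither can protocol dispatch.
same_guard = is_function(plus, 2) and is_function(streamish, 2)
same_impl = Enumerable.impl_for(plus) == Enumerable.impl_for(streamish)

# Consuming the stream-like function works...
taken = Enum.take(streamish, 2)

# ...while consuming +/2 blows up at enumeration time.
plus_result =
  try do
    Enum.take(plus, 1)
  rescue
    error -> {:raised, error.__struct__}
  end

IO.inspect({same_guard, same_impl, taken, plus_result})
```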

As far as I can see, streams have only two major forms:

  1. certain arity-2 functions (we know it is a stream when we create them, but later on it is impossible to distinguish between stream funs and normal (not streaming) arity-2 functions)
  2. and Stream structs (Let’s leave Ranges out for now.)

Yes, there are other enumerables out there, but streams have only these two forms, am I right? So, if we could merge the arity-2 function version of streams into the Stream struct approach (which does not seem impossible to me, since the struct version also contains functions), then some of these anomalies could be eliminated. We could pattern match on streams (def accept_streams_only(stream = %Stream{}) would be fine), and we could even implement protocols for: Stream.

Is there any reason why Elixir keeps arity-2 functions as streams, and does not go further towards Stream structs?

Yes, there are other enumerables out there, and they may be streams. If you don’t consider Date.Range a good enough example, how about File.Stream, IO.Stream, or Range? There may also be third-party libraries implementing Enumerable.

What I don’t understand, though, is why you need another protocol. There’s already Enumerable. Whatever you do could just work with any enumerable it’s been fed, just like the Stream API does. If it returns another (lazy) enumerable, you’re golden. So I’d really be curious which use case you have where you need to know whether your input is a lazy enumerable rather than just any enumerable.


Every Enumerable can be consumed lazily, ergo every enumerable is a stream: lists, ranges, regular maps, arbitrary end-user structs that also implement Enumerable, and so on. In just my regular Elixir app, all of the following types are Enumerable (and therefore consumable lazily as streams):
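As an illustrative sketch of "consumed lazily", the snippet below wraps a plain list in Stream.map and records which elements the mapper actually visits by sending messages to the current process. The {:visited, x} message shape is just an assumption made for this demo.

```elixir
lazy =
  Stream.map([1, 2, 3], fn x ->
    # Record each element the mapper is actually called with.
    send(self(), {:visited, x})
    x * 10
  end)

# Building the stream runs nothing; taking one element visits only the first entry.
taken = Enum.take(lazy, 1)

# Drain the mailbox to see which elements were visited.
visited =
  Stream.repeatedly(fn ->
    receive do
      {:visited, x} -> x
    after
      0 -> :done
    end
  end)
  |> Enum.take_while(&(&1 != :done))

IO.inspect({taken, visited})
```

Here `taken` is [10] and `visited` is [1]: the eager list was consumed one element at a time, and the remaining elements were never touched.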

Ecto.Adapters.SQL.Stream, Postgrex.Stream, Floki.HTMLTree, DBConnection.PrepareStream, DBConnection.Stream, Timex.Interval, ExCsv.Table, Jason.OrderedObject, Date.Range, File.Stream, Function, GenEvent.Stream, HashDict, HashSet, IO.Stream, List, Map, MapSet, Range, Stream

Is the pattern matching here the real goal? This whole discussion seems like a bit of an XY problem. Is the problem you are trying to solve basically having some sort of logic gate on whether an input is a stream?

The reason I ask is that the goal you have stated, as best I can tell, entails the rejection of protocols as such entirely. Instead of allowing arbitrary type dispatch into behavior, you’re proposing that a given kind of behavior require a specific type. This is much, much broader than just removing the impl for functions. It’s really a rejection of protocols in general as best I can tell.


To be honest, there’s no real use case here.

I’m prepping for a presentation about Elixir for Erlangers and I’m trying to prove that protocols in Elixir are useful and versatile, which they actually are not. While trying to come up with examples of where to use them, I built a list of types (module names like Range, Regex, Date, Time, DateTime, etc.) that protocols can be implemented for in addition to the basic types (like Atom, BitString, List, and friends). That is how I came to the Stream module/struct. Then I realized that it doesn’t actually work, and that it is misleading to write defimpl Protocol, for: Stream, because it is not going to cover all streams. I thought it was only the arity-2 fun that was sticking out, but now I can see that there are plenty of other structs that are streams and are not covered either. So streams (and enums) are a bad match for protocols. There are no practical benefits here.

I’m not sure that’s a great assessment. This exactly shows how great protocols are given Enumerable is a protocol. It allows all those different datatypes to implement this common protocol and work with all the Stream as well as the Enum api.

Disclaimer: What I meant is that it’s very rare to need a new protocol. The existing ones are great, I agree. But Enumerable, Collectable, String.Chars, Inspect and IEx.Info max out the possibilities. You almost never create a new protocol that really makes sense. What you usually do is implement an already existing protocol (Enumerable most often) for your structs.

The reason why new protocols are not useful enough is that dispatching on big, raw types is usually not enough. (If it could dispatch on spec level types, like {:ok, term()} | {:error, String.t()}, that would be a very useful thing and a game changer. But plain Tuple is just not fine-grained enough for most of the problems.)

You might not need new protocols for the use cases covered by existing protocols, but this doesn’t mean protocols are not useful beyond that. See Jason.Decoder / Jason.Encoder for a third-party library example. Internal ones might be more specific to some domain logic.

You’re aware that individual structs are their own types? In Elixir, most custom datatypes are structs.
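A minimal sketch of that point, with all names (`Outcome`, `Success`, `Failure`) invented for illustration: a protocol cannot tell {:ok, _} from {:error, _} because both are plain tuples, but if the two cases are modelled as structs, each one is its own dispatch target.

```elixir
# Hypothetical protocol with one implementation per struct.
defprotocol Outcome do
  def describe(result)
end

defmodule Success do
  defstruct [:value]
end

defmodule Failure do
  defstruct [:reason]
end

defimpl Outcome, for: Success do
  def describe(%Success{value: value}), do: "ok: #{inspect(value)}"
end

defimpl Outcome, for: Failure do
  def describe(%Failure{reason: reason}), do: "error: #{reason}"
end

ok_text = Outcome.describe(%Success{value: 42})
error_text = Outcome.describe(%Failure{reason: "timeout"})

IO.puts(ok_text)
IO.puts(error_text)
```

This prints "ok: 42" and "error: timeout". Whether wrapping results in structs is worth it over plain pattern matching in function heads is a design call that depends on the domain.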
