Before you can truly understand streams in Elixir, you need to understand enumerables. Enumerable is a protocol for collections of data that can be iterated over, with reduce at its core. An enumerable can be, for example, a list, which already exists completely in memory, ready to be iterated over. It can, however, also be “lazy”, in that the actual values the collection consists of are only computed or pulled into memory once the enumerable is iterated or reduced over.
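As a small illustration of reduce being the common denominator, here is a sketch using an eager list. The variable names are made up for the example; the functions are the standard Enum and Enumerable ones.

```elixir
# A list is an eager enumerable: all elements already sit in memory.
list = [1, 2, 3, 4]

# Enum.reduce/3 works on anything that implements Enumerable.
sum = Enum.reduce(list, 0, fn x, acc -> x + acc end)

# Other operations can be expressed in terms of reduce, e.g. a
# hand-rolled map (reversed at the end because elements are prepended).
doubled =
  list
  |> Enum.reduce([], fn x, acc -> [x * 2 | acc] end)
  |> Enum.reverse()

# The protocol is implemented for several data types, not only lists.
list_impl = Enumerable.impl_for(list)
map_impl = Enumerable.impl_for(%{a: 1})

IO.inspect({sum, doubled, list_impl, map_impl})
```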
A simple case of a lazy enumerable in Elixir is Range. It models a sequence of integer values, but the struct only stores the lower and upper boundary of the sequence (newer Elixir versions also store a step). 2..15 is just a struct containing 2 and 15. Only once you start iterating over the range are the values between those two numbers computed, so 3, 4, 5, …
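A quick sketch of what that means in practice. The very large range is only there to show that creating a range does not materialize its values.

```elixir
range = 2..15

# The struct carries its boundaries, not the values in between.
lower = range.first
upper = range.last

# The values are produced only when the range is iterated.
first_three = Enum.take(range, 3)
all_values = Enum.to_list(range)

# Even a huge range is cheap to create, because nothing is computed
# until something asks for values.
big = 1..1_000_000_000
big_head = Enum.take(big, 2)

IO.inspect({lower, upper, first_three, length(all_values), big_head})
```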
So that’s Enumerable. Now there’s also Enum and Stream.
Enum provides APIs to work with Enumerable data. It is eager in the sense that each call to an Enum function will iterate over as much of the enumerable collection as the operation requires. E.g. Enum.map will iterate over the complete input and return a list of mapped values, one for each item in the input collection. Enum.take(input, 1) will take the first item from the input and be done.
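A minimal sketch of that eager behaviour, using a small range as the input (the name input mirrors the call mentioned above; the values are arbitrary):

```elixir
input = 1..5

# Enum.map/2 walks the complete input and returns a fully built list.
mapped = Enum.map(input, fn x -> x * 10 end)

# Enum.take/2 only needs the first element, so it stops right after it.
first = Enum.take(input, 1)

# Results are plain data, so Enum calls can be chained; every step is
# finished before the next one starts.
chained =
  input
  |> Enum.map(fn x -> x * 10 end)
  |> Enum.filter(fn x -> x > 20 end)

IO.inspect({mapped, first, chained})
```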
The return value of Enum APIs might be another Enumerable, which is why you can often see chained calls to its APIs, but it’ll never be another lazy enumerable.
Now the difference with Stream is that its APIs will always return lazy enumerables. So the input is wrapped with some lazily computable operation, which gives you a lazy enumerable. Some functions don’t even require an enumerable as input but build a lazy enumerable from other input; others wrap existing enumerables, whether those are lazy or not.
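Both flavours can be sketched briefly: Stream.map wraps an existing enumerable, while Stream.iterate builds a lazy enumerable from a start value and a function. The concrete numbers are just for illustration.

```elixir
# Wrapping an existing enumerable: nothing is computed here yet.
lazy = Stream.map(1..5, fn x -> x * 10 end)

# Building a lazy enumerable from non-enumerable input: an endless
# sequence 1, 2, 4, 8, ... described by a start value and a function.
powers = Stream.iterate(1, fn x -> x * 2 end)

# The lazy value is not a list; it is a description of pending work.
lazy_is_list = is_list(lazy)

# Work only happens once an Enum function asks for values.
lazy_result = Enum.to_list(lazy)
first_powers = Enum.take(powers, 5)

IO.inspect({lazy_is_list, lazy_result, first_powers})
```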
Therefore the answers to your questions:
Yes, each Enum.map will execute once, but not just for a single element of your initial input. They’ll all map over all lines of your file, and only the last step will discard all that work, except for the first item. Previous steps have no idea you’ll discard stuff in the end.
See above. The protocol is Enumerable, and the way it works does allow for both lazy and “not so lazy” enumerables.
There’s no “stream” datatype, and things on the BEAM are immutable. So nothing changes in place; you only ever get new values back. But I hope I explained enough before that streams don’t just stop being streams. If you want to retain a lazy enumerable’s laziness, you add additional operations using Stream APIs. Once you start using Enum, operations become eager.
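To make the first and last answers concrete, here is a sketch that counts how often the mapping functions run. The list lines is a hypothetical stand-in for the lines of your file, and the counter built with Erlang’s :counters module exists only to make the work visible.

```elixir
# Hypothetical stand-in for the lines of a file.
lines = ["a\n", "b\n", "c\n", "d\n"]

# Counter that records how often a mapping function runs.
calls = :counters.new(1, [])

count = fn value ->
  :counters.add(calls, 1, 1)
  value
end

# Eager: both maps process every line before Enum.take/2 runs.
eager_first =
  lines
  |> Enum.map(fn line -> count.(String.trim(line)) end)
  |> Enum.map(fn line -> count.(String.upcase(line)) end)
  |> Enum.take(1)

eager_calls = :counters.get(calls, 1)

# Reset the counter before the lazy run.
:counters.put(calls, 1, 0)

# Lazy: the Stream steps only describe the work; Enum.take/2 pulls a
# single line through both functions and then stops.
lazy_first =
  lines
  |> Stream.map(fn line -> count.(String.trim(line)) end)
  |> Stream.map(fn line -> count.(String.upcase(line)) end)
  |> Enum.take(1)

lazy_calls = :counters.get(calls, 1)

IO.inspect({eager_first, eager_calls, lazy_first, lazy_calls})
```

Both pipelines return the same result; the difference is only in how much work happens along the way (eight function calls for the eager version versus two for the lazy one).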
Lazy really only refers to the fact that items of an enumerable collection somehow become available “on demand”. They could be loaded on demand, computed on demand, …