In the recent EEF SWG it was pointed out that function(2) has support for Enumerable protocol.
While it’s obvious that “the vulnerability” can be exploited the same way with the Streams, what is still curious to me is the following.
Why is there implementation for a such type in the first place?
During some digging in the commits via git blame it was found that defimpl was added to support then-called Enum.Iterator (now it’s just Enumerable) and was basically what Streams are for nowadays: lazy sequences.
But now we have Streams, yet this piece of code still exists. Shouldn’t it be removed?
I believe responses from the Core team might be useful here.
IMO “the vulnerability” here is deserializing untrusted data from ETF - the Enumerable protocol running 2-arity functions is only a way to escalate the impact of that vulnerability from a simple DoS (like you’d get with a giant / unbounded Range) into RCE.
@al2o3cr@hauleth It’s not about the possibility of running external (possibly malicious) code on you node without knowing it. I do acknowledge that there’re ways to do so as long as there’s a way to convert lambda to ETF.
What I’m asking is that there’s code that’s no longer needed with Streams present in stdlib.
I originally wanted to create an issue with this, but maybe I am missing something here.
I think it’s a mistake to think of a Stream as a single kind of datastructure. What exists is an Enumerable protocol, and then Enum functions to consume those Enumerables eagerly, and Stream functions to consume those Enumerables lazily. Absolutely anything that implements the Enumerable protocol is Enumerable. The standard library, mostly for I think easier pattern matching and IO.inspect purposes, has created some Stream named structs to represent certain constructs that have been built in a way that is amenable to streaming. That doesn’t mean that that fits every scenario however.
It’s also worth keeping in mind Elixir compatibility guarantees. At best you could deprecate the use of 2 arity functions as enumerables, but you can’t eliminate it without breaking people’s code that relies on this.
Protocol implementations themselves seem intrinsically public to me, if for no other reason than that they are global. You cannot create a protocol implementation for a given protocol for a given type if one already exists. Every protocol implementation Elixir ships with has direct and public implications with respect to your code base because of this. If you rely on functions being enumerable, you literally have no choice but to use Elixir’s built in impl, it is impossible to write your own.
Since enumerables can have different shapes (structs, anonymous functions, and so on), the functions in this module may return any of those shapes and that this may change at any time. For example, a function that today returns an anonymous function may return a struct in future releases.
The current Stream struct, only works to transform some Enumerable into something else lazily, it can not represent a source, as well as it can’t represent a sink (well, nothing in the Stream module can create a sink, therefore thats not important).
And regardles how you represent lazily evaluated sequences, you’ll always need a function.
In my opinion it does not make any differences if a function is sent directly over the wire or wrapped in a struct. It will be called eventually…
@benwilson512 Huh, that is a small one. However, this line only documents that it is a bad decision to use function’s return value as anything but value of opaque type that somehow has impl for Enumerable. Nothing else. It doesn’t explicitly says that it’s ok to use function(2) as an Enumerable. Therefore, whoever used it is their own enemy.
@NobbZ Well, it does. IMO, if there exists Stream struct in a language, there shouldn’t be such a “hacky” (IMO) solutions as a function(2) for Enumerables, because it can easily lead to a rather strange errors, where somehow function(2) has been used as a first argument to Enumerable/Stream, and yet, it doesn’t tell that input itself is bad: it will tell that the function returned bad values, which is ambiguous to a programmer.
I think there is a confusion on the arguments being made here.
The argument that using function(2) can be the cause of confusion - because I can accidentally pass it around as a enumerable is a valid one
The argument that using function(2) is somewhat more dangerous is incorrect though. The issue that can be triggered with function(2) would also help if we wrapped it in a Stream or any other struct, because Streams are fundamentally about passing continuations of code around and changing it from function(2) to Stream won’t fix it.
So while the first argument 1 is valid and has some merit - it is not a major improvement to justify changing the current feature set. Maybe there could be other benefits in making them more structured - but if there are, I am not aware of them at the moment.
I don’t think it would make the implementation easier to digest as well. As ultimately the difference would be that a function would return Stream.stream(&foo(&1, &2)) instead of &foo(&1, &2). Streams are naturally complex, given the fact they provide zipping, filtering, and protection against dangling resources. Wrapping an anonymous function call in a remote call or not is not going to do much to counter that.
That’s an interesting work you have there. Keep it up! I would definitely like to see this kind of functionality in the existing Stream!
It may be that I’ve made myself not totally clear. The second point wasn’t the argument all along, just as I’ve pointed out in the second paragraph of the OP.
However, I do agree that it is a rather minor issue to worry about in the current state of things.
It would be good though to put a removal of such an artifact on a roadmap/backlog if you think it might be for a good reason.