Enumerable for function(2)

arikai · March 3, 2020, 11:30am

In the recent EEF SWG it was pointed out that function(2) has support for Enumerable protocol.

While it’s obvious that “the vulnerability” can be exploited the same way with the Streams, what is still curious to me is the following.

Why is there implementation for a such type in the first place?
During some digging in the commits via git blame it was found that defimpl was added to support then-called Enum.Iterator (now it’s just Enumerable) and was basically what Streams are for nowadays: lazy sequences.

But now we have Streams, yet this piece of code still exists. Shouldn’t it be removed?

I believe responses from the Core team might be useful here.

hauleth · March 3, 2020, 11:37am

AFAIK Streams are just 2-ary functions, that is why Enumerable is implemented for them.

arikai · March 3, 2020, 12:37pm

Nope

iex(1)> 1..10 |> Stream.map(&(1+&1)) |> Map.get(:__struct__)
Stream

This is the whole point: when structs are present in the language, such thing is unnecessary.

hauleth · March 3, 2020, 12:53pm

Yes:

iex(1)> Stream.unfold(5, fn 
...(1)> 0 -> nil
...(1)> n -> {n, n - 1}
...(1)> end) |> is_function(2)
true

arikai · March 3, 2020, 1:27pm

Well that’s because this function in fact returns function(2) but not Stream:

github.com

elixir-lang/elixir/blob/v1.10.2/lib/elixir/lib/stream.ex#L1524-L1527


@spec unfold(acc, (acc -> {element, acc} | nil)) :: Enumerable.t()
def unfold(next_acc, next_fun) when is_function(next_fun, 1) do
  &do_unfold(next_acc, next_fun, &1, &2)
end

As a proof:

iex(4)> Stream.map(1..5, &(&1+1)) |> is_function(2)
false

Although it answers the question why defimpl for function(2) can’t be simply removed from the enum.ex module, the reason why this solution is still used is still a mystery to me.

benwilson512 · March 3, 2020, 1:30pm

This is considered a vulnerability?

al2o3cr · March 3, 2020, 1:32pm

IMO “the vulnerability” here is deserializing untrusted data from ETF - the Enumerable protocol running 2-arity functions is only a way to escalate the impact of that vulnerability from a simple DoS (like you’d get with a giant / unbounded Range) into RCE.

hauleth · March 3, 2020, 2:21pm

And even if there would be no defimpl for 2-ary functions, the problem would still be there, as nothing prevents user from encoding %Stream{enum: malicious_fun} to ETF.

arikai · March 3, 2020, 3:21pm

Not at all: notice those quotes

@al2o3cr @hauleth It’s not about the possibility of running external (possibly malicious) code on you node without knowing it. I do acknowledge that there’re ways to do so as long as there’s a way to convert lambda to ETF.

What I’m asking is that there’s code that’s no longer needed with Streams present in stdlib.

I originally wanted to create an issue with this, but maybe I am missing something here.

benwilson512 · March 3, 2020, 3:24pm

I think it’s a mistake to think of a Stream as a single kind of datastructure. What exists is an Enumerable protocol, and then Enum functions to consume those Enumerables eagerly, and Stream functions to consume those Enumerables lazily. Absolutely anything that implements the Enumerable protocol is Enumerable. The standard library, mostly for I think easier pattern matching and IO.inspect purposes, has created some Stream named structs to represent certain constructs that have been built in a way that is amenable to streaming. That doesn’t mean that that fits every scenario however.

It’s also worth keeping in mind Elixir compatibility guarantees. At best you could deprecate the use of 2 arity functions as enumerables, but you can’t eliminate it without breaking people’s code that relies on this.

arikai · March 6, 2020, 2:03pm

Now that’s the interesting part.

I’ve failed to find any documentation for this behaviour. That is, to use function(2) as a first argument to function calls from Enum and Stream modules.

And IF it is not documented: what compatibility guarantees are we talking about?

It is just a rule of thumb that you should never rely on any undocumented behaviour or implementation-specific details. And I consider it to be the case here.

benwilson512 · March 6, 2020, 2:12pm

Protocol implementations themselves seem intrinsically public to me, if for no other reason than that they are global. You cannot create a protocol implementation for a given protocol for a given type if one already exists. Every protocol implementation Elixir ships with has direct and public implications with respect to your code base because of this. If you rely on functions being enumerable, you literally have no choice but to use Elixir’s built in impl, it is impossible to write your own.

arikai · March 6, 2020, 2:34pm

Well, sure, you are not wrong.

Let me just recap the whole issue for me here:

There’s an undocumented code in the Elixir stdlib that exists simply because it was the only possible way to implement such “structures”
There’s a better way to construct procedurally generated sequences, that is, Stream (e.g. Stream.resource)

That’s all.

benwilson512 · March 6, 2020, 2:41pm

Happy to debate whether it is a good idea to support functions or not but it isn’t undocumented:

https://hexdocs.pm/elixir/Stream.html#module-creating-streams

Since enumerables can have different shapes (structs, anonymous functions, and so on), the functions in this module may return any of those shapes and that this may change at any time. For example, a function that today returns an anonymous function may return a struct in future releases.

That note has been there since Elixir 1.0

NobbZ · March 6, 2020, 2:43pm

The current Stream struct, only works to transform some Enumerable into something else lazily, it can not represent a source, as well as it can’t represent a sink (well, nothing in the Stream module can create a sink, therefore thats not important).

And regardles how you represent lazily evaluated sequences, you’ll always need a function.

In my opinion it does not make any differences if a function is sent directly over the wire or wrapped in a struct. It will be called eventually…

arikai · March 10, 2020, 9:49am

@benwilson512 Huh, that is a small one. However, this line only documents that it is a bad decision to use function’s return value as anything but value of opaque type that somehow has impl for Enumerable. Nothing else. It doesn’t explicitly says that it’s ok to use function(2) as an Enumerable. Therefore, whoever used it is their own enemy.

@NobbZ Well, it does. IMO, if there exists Stream struct in a language, there shouldn’t be such a “hacky” (IMO) solutions as a function(2) for Enumerables, because it can easily lead to a rather strange errors, where somehow function(2) has been used as a first argument to Enumerable/Stream, and yet, it doesn’t tell that input itself is bad: it will tell that the function returned bad values, which is ambiguous to a programmer.

mat-hek · March 10, 2020, 10:15am

Totally agree. I was digging in the Stream implementation when contributing to stream split and the problem you mention is one of the things that make it difficult to reason about.

josevalim · March 10, 2020, 11:42am

I think there is a confusion on the arguments being made here.

The argument that using function(2) can be the cause of confusion - because I can accidentally pass it around as a enumerable is a valid one
The argument that using function(2) is somewhat more dangerous is incorrect though. The issue that can be triggered with function(2) would also help if we wrapped it in a Stream or any other struct, because Streams are fundamentally about passing continuations of code around and changing it from function(2) to Stream won’t fix it.

So while the first argument 1 is valid and has some merit - it is not a major improvement to justify changing the current feature set. Maybe there could be other benefits in making them more structured - but if there are, I am not aware of them at the moment.

I don’t think it would make the implementation easier to digest as well. As ultimately the difference would be that a function would return Stream.stream(&foo(&1, &2)) instead of &foo(&1, &2). Streams are naturally complex, given the fact they provide zipping, filtering, and protection against dangling resources. Wrapping an anonymous function call in a remote call or not is not going to do much to counter that.

arikai · March 11, 2020, 7:12am

@mat-hek
That’s an interesting work you have there. Keep it up! I would definitely like to see this kind of functionality in the existing Stream!

@josevalim
It may be that I’ve made myself not totally clear. The second point wasn’t the argument all along, just as I’ve pointed out in the second paragraph of the OP.
However, I do agree that it is a rather minor issue to worry about in the current state of things.
It would be good though to put a removal of such an artifact on a roadmap/backlog if you think it might be for a good reason.