Sending Functions Instead of Data

GenericJam · May 16, 2020, 9:39am

In several talks, Joe Armstrong describes sending functions and doing the processing on the other side of the connection, instead of sending the data back and forth. Presumably this saves bandwidth, etc.

Is anyone doing this in practice? In what domain and what problems does it solve? Should everyone be doing it this way?

Qqwy · May 16, 2020, 10:18am

It is a very clear power vs. clarity trade-off. See for instance the paper “Out of the Tar Pit” for more information on why too much power in a programming environment can be considered a bad thing. (I very much recommend this paper!)

As a quick summary (the paper describes it much better), consider:

What if there was a mistake in the function? How do you ‘fix’ it later? That’s a lot harder to do with a compiled function than with ‘data representing code’.
How easy is it to reason about arbitrary compiled functions that are being sent from one computer-system to another, vs. computer-systems giving each-other explicitly defined requests/responses?
How would you test such a system?
What about security?

So: While it is a cool concept in theory, in practice you should try very hard to avoid it, because that will make your application more resilient. (With ‘resilience’ being an informal notion of ‘easier to understand’ + ‘easier to test’ + ‘easier to adapt to changing requirements’)
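To make the contrast concrete, here is a minimal sketch of "explicitly defined requests": the caller sends plain data and the receiver decides which code runs. The module name `Inventory` and the request shapes are hypothetical, made up purely for illustration.

```elixir
defmodule Inventory do
  # Hypothetical receiver: it only understands the requests listed here.
  # Each request is plain data, so it can be logged, validated, and tested.

  def handle({:count, item}, stock) when is_atom(item) do
    {:ok, Map.get(stock, item, 0)}
  end

  def handle({:in_stock, min}, stock) when is_integer(min) do
    items = for {item, n} <- stock, n >= min, do: item
    {:ok, Enum.sort(items)}
  end

  # Anything else, including an arbitrary function, is rejected, not executed.
  def handle(other, _stock), do: {:error, {:unknown_request, other}}
end

stock = %{apples: 3, pears: 0, plums: 7}

IO.inspect(Inventory.handle({:count, :apples}, stock))
IO.inspect(Inventory.handle({:in_stock, 1}, stock))
IO.inspect(Inventory.handle(fn s -> map_size(s) end, stock))
```

Fixing a mistake here means changing `Inventory` in one place, and every request shape can be covered by an ordinary unit test.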

dimitarvp · May 16, 2020, 10:49am

Saving bandwidth these days doesn’t seem to be very important. I mean, unless this approach compresses a 1MB payload to 1KB, you shouldn’t prioritise any technique that can only shave off a few hundred bytes in total.

Adzz · May 16, 2020, 12:32pm

Galaxy brain: what if your functions are data.

Let’s take filtering a list of integers as an example. We could define Filter as a protocol like so:

defprotocol Filter do
  defstruct [:collection, :predicate]
  def apply(collection, predicate)
end

Now let’s encode our predicate as data:

defprotocol IsOdd do
  defstruct filter: &__MODULE__.filter/1
  def filter(a)
end

defimpl IsOdd, for: Integer do
  def filter(a) do
    require Integer
    Integer.is_odd(a)
  end
end

Now let’s implement Filter for a list:

defimpl Filter, for: List do
  def apply(list, predicate) do
    Enum.filter(list, fn x -> predicate.filter.(x) end)
  end
end

All of that lets us do this:

Filter.apply([1,2,3], %IsOdd{})

Which gets us close. We now just need to capture all of that in its own struct. We’ll define a general function application protocol:

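# Note: Elixir 1.7+ ships a built-in Function module, so defining a protocol
# with this name redefines it; a different name is safer in real code.
defprotocol Function do
  def apply(function)
end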

Then implement it for Filter:

defimpl Function, for: Filter do
  def apply(filter) do
    Filter.apply(filter.collection, filter.predicate)
  end
end

Now we can create our Filter function as a data structure, and as long as wherever we are sending it has the right protocol implementations, we can consume it:

%Filter{collection: [1, 2, 3], predicate: %IsOdd{}}
|> Function.apply()

What’s even more interesting is that, because it’s all protocols, each dimension of the filtering problem is extensible. Filtering a collection has three dimensions: the collection being filtered, the items in the collection, and the predicate that determines whether something stays in the collection.

Let’s now make it so that we can filter on Decimals inside lists:

defimpl IsOdd, for: Decimal do
  def filter(a) do
    # Stand-in check: this tests the sign of the Decimal, not its oddness.
    Decimal.positive?(a)
  end
end

%Filter{collection: [Decimal.new("1"), 2, 3], predicate: %IsOdd{}}
|> Function.apply()

Okay, and now let us filter on maps as well as lists:

defimpl Filter, for: Map do
  def apply(map, predicate) do
    # Only the value is tested; the key is ignored (hence the underscore).
    Enum.filter(map, fn {_k, v} -> predicate.filter.(v) end)
  end
end

Function.apply(%Filter{collection: %{a: 1, b: 2, c: 3}, predicate: %IsOdd{}})

Disclaimer: I just find this interesting; I have no idea whether it’s a good idea to actually use.

al2o3cr · May 16, 2020, 12:38pm

If you’re using Agent, you’re doing it already - for instance, Agent.update/3 passes an anonymous function to the agent’s process, which then calls the function with the agent’s state.

The bandwidth being saved here is memory bandwidth; bringing the function to the data avoids the overhead of copying the data to a different process.
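As a minimal sketch of that point (the list size and variable names are arbitrary choices for illustration), the functions below are sent to the agent and run inside its process, so only small results cross the process boundary:

```elixir
# Start an agent whose state is a large list living in the agent's process.
{:ok, agent} = Agent.start_link(fn -> Enum.to_list(1..100_000) end)

# The anonymous function is sent to the agent and runs there; only the
# small result (a single integer) is copied back to the caller.
sum = Agent.get(agent, fn numbers -> Enum.sum(numbers) end)

# Agent.update/3 works the same way: the function runs inside the agent
# and the filtered list never leaves that process.
:ok = Agent.update(agent, fn numbers -> Enum.filter(numbers, &(rem(&1, 2) == 1)) end)
odd_count = Agent.get(agent, &length/1)

IO.inspect({sum, odd_count})
# => {5000050000, 50000}
```

By contrast, `Agent.get(agent, & &1)` would copy the whole list into the caller's process before any work could be done on it.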

Qqwy · May 16, 2020, 1:56pm

I mean, you might have heard the saying “objects are a poor man’s closures… closures are a poor man’s objects” somewhere: we can emulate function calls (possibly with bound variables, AKA closures) using only data, and emulate a full-fledged object system using only closures.
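The closures-as-objects half of that saying could look roughly like this. `ClosureCounter` is a hypothetical name, and because Elixir data is immutable, each "method call" returns a new closure rather than mutating the old one:

```elixir
defmodule ClosureCounter do
  # A hypothetical "object" built only from closures: the state `count`
  # is captured by the anonymous function, and "methods" are messages.
  def new(count \\ 0) do
    fn
      :value -> count
      :increment -> new(count + 1)
      {:add, n} -> new(count + n)
    end
  end
end

counter = ClosureCounter.new()
counter = counter.(:increment)
counter = counter.({:add, 5})

IO.inspect(counter.(:value))
# => 6
```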

Also, when you are defining an API (say, a REST web API), you are in essence creating a data representation that someone can use to call your functions.

In (bytecode-)interpreted languages like Elixir, the similarities go even further, because there a compiled function really is a binary of instructions that you can read (and, if you want to live dangerously, modify directly). The same is of course true of machine code on any system following the Von Neumann architecture, because there data and instructions are stored in the same place. There are very few practical reasons for writing self-modifying code, except to ‘be cool’ or to write e.g. computer viruses that try to camouflage themselves.

It is exactly for that reason, by the way, that e.g. WebAssembly does not follow the Von Neumann model and keeps instruction memory and data memory separated.

As a side note: interesting tangent about protocols! If you want to read about that kind of stuff some more, you might find some fun tidbits, albeit about Haskell rather than Elixir, here.

rvirding · May 16, 2020, 3:16pm

One thing you should be aware of is that it is very sensitive and risky to send functions from one node to another. Sending a function does not send the actual function code, as the function only contains the module name, a checksum of the module, a reference to the function code, and the closure. The checksum is sent to ensure that exactly the same module is used. If it isn’t, you get an error. So you must have exactly the same module on both nodes; even adding comment lines can ruin it.
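You can look at what a fun carries with `:erlang.fun_info/2`. The following is a rough sketch on a single node; `FunDemo` is a hypothetical module, and the exact values printed will vary between builds and OTP versions:

```elixir
defmodule FunDemo do
  # Hypothetical module used only to create a closure from compiled code.
  def make_adder(n), do: fn x -> x + n end
end

adder = FunDemo.make_adder(10)

# The fun records which module it came from, a value computed from that
# module's compiled code, and its captured variables (the closure).
IO.inspect(:erlang.fun_info(adder, :module), label: "module")
IO.inspect(:erlang.fun_info(adder, :new_uniq), label: "module checksum")
IO.inspect(:erlang.fun_info(adder, :env), label: "captured variables")

# Serialising the fun does not include its code, so the binary stays small.
binary = :erlang.term_to_binary(adder)
IO.inspect(byte_size(binary), label: "bytes when serialised")

# Decoding and calling it works here only because the very same FunDemo
# module is loaded in this VM.
restored = :erlang.binary_to_term(binary)
IO.inspect(restored.(5), label: "restored.(5)")
```

On a second node, the final call would only succeed if that node had loaded an identical build of `FunDemo`.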

This is because functional objects came later when the code handling had already been defined and implemented.

Adzz · May 16, 2020, 3:23pm

I had not heard that, thanks. I’ll have a read.

Yeah, I find the protocol thing interesting. I’ve been playing with the concept on a branch of my zip library here: https://github.com/Adzz/Zip/tree/go-crazyy-ah-ah-go-stupid-oh-oh

and wrote about it here: https://medium.com/@ItizAdz/zip-elixir-abusing-protocols-for-triple-dispatch-and-ultimate-flexibility-4c817a5940d6

It felt like I was heading towards creating a poor type system in some way, and I’m sure it links to defunctionalization somehow: https://blog.sigplan.org/2019/12/30/defunctionalization-everybody-does-it-nobody-talks-about-it/
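For comparison, a bare-bones take on defunctionalization without protocols might look like the sketch below. The `Pred` module and its tags are hypothetical: each closure is replaced by a tagged tuple, and a single `apply_pred/2` function interprets it.

```elixir
defmodule Pred do
  # Each closure the program would have built becomes a tagged term;
  # captured variables (like `n`) become fields of the tuple.
  def apply_pred(:odd, x), do: rem(x, 2) != 0
  def apply_pred({:greater_than, n}, x), do: x > n
  def apply_pred({:both, p, q}, x), do: apply_pred(p, x) and apply_pred(q, x)

  def filter(list, pred), do: Enum.filter(list, &apply_pred(pred, &1))
end

pred = {:both, :odd, {:greater_than, 2}}

# The predicate is plain data, so it can be serialised and decoded
# without depending on a particular compiled version of any module.
wire = :erlang.term_to_binary(pred)

IO.inspect(Pred.filter([1, 2, 3, 4, 5], :erlang.binary_to_term(wire)))
# => [3, 5]
```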