Understanding restrictions of :erlang.binary_to_term

tcoopman · December 10, 2023, 6:37am

I want to better understand the limitations of :erlang.term_to_binary and :erlang.binary_to_term.

More specifically, I want to encode functions so that I can execute them later safely, but I’m not sure what’s safe and what’s not:

fn a -> a end
fn a -> MyModule.do_something(a) end
&MyModule.do_something(&1, "fixed_arg")

MyModule.do_something will be available.

I’m trying to understand which of these options can result in problems, and how I can avoid them if possible.

I’ve run into an error like:

** (BadFunctionError) function #Function<24.134130798/1 in IEx.Autocomplete> is invalid, likely because it points to an old version of the code

and I want to avoid that in the future.

LostKobrakai · December 10, 2023, 8:37am

I think the only ways to do this somewhat “safely” are either to use &Mod.fun/1, which doesn’t need to capture any environment, or to make sure the code defining the anonymous function doesn’t change, e.g. by running literally the same codebase (that’s afaik how FLAME makes anonymous functions work).

If those encoded values are anything but short-lived, though, I’d strongly suggest a different encoding that consists of pure data, possibly with a version field added.

dimitarvp · December 10, 2023, 8:46am

Oh don’t even go down that rabbit hole. Make a small module namespace in your app, version it e.g. MyApp.TransferableFunctions.V_1_2.OrderFunctions, refer to it via the normal capture operator or just encode the MFA as plain data e.g. %{module: "ABC", function: "def", arity: 3} and be done with it.
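To make the "pure data plus a version field" idea concrete, here is a minimal sketch. The module name FunRef and its encode/decode functions are hypothetical, and it assumes the referenced module and function already exist on the decoding side (String.to_existing_atom/1 raises otherwise):

```elixir
defmodule FunRef do
  # Hypothetical helper, not part of any library.
  @version 1

  # Describe a function as plain data: strings, an integer, and a version tag.
  def encode(module, function, arity) do
    %{
      version: @version,
      module: Atom.to_string(module),
      function: Atom.to_string(function),
      arity: arity
    }
  end

  # Rebuild an external capture from the data. Only version 1 is understood here.
  def decode(%{version: 1, module: m, function: f, arity: a}) do
    Function.capture(String.to_existing_atom(m), String.to_existing_atom(f), a)
  end
end

ref = FunRef.encode(String, :upcase, 1)
fun = FunRef.decode(ref)
result = fun.("ok")
```

Because the stored value is just a map, it can be written as JSON or via term_to_binary, and a later version can change the shape while still decoding old entries.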

tcoopman · December 10, 2023, 10:09am

Thanks for the response. I should have clarified a little bit.
I want to understand what’s allowed and what isn’t, and why.

If I read @LostKobrakai’s answer, it seems: &Mod.fun/1 is safe, but why isn’t fn a -> Mod.fun(a, "foo") end or &Mod.fun(&1, "foo")?

dimitarvp · December 10, 2023, 10:14am

Technically nothing much stops you from serializing a function, but you have to think about security – is there a way for random code that tries to delete directories to sneak in, for example?

tcoopman · December 10, 2023, 10:19am

The code is fully under my own control, so I’m not going to serialize stuff that comes from the user. So safety is not really a concern.

Serializing does indeed work, but running a function that was previously serialized (the function was &Mod.fun(&1, "a")) resulted in:

** (BadFunctionError) function #Function<24.134130798/1 in Mod.fun> is invalid, likely because it points to an old version of the code

Reading into this, I saw responses saying that this is because of conflicting OTP versions. I don’t think that was the case for me (it could be, but I’d need to investigate more). I suspect it had something to do with the code being run on a different node?

And so, I want to understand what the limitations are, when something is ok and when it isn’t.

dimitarvp · December 10, 2023, 10:22am

I have no experience with that error, but I would still err on the side of keeping references to functions that exist in your own source, and not just encode/decode inline functions willy-nilly. It would IMO help and also give you the option to version them, as noted above.

bjorng · December 11, 2023, 4:48am

An anonymous function can be successfully called if the BEAM file that created the anonymous function in the first place is loaded. The Erlang runtime system checks that by comparing a checksum in the anonymous function term itself with a checksum of the BEAM file. If they are different, the call will fail.

The checksum for a BEAM file will change if the source code is changed in significant ways, or it may change if a different version of the Erlang or Elixir compiler is used.

Therefore, unless you can guarantee that the same BEAM that created the anonymous function is loaded, as already mentioned, the only “safe” anonymous function is of the form &Mod.fun/1, because that only requires the module to be loaded and the function to be exported.
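As a small illustration of the external case, the sketch below round-trips a capture of a standard-library function (String.upcase/1 is used only as a stand-in for your own exported function):

```elixir
# An external capture stores only module, function name, and arity,
# so decoding it needs nothing beyond the module being loaded.
external = &String.upcase/1

binary = :erlang.term_to_binary(external)
decoded = :erlang.binary_to_term(binary)

result = decoded.("hello")
```

A local fun such as fn s -> String.upcase(s) end would also encode and decode without complaint; the failure described above only shows up when the decoded fun is called against a different build of the module that defined it.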

tcoopman · December 11, 2023, 12:09pm

That will probably be the reason for my failure.

Thank you for the reply, now I understand what went wrong.

tcoopman · December 11, 2023, 3:32pm

2 additional questions:

If I use Function.info (Function — Elixir v1.15.7) to check if the type is :external, then I’m good to go I think?
I’m not sure I understand why &Foo.fun/1 is :external and &Foo.fun(&1, "foo") is :local. Looking at the output of Function.info, I guess it’s because the latter returns a new function with arity 1 that captures the environment. Correct?

LostKobrakai · December 11, 2023, 3:51pm

Two types of funs have slightly different semantics:

A fun created by fun M:F/A is called an external fun. Calling it will always call the function F with arity A in the latest code for module M. Notice that module M does not even need to be loaded when the fun fun M:F/A is created.

All other funs are called local. When a local fun is called, the same version of the code that created the fun is called (even if a newer version of the module has been loaded).

https://www.erlang.org/doc/man/erlang#fun_info-1

:external is the first type, likely named that because it matches how an external call is made under other circumstances, e.g. a plain function call on an external module from anywhere.

bjorng · December 11, 2023, 5:18pm

Yes.

Yes, &Foo.fun(&1, "foo") is shorthand for fn bar -> Foo.fun(bar, "foo") end, which is a local anonymous function that takes one argument.
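A quick way to see the difference is to ask Function.info/2 for the :type of each form. The FunCheck module and its external?/1 function below are hypothetical names for a guard you might run before serializing:

```elixir
defmodule FunCheck do
  # Hypothetical helper: true only for external captures such as &Mod.fun/1.
  def external?(fun) when is_function(fun) do
    Function.info(fun, :type) == {:type, :external}
  end
end

# Plain capture of an exported function: external.
plain = FunCheck.external?(&String.upcase/1)

# Partial application: the capture operator builds a new one-argument fun, so it is local.
partial = FunCheck.external?(&String.replace(&1, "a", "b"))

# Explicit anonymous function: local as well.
anon = FunCheck.external?(fn s -> String.upcase(s) end)
```

Here plain is true while partial and anon are both false, which matches the explanation above.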

al2o3cr · December 11, 2023, 10:52pm

FWIW, a common workaround for this form not being serialization-safe is to use a module/function/args tuple instead:

{MyModule, :do_something, ["fixed_arg"]}

Then the recipient can use it with something like:

defp call_mfa({m, f, a}, x), do: apply(m, f, [x | a])
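Putting the two pieces together, a self-contained sketch might look like this. The MfaCall module name is made up, String.replace/3 stands in for MyModule.do_something, and call_mfa is public here only so it can be called from outside the module:

```elixir
defmodule MfaCall do
  # Prepends the runtime argument to the stored ones, then calls the function.
  def call_mfa({m, f, a}, x), do: apply(m, f, [x | a])
end

# The tuple is plain data (atoms and a list), so it survives serialization
# regardless of which build produced it.
mfa = {String, :replace, ["a", "b"]}

decoded =
  mfa
  |> :erlang.term_to_binary()
  |> :erlang.binary_to_term()

# Equivalent to String.replace("banana", "a", "b")
result = MfaCall.call_mfa(decoded, "banana")
```

The only requirement on the receiving side is that the module is loaded and exports a function with the resulting arity.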

Probably not relevant to your specific use-case, but check out the way Khepri solves the anonymous function problem - it uses some deep BEAM magic to capture the entire implementation and save it to the Raft log!

tcoopman · December 12, 2023, 6:23am

That’s something I’ve been pondering as well. What if I inject the anonymous function into a module that I create at runtime and run that…
But that’s a bit overkill for what I need, so I’m not going to go there…