Risks of serializing structs and captured functions

Sorry for the long post, but in the process of building out a job queue using Oban, the desire to have messages include captured functions has come up. Certain variables are known at the time when the job message is created, so it is possible to do something like the following:

job_args = %{
  captured_fn: fn -> MyModule.something(input, opts) end
               |> :erlang.term_to_binary()
               |> Base.encode64()
}

Serialization using :erlang.term_to_binary/1 and Base.encode64/1 is required so that Elixir functions/structs/atoms/etc. may be safely inserted into the database as an Oban job, because the args are JSON encoded.
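
The worker-side decode would then be the mirror image. A minimal sketch, assuming the atom key above comes back from the database as the string key "captured_fn" after the JSON round trip, and that the captured function takes no arguments:

captured_fn =
  args["captured_fn"]
  |> Base.decode64!()
  |> :erlang.binary_to_term()

captured_fn.()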

I understand that if we were to require other languages to create job records in the database, they would have a hard time creating such a message (anyone feel like reverse-engineering :erlang.binary_to_term/2 in, say, Python?). Likewise, if any other system had to read job data out of the database when it was encoded this way, it would be a similar pain.

However, the alternative of sending args that identify the module, function, and function arguments isn’t much better:

job_args = %{
  module: MyModule,
  fun: :something,
  fun_args: [input, opts]
}

The module name and the function atom convert to strings and can be converted back with a little tweaking, but the opts… those could be nearly anything… so… what to do?

The ability to have a worker that can run any function you throw at it is pretty tempting, and I think that flexibility may outweigh the cons of requiring Elixir on both ends of the pipeline.

If this wide-open flexibility of having carte-blanche captured functions is really an anti-pattern, then the only other way I can think to structure the worker is to have it operate on messages like this:

job_args = %{
  type: "something",
  input: input,
  opts: opts
}

and then in the worker it could do something like:

case type do
   "something" -> MyModule.something(input, opts)
end

i.e. the worker would need to know in advance what possibilities it should expect. In practice, there might only be a couple dozen.
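
Put together, the whole worker might look something like the sketch below. This assumes Oban 2.x, where perform/1 receives an %Oban.Job{}; the module and queue names are made up, and the args come back from the database with string keys after the JSON round trip:

defmodule MyApp.DispatchWorker do
  # A sketch assuming Oban 2.x's perform/1 callback; the module and queue
  # names here are hypothetical.
  use Oban.Worker, queue: :default

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"type" => type, "input" => input, "opts" => opts}}) do
    # Dispatch on the job "type"; the worker knows every type it supports.
    case type do
      "something" -> MyModule.something(input, opts)
    end

    :ok
  end
end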

However, even this approach would still fail JSON encoding if the input were a struct or the options were a keyword list.
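
One way around that particular failure is to flatten things into JSON-friendly shapes before building the args: Map.from_struct/1 drops the __struct__ field and Map.new/1 turns a keyword list into a map. A sketch (this only handles one level of nesting, and the worker has to know how to re-establish the struct on the other side):

# Hypothetical pre-flight step before enqueueing: flatten the struct and
# turn the keyword list into a map so the args survive JSON encoding.
# Nested structs or keyword lists would need the same treatment.
job_args = %{
  type: "something",
  input: Map.from_struct(input),
  opts: Map.new(opts)
}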

I’m hoping someone can shed some light on this problem – maybe I’m not thinking about this the correct way. I understand that serializing certain things (pids or refs) is asking for trouble, but structs, modules, atoms, and functions seem pretty safe.

Thank you in advance for your thoughts!

Just note that when you unpickle a lambda, generally speaking it won’t work unless the unpickler has the module that the lambda came from.

It’s basically an mfa itself. So unpickling a lambda into another language is gonna be a doozy.
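
For what it’s worth, you can see the same thing from the BEAM side: :erlang.fun_info/2 shows the module (and, for anonymous functions, a generated index and uniqueness value) that the fun drags along with it. A quick illustration in iex:

iex> fun = &String.upcase/1
&String.upcase/1
iex> :erlang.fun_info(fun, :module)
{:module, String}
iex> :erlang.fun_info(fun, :type)
{:type, :external}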


IMO your last option makes the most sense. All your code will have to know all possible ways to process data coming from the job queue anyway, so what’s the problem with matching 10-20 hardcoded strings against their corresponding module/function calls? Don’t get too academic. :slight_smile:

As for encoding / decoding data structures beyond what JSON can safely offer, have you entertained the idea of using FlatBuffers? The serialization there is version-aware and you can deprecate fields. FlatBuffers is one of the very few serialization formats I’ve known that can help you evolve your data structures mostly painlessly.

Using JSON is fine but there’s no point desperately holding on to it when it doesn’t seem that it can get the job done.

And finally, if you’re convinced you’re only going to have Elixir code process all the data, you are reasonably sure you won’t be changing those data structures often, and you are sure you won’t be posting new versions while the job queue still has 50_000 messages with the old data to process… then indeed using :erlang.term_to_binary and :erlang.binary_to_term seems to be a no-brainer.


Nope, not a problem. Google ‘external term format implementations’ for multiple specific (Python, Rust) and general references.

HTH!

Wow. I had no idea this was a thing!

I had not heard of these either. Sounds like a useful tool!

Thanks for providing a sanity check! I think we are weighing the proper pros and cons here. It’s always possible to include a version number in the message somehow, so I think we’d have options available if our “interfaces” (i.e. behaviour implementations) changed between versions, and the use-cases right now are such that deleting old messages and re-running jobs is not a deal-breaker.


Just to make it clear beyond doubt: if you serialize an anonymous function and then you do a new deploy of the system where said anonymous function is slightly changed (for example, it is one line down), then you will no longer be able to execute it. So this approach is absolutely a no-go, even if using only Elixir. I would say it is in general an anti-pattern; encoding the message type is much better. :slight_smile:


Could you say more about this? Is the capture pointing to a line number somehow?

Not necessarily. My point is that a unique name is generated for each anonymous function and the process is opaque. Renaming a private function to give it a clearer name, changing lines, etc. could all affect the name of the anonymous function. You have no control over it and you should assume that any change will give you something new.

Here is an example:

iex(18)> defmodule Foo do
...(18)> def fun, do: fn a, b -> a + b end
...(18)> end
iex(19)> fun = Foo.fun()
#Function<0.51000596/2 in Foo.fun/0>
iex(20)> defmodule Foo do
...(20)> def fun, do: fn a,
...(20)> b -> a +
...(20)> b
...(20)> end
...(20)> end
warning: redefining module Foo (current version defined in memory)
  iex:20
iex(21)> Foo.fun() == fun
false

Same code, different line breaks, different functions.


Thank you for the detailed explanation!

And thank you for creating Elixir – thanks for making developing fun again.
