Acceptance of Erlang's 'term_to_binary' and vice versa in Elixir

Yes, but the `safe` option guards against atom creation; I remember there is also something “unsafe” about the safe option, but I can’t remember what.

safe:
  Use this option when receiving binaries from an untrusted source.

  When enabled, it prevents decoding data that can be used to attack the
  Erlang system. In the event of receiving unsafe data, decoding fails with a
  badarg error.

  This prevents creation of new atoms directly, creation of new atoms
  indirectly (as they are embedded in certain structures, such as process
  identifiers, refs, and funs), and creation of new external function
  references. None of those resources are garbage collected, so unchecked
  creation of them can exhaust available memory.
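
For reference, a minimal sketch in Elixir of what that looks like in practice (the hand-crafted bytes below are just an illustrative ATOM_EXT payload for an atom this VM has presumably never seen):

```elixir
# An ordinary round-trip works the same with or without :safe.
bin = :erlang.term_to_binary(%{id: 1, name: "alice"})
%{id: 1, name: "alice"} = :erlang.binary_to_term(bin, [:safe])

# A payload carrying an unknown atom: <<131>> is the format version byte,
# 100 the legacy ATOM_EXT tag, then a 2-byte length and the atom name.
# Without :safe this would quietly add the atom to the atom table;
# with :safe it raises instead.
payload = <<131, 100, 0, 20, "definitely_not_known">>
:erlang.binary_to_term(payload, [:safe])  # ** (ArgumentError)
```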
3 Likes

Perhaps I have misunderstood what this is all about, which is quite likely :smiley:
The records implementation might not have changed, but if you change a record definition and you have data at rest, you need to be able to decode records of all versions. This is much the same problem as sharing records through header files.

It is manageable, but you need to be aware of it and make sure you handle all “old” record versions.

For example:

-record(myrec, {a, b, c}).

%% Later version
-record(myrec, {a, newfield, b, c}).
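
One way to handle it (a sketch in Elixir, with the module name and the `nil` default invented for illustration) is a single upgrade function that every read path goes through:

```elixir
defmodule MyrecUpgrade do
  # Bring any stored myrec tuple up to the latest layout before use.
  # Old layout: {:myrec, a, b, c}
  # New layout: {:myrec, a, newfield, b, c}

  def upgrade({:myrec, a, b, c}), do: {:myrec, a, nil, b, c}
  def upgrade({:myrec, _a, _new, _b, _c} = rec), do: rec
end
```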

2 Likes

Maybe what was considered “unsafe” about :safe in the resource you are remembering is that it raises an ArgumentError on new atoms rather than offering a more graceful failure mechanism? Though to be fair, attempting to decode binaries that don’t follow the external term format raises the same error, safe or not.
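
If a gentler failure mode is wanted, one option (just a sketch) is to wrap the call and turn the raise into a tagged result:

```elixir
defmodule SafeDecode do
  # Sketch: convert the badarg raise from :safe into {:ok, term} | :error.
  @spec decode(binary()) :: {:ok, term()} | :error
  def decode(bin) when is_binary(bin) do
    {:ok, :erlang.binary_to_term(bin, [:safe])}
  rescue
    ArgumentError -> :error
  end
end
```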

2 Likes

No, it was something else. I think it was about being able to serialize functions and thereby sneak in a way to call any function in the system. This was in an IRC discussion a while back, where I thought binary_to_term with the safe option could be used between client and server, and someone gave a good example of why this is not a good idea.
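
A rough sketch of the kind of thing meant (all within one VM; the fun only references modules and atoms that already exist, so :safe does not reject it):

```elixir
# Attacker-controlled term: an anonymous function wrapping an arbitrary call.
evil = :erlang.term_to_binary(fn -> System.cmd("echo", ["owned"]) end)

# :safe only guards atom (and external fun reference) creation, so this decodes.
decoded = :erlang.binary_to_term(evil, [:safe])

# Nothing bad has happened yet, but the moment anything calls the decoded
# value -- directly or as a callback handed to Enum and friends -- it runs.
decoded.()
```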

2 Likes

It does indeed change on occasion, but it is always backwards compatible. Changes take the form of new IDs in the format (which is very simple).

Standard versioning issues. Hence, anything that can persist I like to keep versioned, with the version as the first tag (see the sketch below). ^.^

Only if you encode a function in it and call it; otherwise no.
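
To make the “version as the first tag” idea concrete, a minimal sketch (module, versions, and fields all invented for illustration):

```elixir
defmodule Persist do
  # Wrap persisted terms in a versioned tuple so old data at rest can
  # still be decoded and upgraded after the shape changes.
  def dump(%{a: _, b: _} = data), do: :erlang.term_to_binary({:v2, data})

  def load(bin) do
    case :erlang.binary_to_term(bin, [:safe]) do
      {:v1, %{a: a}} -> %{a: a, b: nil}           # upgrade the old shape
      {:v2, %{a: _, b: _} = data} -> data         # current shape
    end
  end
end
```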

2 Likes

What do you think of storing data in a persistent Mnesia database, possibly with mnesia_eleveldb (presentation) as the backend, instead of either an external database or files? What makes it look attractive to me on paper, compared to files, is that you still get transactions and queries. I know Mnesia requires the user to deal with netsplits manually, but I am mainly interested in this question for the case where you run a single node.
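
For what it’s worth, the single-node, disc-backed setup is pretty small; a sketch with plain disc_copies (table and field names invented), before even considering the mnesia_eleveldb backend:

```elixir
# One-time setup on a single node: schema on disk, then a disc_copies table.
:ok = :mnesia.create_schema([node()])
:ok = :mnesia.start()

{:atomic, :ok} =
  :mnesia.create_table(:user, attributes: [:id, :data], disc_copies: [node()])

# Transactions and reads work as usual.
{:atomic, :ok} =
  :mnesia.transaction(fn -> :mnesia.write({:user, 1, %{name: "alice"}}) end)

{:atomic, [{:user, 1, %{name: "alice"}}]} =
  :mnesia.transaction(fn -> :mnesia.read(:user, 1) end)
```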

2 Likes

Are you referring to this: https://griffinbyatt.com/post/analysis-plug-security-vulns? I don’t think deserialization itself calls any embedded functions, but that blog post shows how they can subsequently be called unintentionally via enumeration.

Using Plug.Crypto.safe_binary_to_term filters out any embedded serialized functions.
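
A quick sketch of the difference (the helper is exposed as non_executable_binary_to_term/2 in current plug_crypto releases; adjust to whatever your Plug version provides):

```elixir
payload = :erlang.term_to_binary(fn -> IO.puts("should never run") end)

# Plain :safe happily decodes the fun -- it only guards atom creation.
_fun = :erlang.binary_to_term(payload, [:safe])

# Plug.Crypto's helper walks the decoded term and rejects embedded funs.
Plug.Crypto.non_executable_binary_to_term(payload, [:safe])
# ** (ArgumentError)
```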

7 Likes

Yes, I think that was what I was referring to! Thanks for finding it.

1 Like

I have a question :slight_smile:

If my business rule is to receive JSON as input and return JSON as output, which way is faster:

1. json -> term -> binary and binary -> term -> json (one file per user), or
2. json -> jsonb (PostgreSQL) and jsonb -> json (one row per user)?

PS: I will benchmark at some point, but my app doesn’t exist yet :stuck_out_tongue:
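
A sketch of what the first option (one file per user) could look like, assuming Jason for the JSON side (module name and path scheme invented):

```elixir
defmodule UserStore do
  # Option 1 sketch: JSON in -> term -> binary file on disk, and back.
  def write(user_id, json) when is_binary(json) do
    term = Jason.decode!(json)
    File.write!(path(user_id), :erlang.term_to_binary(term))
  end

  def read(user_id) do
    user_id
    |> path()
    |> File.read!()
    |> :erlang.binary_to_term([:safe])
    |> Jason.encode!()
  end

  defp path(user_id), do: Path.join("users", "#{user_id}.bin")
end
```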

1 Like

Probably the jsonb option, I’d wager. Databases can be a lot faster than filesystems depending on the access pattern, and in addition the term/binary conversion is not as fast as most people would think.

A benchmark would be interesting to see, but you’d have to carefully document and test the database version, the filesystem, and the storage medium involved.
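
For the serialization half at least, a Benchee sketch is cheap to set up (Benchee, Jason, and the sample payload are all assumptions; this ignores file and database I/O entirely):

```elixir
sample = %{"id" => 1, "name" => "alice", "tags" => ["a", "b", "c"]}

Benchee.run(%{
  "term_to_binary / binary_to_term" => fn ->
    sample |> :erlang.term_to_binary() |> :erlang.binary_to_term()
  end,
  "Jason encode / decode" => fn ->
    sample |> Jason.encode!() |> Jason.decode!()
  end
})
```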

2 Likes