Acceptance of Erlang's `term_to_binary` and `binary_to_term` in Elixir

Yes, but the `:safe` option guards against atom creation. However, I remember there being something "unsafe" about the safe option as well, though I can't remember what.

  Use this option when receiving binaries from an untrusted source.

  When enabled, it prevents decoding data that can be used to attack the Erlang
  system. In the event of receiving unsafe data, decoding fails with a badarg
  error.

  This prevents creation of new atoms directly, creation of new atoms indirectly
  (as they are embedded in certain structures, such as process identifiers,
  refs, and funs), and creation of new external function references. None of
  those resources are garbage collected, so unchecked creation of them can
  exhaust available memory.
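To make the atom guard concrete, here is a small sketch. It hand-crafts an External Term Format payload for an atom that does not yet exist in the VM (the atom name is arbitrary and only for illustration) and shows `:safe` refusing to decode it:

```elixir
# Hand-crafted External Term Format payload: 131 is the format version tag,
# 119 is SMALL_ATOM_UTF8_EXT, followed by a 1-byte length and the atom's name.
name = "definitely_new_atom_#{:erlang.unique_integer([:positive])}"
payload = <<131, 119, byte_size(name), name::binary>>

result =
  try do
    # :safe refuses to create the not-yet-existing atom and raises badarg,
    # which surfaces in Elixir as an ArgumentError.
    :erlang.binary_to_term(payload, [:safe])
  rescue
    ArgumentError -> :rejected
  end
# result == :rejected
```

Without `[:safe]`, the same call would happily create the atom and return it.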

Perhaps I have misunderstood what this is all about, which is quite likely :smiley:
The records implementation might not have changed, but if you change a record definition and you have data at rest, you need to be able to decode records of all versions. This is somewhat the same problem as using records in header files.

It is manageable but you need to be aware of it and make sure to handle all “old” record versions.


-record(myrec, {a, b, c}).

%% Later version
-record(myrec, {a, newfield, b, c}).
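Since records are plain tuples at runtime, an old on-disk term like `{:myrec, a, b, c}` can be upgraded to the new shape by pattern matching on its arity. A hypothetical sketch (the module name and the `:default` value for the new field are made up for illustration):

```elixir
defmodule RecUpgrade do
  # Old 3-field version -> new 4-field version; the new field gets a default.
  def upgrade({:myrec, a, b, c}), do: {:myrec, a, :default, b, c}

  # Already-current records pass through unchanged.
  def upgrade({:myrec, _a, _newfield, _b, _c} = rec), do: rec
end

RecUpgrade.upgrade({:myrec, 1, 2, 3})
# => {:myrec, 1, :default, 2, 3}
```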


Maybe what was considered "unsafe" about `:safe` in the resource you are remembering is that it raises an `ArgumentError` on new atoms rather than having a more graceful failure mechanism? Though, to be fair, attempting to create terms from binaries that don't follow the external term format raises the same error, safe or not.


No, it was something else. I think it was something about being able to serialize functions and thereby sneak in calls to any function in the system. This was in an IRC discussion a while back where I thought `binary_to_term` with the `:safe` option could be used between client and server, and someone gave a good example of why this is not a good idea.
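For illustration, a sketch of why this is risky: even with `:safe`, a fun serialized in the same VM decodes successfully, because the module atoms it references already exist. Any code that later calls the decoded value therefore executes whatever the sender put inside the fun:

```elixir
# Serialize an anonymous fun; the payload passes the :safe check because
# no *new* atoms need to be created to decode it.
payload = :erlang.term_to_binary(fn -> :os.system_time(:second) end)

fun = :erlang.binary_to_term(payload, [:safe])
true = is_function(fun, 0)
# Calling fun.() at this point would run code chosen by whoever built the payload.
```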


It does indeed change on occasion, but it is always backwards compatible. Changes take the form of new IDs in the internal format (which is very simple).

Standard versioning issues. Hence why I like to keep anything that can persist versioned, with the version as the first tag. ^.^

If you encode a function in it and call it, otherwise no.


What do you think of storing data in a persistent Mnesia database, possibly with mnesia_eleveldb (presentation) as the backend, instead of either an external database or files? What makes it look attractive to me on paper compared to files is that you still get transactions and queries. I know Mnesia requires the user to deal with netsplits manually, but I am mainly interested in this question in the case when you run a single node.
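For context, the transactional API being referred to looks roughly like this. A minimal single-node sketch (the table name, fields, and data are made up; `ram_copies` keeps the example self-contained, whereas actual persistence would use `disc_copies` or a backend like eleveldb):

```elixir
# Start Mnesia with a RAM-only schema; no disk setup needed for this sketch.
:ok = :mnesia.start()

{:atomic, :ok} =
  :mnesia.create_table(:user, attributes: [:id, :data], ram_copies: [node()])

# Writes and reads go through transactions, which is the attraction over files.
{:atomic, :ok} =
  :mnesia.transaction(fn -> :mnesia.write({:user, 1, %{name: "alice"}}) end)

{:atomic, [user]} = :mnesia.transaction(fn -> :mnesia.read(:user, 1) end)
# user == {:user, 1, %{name: "alice"}}
```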


Are you referring to this: I don’t think deserializing calls any embedded functions, but this blog post shows how they can subsequently be called unintentionally by enumeration.

Using `Plug.Crypto.safe_binary_to_term` filters out any embedded serialized functions.
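The general idea of such a filter can be sketched as a recursive walk that rejects funs anywhere inside the decoded term. This is a simplified, after-the-fact check of my own (the real library implementation is more thorough), shown only to illustrate the principle:

```elixir
defmodule NoFuns do
  # Reject funs outright, and recurse into the containers they could hide in.
  def check!(f) when is_function(f), do: raise(ArgumentError, "fun in term")
  def check!(list) when is_list(list), do: Enum.each(list, &check!/1)
  def check!(tuple) when is_tuple(tuple),
    do: tuple |> Tuple.to_list() |> Enum.each(&check!/1)
  def check!(%{} = map), do: Enum.each(map, fn {k, v} -> check!(k); check!(v) end)
  def check!(_other), do: :ok
end

NoFuns.check!([1, {2, %{a: 3}}, "bin"])
# => :ok, no funs anywhere in the term
```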


Yes, I think that was what I was referring to! Thanks for finding it.

I have a question :slight_smile:

If my business rule is to receive input as JSON and output JSON, which way is faster:
json -> term -> binary and binary -> term -> json (one file per user)
json -> jsonb (PostgreSQL) and jsonb -> json (one row per user)

PS: I will benchmark at some point, but for now my app doesn't exist yet :stuck_out_tongue:

Probably the second, I'd wager. Databases can be a lot faster than filesystems depending on the access pattern; in addition, the term/binary conversion is not as fast as most people would think.

A benchmark would be interesting to see, but you'd have to document it well and test various versions of the database, the filesystem, and the storage medium.
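As a rough starting point, here is a crude `:timer.tc` measurement of just the term <-> binary leg. This is illustrative only (the term shape is made up, and a real comparison would use a proper benchmarking tool and include the JSON and database steps):

```elixir
# A moderately sized nested term to round-trip.
term = for i <- 1..1_000, into: %{}, do: {i, {i, to_string(i)}}

{encode_us, bin} = :timer.tc(fn -> :erlang.term_to_binary(term) end)
{decode_us, ^term} = :timer.tc(fn -> :erlang.binary_to_term(bin) end)

IO.puts("encode: #{encode_us}us, decode: #{decode_us}us, size: #{byte_size(bin)} bytes")
```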
