Acceptance of Erlang's 'term_to_binary' and vice versa in Elixir


#10

I’m not arguing against that. My comments were about people using databases instead of file-based persistence directly from Elixir/Erlang.


#11

And we are just explaining why that mantra is obviously correct, but also why, precisely because it is true, we want to use term_to_binary. Please do not take offense; it’s just that you were the one who actually said it out loud.


#12

From my perspective I think it is known as I’ve seen it mentioned a few times in the books and courses I’ve done - pretty sure @sasajuric mentions it a few times in Elixir in Action.

However, what I don’t think is well known is what you said in your other post: that it could well be a much better way to store certain kinds of data.

Norbert has split your post into a dedicated thread so I’ll add your original post as a quote to your post above.


#14

Yeah, I’ve used :erlang.term_to_binary to implement a very simple ad-hoc database. My main motivation in the book was to keep things simple. Using a full-blown database would have required installing some piece of software, introducing a mix project and an OTP application, and adding another dependency. I definitely didn’t want to deal with all that in the chapter which explains GenServer :smiley:

I also occasionally reach for term_to_binary in real life, for some simple nice-to-have short-term persistency. I think it’s a great no-ceremony, no-impedance-mismatch fit for such scenarios.
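For readers who have not seen the pattern, here is a minimal sketch of that kind of no-ceremony persistence. The `TermStore` module, its function names, and the one-file-per-key layout are assumptions made up for illustration; this is not the code from the book.

```elixir
defmodule TermStore do
  # Hypothetical helper: stores each term in its own file under `dir`.
  def put(dir, key, term) do
    File.mkdir_p!(dir)
    File.write!(path(dir, key), :erlang.term_to_binary(term))
  end

  # Returns {:ok, term}, or :error when nothing was stored under `key`.
  def get(dir, key) do
    case File.read(path(dir, key)) do
      {:ok, binary} -> {:ok, :erlang.binary_to_term(binary)}
      {:error, :enoent} -> :error
    end
  end

  defp path(dir, key), do: Path.join(dir, to_string(key))
end

dir = Path.join(System.tmp_dir!(), "term_store_demo")
TermStore.put(dir, :todo_list, %{1 => {"buy milk", :pending}})
{:ok, restored} = TermStore.get(dir, :todo_list)
```

The files are only ever written by the application itself here, which is why the plain `binary_to_term/1` is used without further checks.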

We almost ended up storing encoded terms to PostgreSQL in one case, but we decided against it, since we were worried about possible future changes to the format. I saw somewhere (can’t remember where though), that the format rarely changes, but that it can still happen.


#15

The phoenix long poll transport for channels uses term_to_binary to encode the long poll server pid and send it back to the client. When they repoll, we binary_to_term back into a pid to ask the server if it has any messages for us, which has been a fun way to use these features :slight_smile:
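A rough sketch of the round trip described above, not Phoenix's actual implementation: a pid is serialized, made URL-safe with Base64, and later decoded so the original process can be messaged again. The `encode`/`decode` helpers are hypothetical, and the token here is unsigned, so a real system would presumably want to sign and verify it before decoding.

```elixir
# Hypothetical helpers: pid -> opaque token -> pid.
encode = fn pid -> pid |> :erlang.term_to_binary() |> Base.url_encode64() end
decode = fn token -> token |> Base.url_decode64!() |> :erlang.binary_to_term() end

# Stand-in for the long poll server process.
server =
  spawn(fn ->
    receive do
      {:poll, from} -> send(from, {:messages, ["hello"]})
    end
  end)

# The token is what would travel to the client and back.
token = encode.(server)
send(decode.(token), {:poll, self()})

reply =
  receive do
    msg -> msg
  after
    1_000 -> :timeout
  end
```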


#16

Agreed! At Aircloak we have two Elixir systems chatting over a socket connection (detailed explanation is in this post). For a long time, we just shipped JSON over the wire, but at some point we noticed that encoding took a long time for large payloads. After some measurements, we replaced it with term_to_binary/binary_to_term, since it was much faster (even faster than jiffy). As an added bonus there is no impedance mismatch between the data being exchanged :slight_smile:
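To illustrate the "no impedance mismatch" point, here is a small sketch with a made-up payload. Atoms, tuples, and structs come back exactly as they went in, whereas JSON has no atom or tuple type and would need a mapping layer for them.

```elixir
# Illustrative payload only; the keys and values are invented.
payload = %{
  status: :ok,
  range: {1, 100},
  at: ~D[2018-01-01],
  tags: MapSet.new([:a, :b])
}

binary = :erlang.term_to_binary(payload)
decoded = :erlang.binary_to_term(binary)
```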


#17

These functions existed in Kernel ~4 years ago and were removed here probably to make the standard library more concise: https://github.com/elixir-lang/elixir/issues/2003

I agree with your proposal that term_to_binary and binary_to_term should be re-added to Kernel, but I also understand it’s tough to know where to draw the line with regard to the breadth of the stdlib… especially auto-imported Kernel functions.


#18

One should take care, though, when using binary_to_term on data received from the network. The deserialization itself can create resources that are limited in the system, thus leading to DoS (atoms being the primary example). There’s also the issue that the format allows zlib-compressed data, so it is potentially susceptible to a zip-bomb attack.
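Both concerns can be seen in a small sketch. The first half hand-builds a binary that names an atom assumed not to exist yet in the running VM (the atom name is made up) and shows that the `:safe` option rejects it. The second half shows how small a highly repetitive term becomes with the `:compressed` option, which is what makes a zip bomb plausible.

```elixir
# External term format by hand: 131 is the version byte, 119 is the
# SMALL_ATOM_UTF8_EXT tag, followed by a one-byte length and the name.
name = "atom_that_should_not_exist_yet_4f1c9a"
forged = <<131, 119, byte_size(name)>> <> name

# With :safe, decoding an unknown atom fails instead of creating it.
safe_result =
  try do
    :erlang.binary_to_term(forged, [:safe])
  rescue
    ArgumentError -> :rejected
  end

# A megabyte of repeated data shrinks to a tiny compressed payload.
big = String.duplicate("a", 1_000_000)
plain = :erlang.term_to_binary(big)
packed = :erlang.term_to_binary(big, [:compressed])
ratio = byte_size(plain) / byte_size(packed)
```

Note that `:safe` does nothing about the size of the decoded term, so limiting the accepted input size is still up to the caller.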

While neither function is used directly in applications very often, both are used quite frequently in libraries. Off the top of my head, from the stuff we use in our production app, one example would be Phoenix.Token, and I’m pretty sure there are others.


#19

Would have liked a safe option to binary_to_term that was indeed safe. The documentation says it can be used when receiving binaries from an untrusted source. However someone showed me that at least in Elixir this can be used for bad things regardless. Can’t remember the specifics though :/. Does anyone know why binary_to_term is not considered safe even though documentation hints it can be used with the safe option?

Deserialization can be problematic in many formats. XML and Yaml both suffer from DoS attacks. (https://en.wikipedia.org/wiki/Billion_laughs).

This is especially true if you are using erlang records.


#20

Sorry, I don’t get your meaning here. Records haven’t changed format in the roughly 25 years since they were added to the language. They haven’t changed syntax, Erlang syntax at least, in that time either. They are partly (some say mainly) my fault (so you know who to blame :wink:)


#21

The post you are replying to gave you two examples. You can DoS a server by filling its memory with atoms that cannot be garbage collected. Or with a zip bomb.


#22

Yes, but the 'safe' option guards against atom creation. I remember there being something “unsafe” about the safe option as well, but I can’t remember what.

safe:
  Use this option when receiving binaries from an untrusted source.

  When enabled, it prevents decoding data that can be used to attack the Erlang
  system. In the event of receiving unsafe data, decoding fails with a badarg
  error.

  This prevents creation of new atoms directly, creation of new atoms indirectly
  (as they are embedded in certain structures, such as process identifiers,
  refs, and funs), and creation of new external function references. None of
  those resources are garbage collected, so unchecked creation of them can
  exhaust available memory.

#23

Perhaps I have misunderstood what this is all about, which is quite likely :smiley:
The record implementation might not have changed, but if you change a record definition and you have data at rest, you need to be able to decode all versions of that record. This is somewhat the same problem as using records in header files.

It is manageable but you need to be aware of it and make sure to handle all “old” record versions.

E.g.

-record(myrec, {a, b, c}).

%% Later version
-record(myrec, {a, newfield, b, c}).
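One way to handle this on the Elixir side, sketched under assumptions: records are plain tagged tuples, so the old and new layouts of `myrec` differ in size and can be told apart by pattern matching when loading. The `MyRec` module is hypothetical, and filling `newfield` with `:undefined` (Erlang's default for unset record fields) is just one possible choice.

```elixir
defmodule MyRec do
  # Old layout: {:myrec, a, b, c}
  # New layout: {:myrec, a, newfield, b, c}
  def load(binary) do
    binary |> :erlang.binary_to_term() |> upgrade()
  end

  # Old records get the new field filled in with a default.
  defp upgrade({:myrec, a, b, c}), do: {:myrec, a, :undefined, b, c}

  # Records already in the current layout pass through unchanged.
  defp upgrade({:myrec, _a, _newfield, _b, _c} = current), do: current
end

old_binary = :erlang.term_to_binary({:myrec, 1, 2, 3})
new_binary = :erlang.term_to_binary({:myrec, 1, :x, 2, 3})

upgraded = MyRec.load(old_binary)
unchanged = MyRec.load(new_binary)
```

Matching on tuple size only works while every version has a different number of fields; an explicit version tag avoids that limitation.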


#24

Maybe what was considered “unsafe” about :safe in the resource you are remembering is that it raises an ArgumentError on new atoms rather than having a more graceful failure mechanism? Though to be fair attempting to create terms from binaries that don’t follow the external term format raises the same, safe or not.


#25

No, it was something else. I think it was something about being able to serialize functions and sneak them in, thereby being able to call any function in the system. This was in an IRC discussion a while back, where I thought binary_to_term with the safe option could be used between client and server, and someone gave a good example of why this is not a good idea.


#27

It does indeed change on occasion, but it is always backwards compatible. Changes are on new ID’s in the internal format (which is very simple).

Standard versioning issues. Hence why anything that can persist I like to keep versioned as a first tag. ^.^

If you encode a function in it and call it, otherwise no.


#28

What do you think of storing data in a persistent Mnesia database, possibly with mnesia_eleveldb (presentation) as the backend, instead of either an external database or files? What makes it look attractive to me on paper compared to files is that you still get transactions and queries. I know Mnesia requires the user to deal with netsplits manually, but I am mainly interested in this question in the case when you run a single node.


#29

Are you referring to this: https://griffinbyatt.com/post/analysis-plug-security-vulns? I don’t think deserializing calls any embedded functions, but this blog post shows how they can subsequently be called unintentionally by enumeration.

Using Plug.Crypto.safe_binary_to_term filters out any embedded serialized functions.
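The general idea of such filtering can be sketched as follows. This is not Plug's actual implementation; the `NoFuns` module and its return values are invented for illustration. It decodes with `:safe` and then walks the resulting term, refusing anything that contains a function.

```elixir
defmodule NoFuns do
  # Hypothetical decoder: rejects any term that embeds a function.
  def decode(binary) do
    term = :erlang.binary_to_term(binary, [:safe])
    if has_fun?(term), do: {:error, :function_found}, else: {:ok, term}
  end

  defp has_fun?(term) when is_function(term), do: true
  # Matching on head and tail also copes with improper lists.
  defp has_fun?([head | tail]), do: has_fun?(head) or has_fun?(tail)
  defp has_fun?(term) when is_tuple(term), do: term |> Tuple.to_list() |> has_fun?()
  # Maps (and therefore structs) are checked via their key/value pairs.
  defp has_fun?(term) when is_map(term), do: term |> Map.to_list() |> has_fun?()
  defp has_fun?(_other), do: false
end

harmless = :erlang.term_to_binary(%{user: {"ann", [1, 2, 3]}})
sneaky = :erlang.term_to_binary(%{user: {"ann", [&String.upcase/1]}})

ok_result = NoFuns.decode(harmless)
rejected = NoFuns.decode(sneaky)
```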


#30

Yes, I think that was what I was referring to! Thanks for finding it.


#31

I have a question :slight_smile:

If my business rule is to receive input as JSON and output JSON, which way is faster:
json -> term -> binary and binary -> term -> json (one file per user)
Or
json -> jsonb (postgresql) and jsonb -> json (one row per user)

PS: I will benchmark at some point, but for now my app doesn’t exist yet :stuck_out_tongue: