I’m playing with Elixir - it’s fun. I think @rvirding does give Elixir courses these days.
Re: files and database - when I give Erlang courses I say the three best things about Erlang are processes, links and term_to_binary, and the same should be true of Elixir (since these have nothing to do with the surface syntax and are properties of the BEAM VM).
I may be wrong, but I don’t see much use of term_to_binary in Elixir - it’s a great way to store anything on disk or to communicate between systems. Is this a well-known way of storing and retrieving data?
The post above has been split into a new thread; for context, here is Joe’s original post that this conversation stems from:
I’m not sure if this is well known in the Elixir world, but I had to sidestep to Erlang after starting with Elixir, because my target system only had OTP 16 available at the time and Elixir needed at least 17. So I used Erlang exclusively for a year, and learned about those nice little helpers back then.
But coming from Elixir it’s hard to find them, since they are not in the auto-imported modules; you have to know they are there and you have to explicitly import or remote-call them, either as import :erlang, only: [binary_to_term: 1, term_to_binary: 1] or as :erlang.term_to_binary("foo").
I think if they were documented in and auto-imported from Kernel, their discoverability in Elixir would be much better and they would be used more often.
When I teach I go on and on and on about the greatness of term_to_binary and its inverse. These are incredibly useful.
They are also blindingly fast compared to any JSON/XML type serialisation.
Something like:
```elixir
defmodule Term do
  def store(anything, path) do
    bin = :erlang.term_to_binary(anything)
    File.write!(path, bin)
  end

  def fetch(path) do
    path |> File.read!() |> :erlang.binary_to_term()
  end
end
```
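For completeness, here is the same round trip inlined and self-contained (the temp-dir path and sample map are just for illustration):

```elixir
# Round-trip an arbitrary term through the external term format on disk.
data = %{user: "joe", scores: [1, 2, 3]}
path = Path.join(System.tmp_dir!(), "term_demo.bin")

File.write!(path, :erlang.term_to_binary(data))
restored = path |> File.read!() |> :erlang.binary_to_term()

true = restored == data
File.rm!(path)
```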
I made a proposal on elixir-core; it won’t make it into 1.6 though. José once said he plans to release it early in January, and I think I’ve read somewhere that he is already preparing its release… Anyway, the proposal needs to get accepted first, then the implementation should be straightforward…
At least in my head there’s also always that mantra of “the filesystem is slow”, which of course is not that big an issue if the filesystem is not read for each request. So besides discoverability it’s probably also a case of educating/informing people.
Of course the filesystem is slow, but storing and loading in binary format straight to a file might be much faster than doing the same with JSON or with data stringified into an arbitrary database protocol.
Also, at least for me, the filesystem often just means a dedicated area of memory, i.e. a RAM drive. It’s easier to share filenames on a RAM drive with external processes than doing everything via stdin/stdout in a port. Sometimes I even have applications external to the BEAM that read those files.
Also, when you want to persist you need to write to the filesystem: either you do it from the BEAM directly, or you push your data to a database, which will then persist to disk as well…
So somewhere in the process of persisting data you will always hit the disk.
I don’t know if I find that to be a great reason to not expose term_to_binary, seeing as it’s not actually only for writing to disk. It’s just a serialization of any erlang term, so you can use it on the wire as well.
Oh, and especially as filesystem and network are slow, I strongly prefer a binary serialisation format over a human-readable plaintext format like JSON or XML (at least when I do not expect humans to read the output).
term_to_binary even has optional built-in compression, so binarified chunks of data are much smaller than the same term serialised to JSON or XML.
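For the record, the compression is opt-in via the :compressed option (zlib, levels 0–9); a quick sketch:

```elixir
# A term with lots of repetition compresses dramatically.
term = List.duplicate("hello world", 1_000)

plain      = :erlang.term_to_binary(term)
compressed = :erlang.term_to_binary(term, compressed: 6)

# The compressed encoding is much smaller, and decodes identically.
true = byte_size(compressed) < byte_size(plain)
^term = :erlang.binary_to_term(compressed)
```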
And we are just adding information about why that mantra is correct, but also why, precisely because it is true, we want to use t2b. Please do not take it as an offence; it’s just that you were the one who actually said it out loud.
From my perspective I think it is known as I’ve seen it mentioned a few times in the books and courses I’ve done - pretty sure @sasajuric mentions it a few times in Elixir in Action.
However, what I don’t think is well known is what you said in your other post - that it could well be a much better way to store certain kinds of data.
Norbert has split your post into a dedicated thread so I’ll add your original post as a quote to your post above.
Yeah, I’ve used :erlang.term_to_binary to implement a very simple ad-hoc database. My main motivation in the book was to keep things simple. Using a full-blown database would have required installing some piece of software, introducing a mix project and OTP application, and adding another dependency. I definitely didn’t want to deal with all that in the chapter which explains GenServer.
I also occasionally reach for term_to_binary in real life, for some simple nice-to-have short-term persistence. I think it’s a great no-ceremony, no-impedance-mismatch fit for such scenarios.
We almost ended up storing encoded terms in PostgreSQL in one case, but we decided against it, since we were worried about possible future changes to the format. I saw somewhere (can’t remember where though) that the format rarely changes, but that it can still happen.
The Phoenix long-poll transport for channels uses term_to_binary to encode the long-poll server pid and send it back to the client. When they re-poll, we binary_to_term it back into a pid to ask the server if it has any messages for us. This has been a fun way to use these features.
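A rough sketch of that pattern (not the actual Phoenix code; in the real transport the token is signed, since anything that round-trips through a client must be verified before decoding):

```elixir
# Encode a pid into a string token a client can hand back later.
token = self() |> :erlang.term_to_binary() |> Base.url_encode64()

# On re-poll, recover the pid and check the server is still there.
pid = token |> Base.url_decode64!() |> :erlang.binary_to_term()

true = pid == self()
true = Process.alive?(pid)
```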
Agreed! At Aircloak we have two Elixir systems chatting over a socket connection (a detailed explanation is in this post). For a long time we just shipped JSON over the wire, but at some point we noticed that encoding takes a long time for large payloads. After some measurements, we replaced it with term_to_binary/binary_to_term, since it was much faster (even faster than jiffy). As an added bonus, there is no impedance mismatch between the data being exchanged.
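A minimal sketch of shipping terms over a socket this way - the Wire module name is made up, and packet: 4 framing is assumed so each term travels as one length-prefixed message:

```elixir
defmodule Wire do
  # Each call sends/receives exactly one term; packet: 4 on the
  # socket adds/strips the 4-byte length prefix for us.
  def send_term(socket, term),
    do: :gen_tcp.send(socket, :erlang.term_to_binary(term))

  def recv_term(socket) do
    {:ok, bin} = :gen_tcp.recv(socket, 0)
    :erlang.binary_to_term(bin)
  end
end

# Loopback demo: listener and client in the same VM.
{:ok, listen} = :gen_tcp.listen(0, [:binary, packet: 4, active: false])
{:ok, port} = :inet.port(listen)
{:ok, client} = :gen_tcp.connect(:localhost, port, [:binary, packet: 4, active: false])
{:ok, server} = :gen_tcp.accept(listen)

:ok = Wire.send_term(client, %{hello: "world", n: 42})
%{hello: "world", n: 42} = Wire.recv_term(server)
```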
I agree with your proposal - term_to_binary and binary_to_term should be re-added to Kernel - but I also understand it’s tough to know where to draw the line in regards to the breadth of the stdlib… especially auto-imported Kernel functions.
One should take care, though, when using binary_to_term on data received from the network. The deserialisation itself can create resources that are limited in the system, leading to DoS (atoms being the primary example). There’s also the issue that it allows gzip compression of the data, so it is potentially susceptible to a zip-bomb attack.
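The atom problem is what the :safe option to binary_to_term/2 addresses: it raises instead of minting new atoms. A sketch - the second half hand-crafts an external-term-format binary (version byte 131, tag 119 = SMALL_ATOM_UTF8_EXT) for an atom this VM has never seen:

```elixir
bin = :erlang.term_to_binary({:ok, "payload"})
# :ok already exists in this VM, so :safe accepts it.
{:ok, "payload"} = :erlang.binary_to_term(bin, [:safe])

# An atom unknown to this VM is rejected with ArgumentError.
name = "no_such_atom_#{:erlang.unique_integer([:positive])}"
evil = <<131, 119, byte_size(name), name::binary>>

result =
  try do
    {:ok, :erlang.binary_to_term(evil, [:safe])}
  rescue
    ArgumentError -> :rejected
  end

true = result == :rejected
```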
While both of these functions aren’t often used directly in applications, they are used quite frequently in libraries. Off the top of my head, from the stuff we use in our production app, there’s Phoenix.Token, and I’m pretty sure there are other examples.
I would have liked a safe option to binary_to_term that was indeed safe. The documentation says it can be used when receiving binaries from an untrusted source. However, someone showed me that, at least in Elixir, it can be used for bad things regardless. Can’t remember the specifics though :/. Does anyone know why binary_to_term is not considered safe even though the documentation hints it can be used with the :safe option?
Sorry, I don’t get your meaning here. Records haven’t changed format in the roughly 25 years since they were added to the language. They haven’t changed syntax (Erlang syntax, at least) in that time either. They are partly (some say mainly) my fault, so you know who to blame.
The post you are replying to gave you two examples: you can DoS a server by filling its memory with atoms that cannot be garbage collected, or with a zip-bomb.
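The zip-bomb half is easy to demonstrate: a tiny compressed binary re-materialises into a huge term on decode, so decoding untrusted input can blow up memory far beyond the size of the message itself:

```elixir
# 10 MB of zero bytes compresses down to a tiny encoded blob…
big = :binary.copy(<<0>>, 10_000_000)
small = :erlang.term_to_binary(big, compressed: 9)
true = byte_size(small) * 100 < byte_size(big)

# …but decoding it allocates the full 10 MB again.
decoded = :erlang.binary_to_term(small)
true = byte_size(decoded) == 10_000_000
```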