Sending structured messages containing a binary member: what is copied?

I would like to pass an incoming file upload down a GenStage as separate chunks, and I would like some expert advice/confirmation on my thinking. I would also like to know about anything Elixir might handle differently.

When reading the binary chunks from the Plug connection, I want to create a tuple per chunk. The binary chunk is one of the message tuple fields. When a message is sent from one process to another, the message is copied and then linked into the mailbox of the target process according to this and this page.

The documentation on Erlang binaries mentions that the real binary data is handled off-heap, with a ProcBin wrapper on-heap.

So, if I create a tuple message containing a binary, am I right that only the tuple with the ProcBin wrapper is copied into the target mailbox and the real binary data is not?

Assume that everything is copied, unless you can guarantee that the receiver is on the same Node.

If they are on the same Node the binheap will be used for binaries larger than a threshold, but I’m not sure how large this threshold is.

@NobbZ I can guarantee that the receiver is on the same Node, as I am the one starting the aforementioned GenStage when a new file upload is triggered on the Node.

If I look at the documentation (see link in initial post), I see a difference between Heap binaries and Refc binaries on a threshold of 64 bytes. Is that the threshold you are referring to?

While that section is of course correct, it does manage to hide some of the basics. Small binaries, up to 64 bytes, are stored on the process heap and are copied when sent in messages, while large binaries, greater than 64 bytes, are stored off the process heaps in the refc memory and are not copied when sent in messages between processes on the same node. This is completely transparent to the user. It also means that Erlang is usable for streaming large amounts of data and has been used for video streaming applications.
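To make this concrete, here is a minimal sketch of sending a chunk inside a tuple to another local process. The module name `ChunkDemo` and the `{:chunk, index, data}` message shape are made up for illustration. It uses `Process.info/2` with the `:binary` item, which lists the off-heap binaries a process references; the documentation marks the exact format of that list as subject to change, so treat the output as a debugging aid only.

```elixir
defmodule ChunkDemo do
  # Hypothetical demo: send a large binary inside a tuple to a local process
  # and report which off-heap binaries each side references afterwards.
  def run(size \\ 1_000_000) do
    # Built at runtime and well above 64 bytes, so it lives in refc memory.
    chunk = :binary.copy(<<0>>, size)
    parent = self()

    receiver =
      spawn(fn ->
        receive do
          {:chunk, _index, data} ->
            # Report the size seen by the receiver and its binary info.
            send(parent, {:received, byte_size(data), Process.info(self(), :binary)})
        end
      end)

    # Only the tuple and the small on-heap reference should be copied here.
    send(receiver, {:chunk, 0, chunk})

    receive do
      {:received, bytes, {:binary, receiver_bins}} ->
        {:binary, sender_bins} = Process.info(self(), :binary)
        %{bytes: bytes, sender: sender_bins, receiver: receiver_bins}
    after
      1_000 -> :timeout
    end
  end
end
```

Calling `ChunkDemo.run()` in IEx and comparing the `:sender` and `:receiver` lists should show an entry of the same size on both sides, which is consistent with both processes referencing one shared binary rather than two copies.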

There are a few caveats though:

  • Since binaries in the refc memory can be referenced by many processes, it can take a longer time for them to be garbage collected, which can cause problems.
  • If you make a sub-binary, however small, of a large binary in refc memory, then you will keep a reference to the large binary, which means that it will not be collected. See binary:copy for a solution.
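
The second caveat can be checked with `:binary.referenced_byte_size/1`, which reports the size of the underlying binary that a given binary refers to. This is a small sketch; the module name `SubBinaryDemo` and the sizes are arbitrary, and the exact value reported for the sub-binary may depend on the OTP version.

```elixir
defmodule SubBinaryDemo do
  # Hypothetical demo: a sub-binary matched out of a large binary versus
  # a detached copy made with :binary.copy/1.
  def run do
    large = :binary.copy(<<0>>, 1_000_000)

    # Matching out the first 100 bytes yields a sub-binary of `large`.
    <<header::binary-size(100), _rest::binary>> = large

    # :binary.copy/1 creates a fresh binary holding only those 100 bytes.
    detached = :binary.copy(header)

    %{
      header_bytes: byte_size(header),
      # Expected to report the size of the large binary still being referenced.
      header_references: :binary.referenced_byte_size(header),
      # The copy refers only to its own 100 bytes.
      detached_references: :binary.referenced_byte_size(detached)
    }
  end
end
```

If a process keeps only a small slice of each chunk around (a header or a checksum, say), copying that slice before storing it lets the large chunk be collected once nothing else references it.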

Everything is a trade-off. :wink:

I prefer to think of refc memory as the large binary space.


@rvirding is there any way I can get notified at runtime when refc binaries are copied? Log messages, profiler, … ?

Additionally, starting in OTP 20, “literals” are not copied. Literals are any constant compile-time values - lists, tuples, strings, maps, etc. All components of a tuple/list/map have to be literals as well, so [1, 2] is a literal, while [1, foo] is not.
