TL;DR: Is it OK in Elixir to pass large (megabytes) binary strings through function parameters? If not, what would be a clean way to transfer this binary to the end user in a Plug HTTP response?
Context: My program reads binary data from storage using a NIF, in a format that Elixir does not know about. This binary has to be delivered to the end user in response to an HTTP request. It can get quite big (tens of megabytes). In my demonstrator it works by returning the binary from the NIF function to the caller, which then sends it to the client through Plug.Conn.send_resp/3:
    case MSeed.NIF.stream_data(conn.assigns.datafiles) do
      {:ok, msrecords} -> send_data(conn, msrecords)
    end
The send_data/2 function prepares the HTTP response (response headers) and sends the content of msrecords, like:
Plug.Conn.send_resp(conn, 200, msrecords)
What is the impact of passing large data through function parameters? Are there better ways to do this?
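Large binaries (more than 64 bytes) are reference-counted and stored outside the process heap, so they are passed by reference. Small binaries live on the process heap and are copied when they are sent to another process. Within a single process, a function call never copies its arguments anyway.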
The one exception: if the small binary is a slice of a large binary, a sub-binary (a fat pointer into the original) is passed instead, which keeps the original binary from being garbage collected for as long as the slice is alive. That's why :binary.copy/1 exists and why some operations offer a :copy option.
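To make the slice-keeps-the-original-alive point concrete, here is a minimal sketch. The 1 MB binary is just a stand-in for your NIF data; :binary.referenced_byte_size/1 reports how much memory a binary actually keeps referenced.

```elixir
# Stand-in for the NIF result: a 1 MB binary, large enough to live off-heap.
large = :binary.copy(<<0>>, 1_000_000)

# Slicing does not copy: the 10-byte slice is a sub-binary pointing into `large`.
slice = binary_part(large, 0, 10)
IO.inspect(:binary.referenced_byte_size(slice), label: "slice keeps alive (bytes)")

# :binary.copy/1 produces an independent binary that no longer references `large`.
copy = :binary.copy(slice)
IO.inspect(:binary.referenced_byte_size(copy), label: "copy keeps alive (bytes)")
```

If you only need to keep a small piece of a big binary around for a long time, copying that piece lets the big one be collected.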
The only “expensive” action is to send them over a network.
If we are talking about large nested data structures (maps + lists) then sending them across processes might already be expensive, as this involves a deep copy!
Large binaries are the only datatype that lives on a “shared” heap.
Additionally, these shared binaries are reference-counted: every process that has touched one holds a reference until it garbage-collects that reference or exits.
So it is safe and cheap to pass them through function parameters, and safe and cheap to send them between processes. It is risky to handle them in long-running processes, though: a long-lived process that rarely garbage-collects keeps holding its reference, so your large binary stays in RAM and cannot be reclaimed without manual intervention (for example forcing a garbage collection or hibernating the process), or until every process still holding a reference has died. For a recent thread on this phenomenon with detection, remediation, and other tips, check out “Agent keeps large binary memory despite no state”.
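A rough way to observe this yourself, as a sketch only: HolderDemo is a made-up module standing in for any long-lived process, and the 1 MB binary stands in for your data. Process.info(pid, :binary) lists the off-heap binaries a process references, and :erlang.garbage_collect/1 is the "manual intervention".

```elixir
defmodule HolderDemo do
  # Hypothetical long-lived worker: handles one binary, then waits for more work.
  def start, do: spawn(fn -> loop() end)

  defp loop do
    receive do
      {:handle, bin, from} ->
        send(from, {:done, byte_size(bin)})
        # `bin` is no longer used after this point, but the reference may
        # linger until this process runs a garbage collection.
        loop()
    end
  end

  # Sizes of the off-heap binaries the given process currently references.
  def referenced_sizes(pid) do
    {:binary, refs} = Process.info(pid, :binary)
    Enum.map(refs, fn {_id, size, _refcount} -> size end)
  end
end

pid = HolderDemo.start()
large = :binary.copy(<<0>>, 1_000_000)
send(pid, {:handle, large, self()})

receive do
  {:done, _size} -> :ok
after
  5_000 -> raise "worker did not answer"
end

IO.inspect(HolderDemo.referenced_sizes(pid), label: "before GC")

# Force a collection in the worker so it drops its stale reference.
:erlang.garbage_collect(pid)
sizes_after_gc = HolderDemo.referenced_sizes(pid)
IO.inspect(sizes_after_gc, label: "after GC")
```

The "before GC" list will typically still show the 1 MB entry even though the worker is idle; after the forced collection it should be gone from the worker's list.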
If the process reading from your NIF is short-lived, you will be fine; if it’s a GenServer you may want to spin up a worker process on demand to do the actual handling to prevent memory bloat. Phoenix uses short-lived processes per-request so that process boundary should be fine to handle large binaries.
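The worker-on-demand idea could look roughly like this. It is a sketch under assumptions: ShortLivedReader and read_large_binary/0 are hypothetical names, with read_large_binary/0 standing in for the NIF call, and the "handling" is reduced to measuring the size.

```elixir
defmodule ShortLivedReader do
  # Hypothetical stand-in for the NIF call; returns a 5 MB binary.
  def read_large_binary, do: {:ok, :binary.copy(<<0>>, 5_000_000)}

  # Meant to be called from a long-lived process (e.g. a GenServer callback).
  # The large binary is only ever touched inside the task process, so the
  # caller never holds a reference to it.
  def handle_request do
    task =
      Task.async(fn ->
        {:ok, data} = read_large_binary()
        # Do the real work with `data` here and return only a small result.
        byte_size(data)
      end)

    Task.await(task)
  end
end

IO.inspect(ShortLivedReader.handle_request(), label: "bytes handled")
```

Once the task process exits, its reference to the binary goes away with it, and the long-lived caller only ever saw the small return value.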