Are there any best practices or rules of thumb for writing asynchronous code that manages database resources using Ecto?
For instance, when I’m manipulating a database record in a task should I pass the record to the task or should I pass the id and query the record within the task?
Anything else that might be good to know when working with Ecto (or any external system) asynchronously?
Generally speaking, it is always better to load the data in the task to avoid copying between processes. But if you already have the data in the parent process, then sending it to the task should not be a major problem unless you have large binaries, since those are refcounted.
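To make the "load in the task" suggestion concrete, here is a minimal sketch. The module names (`MyApp.Repo`, `MyApp.Post`, `MyApp.TaskSupervisor`) and the `views` field are hypothetical, just for illustration:

```elixir
# Passing only the id means the task message carries a small integer
# instead of a copy of the whole struct, and the task re-reads the
# row's current state from the database.
task =
  Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
    post = MyApp.Repo.get!(MyApp.Post, post_id)

    post
    |> Ecto.Changeset.change(views: post.views + 1)
    |> MyApp.Repo.update!()
  end)

Task.await(task)
```

A side benefit of querying inside the task is that you work with fresh data: a struct captured in the parent process can be stale by the time the task runs.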
What does this mean? If you already have large binaries, is there another option to get them into your task? Isn’t refcounting a benefit here since the value of the binary doesn’t need to be copied across the process boundary? In terms of message passing overhead, large binaries should have lower overhead than medium-sized tuples. Right?
There’s some awkward interaction with garbage collection: http://blog.bugsense.com/post/74179424069/erlang-binary-garbage-collection-a-lovehate
There’s low overhead sending them between processes, but they may cause an OOM crash in the worst case.
That article was posted 3 years ago. Not sure what if anything has changed in OTP 20, but I was sending raw TCP data through a basic GenStage flow at a rate of 500 MB/s over the weekend. These were ~2 KB binaries. No crashes, and VM memory usage was stable at around 50 MB.
Spinning up :observer to watch the VM took up as much memory as the stream processing itself.
Yeah it’s never been an issue in my personal experience either.
My rough understanding is that it shows up when you have long-lived processes that don’t otherwise generate enough garbage to trigger a collection — the process holds references into large refc binaries, never garbage-collects, and so the binaries are never released.
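For that case there are a couple of well-known mitigations, sketched here (variable names like `big` are hypothetical):

```elixir
# 1. If a long-lived process only needs a small slice of a huge binary,
#    :binary.copy/1 detaches the slice onto the process heap so the
#    reference to the large refc binary can be dropped and collected.
slice = :binary.copy(binary_part(big, 0, 16))

# 2. Force a collection in processes that rarely generate garbage,
#    e.g. after handling a batch of messages:
:erlang.garbage_collect(self())

# 3. Or hibernate an idle GenServer, which also compacts and collects:
#    {:noreply, state, :hibernate}
```

Hibernation is the gentlest option for mostly-idle processes; the explicit `garbage_collect/1` call is a blunter tool, best reserved for processes you have measured holding binaries alive.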