Are there any best practices or rules of thumb for writing asynchronous code that manages database resources using Ecto?
For instance, when I’m manipulating a database record in a task should I pass the record to the task or should I pass the id and query the record within the task?
Anything else that might be good to know when working with Ecto (or any external system) asynchronously?
Generally speaking, it is always better to load the data in the task to avoid copying between processes. But if you already have the data in the parent process, then sending it to the task should not be a major problem unless you have large binaries, since those are refcounted.
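To make the "load in the task" suggestion concrete, here is a minimal sketch. The module names (`MyApp.Repo`, `MyApp.Post`, `MyApp.TaskSupervisor`) and the `views` field are hypothetical, just for illustration:

```elixir
# Passing only the id means the task message carries a small integer
# instead of a copy of the whole struct, and the task re-reads the
# row's current state from the database.
task =
  Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
    post = MyApp.Repo.get!(MyApp.Post, post_id)

    post
    |> Ecto.Changeset.change(views: post.views + 1)
    |> MyApp.Repo.update!()
  end)

Task.await(task)
```

A side benefit of querying inside the task is that you work with fresh data: a struct captured in the parent process can be stale by the time the task runs.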
What does this mean? If you already have large binaries, is there another option to get them into your task? Isn’t refcounting a benefit here since the value of the binary doesn’t need to be copied across the process boundary? In terms of message passing overhead, large binaries should have lower overhead than medium-sized tuples. Right?
There’s some awkward interaction with garbage collection: http://blog.bugsense.com/post/74179424069/erlang-binary-garbage-collection-a-lovehate
There’s low overhead sending them between processes, but they may cause an OOM crash in the worst case.
That article was posted 3 years ago. Not sure what if anything has changed in OTP 20, but I was sending raw TCP data through a basic GenStage flow at a rate of 500 MB/s over the weekend. These were ~2 KB binaries. No crashes, and VM memory usage was stable at around 50 MB.
Spinning up :observer to watch the VM took up as much memory as the stream processing itself.
Yeah it’s never been an issue in my personal experience either.
My rough understanding is that it shows up when you have long-lived processes that don’t otherwise generate enough garbage to trigger a collection — the process holds references into large refc binaries, never garbage-collects, and so the binaries are never released.
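For that case there are a couple of well-known mitigations, sketched here (variable names like `big` are hypothetical):

```elixir
# 1. If a long-lived process only needs a small slice of a huge binary,
#    :binary.copy/1 detaches the slice onto the process heap so the
#    reference to the large refc binary can be dropped and collected.
slice = :binary.copy(binary_part(big, 0, 16))

# 2. Force a collection in processes that rarely generate garbage,
#    e.g. after handling a batch of messages:
:erlang.garbage_collect(self())

# 3. Or hibernate an idle GenServer, which also compacts and collects:
#    {:noreply, state, :hibernate}
```

Hibernation is the gentlest option for mostly-idle processes; the explicit `garbage_collect/1` call is a blunter tool, best reserved for processes you have measured holding binaries alive.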