Is there a recommended lib that provides async functions for collections?

andre1sk · June 7, 2016, 12:03pm

async.map
async.filter
etc?

or we have to rely on
Task.yield_many?

bbense · June 7, 2016, 5:30pm

Writing a naive version of this library is generally simple enough that everyone implements their own Pmap.
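
For reference, a naive version might look like the sketch below. The module name Pmap and the pmap/3 signature are made up for illustration; it assumes every call succeeds within the timeout and deliberately has no error handling, which is exactly the part each application tends to customise.

```elixir
defmodule Pmap do
  # Hypothetical helper, not part of Elixir's standard library.
  def pmap(collection, fun, timeout \\ 5_000) do
    collection
    # Start one linked task per element.
    |> Enum.map(fn item -> Task.async(fn -> fun.(item) end) end)
    # Await in the original order so results line up with the inputs.
    |> Enum.map(fn task -> Task.await(task, timeout) end)
  end
end
```

Calling Pmap.pmap([1, 2, 3], &(&1 * 2)) would double each element in its own process. If any task crashes or exceeds the timeout, the caller exits as well.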

The complicated part is how you deal with errors and flow control. The Elixir core team has made a couple of abortive attempts at general-purpose “flow” programming (Stream.async), and these have uncovered many issues that make the generic problem difficult.

The current rethinking of a generic way to deal with this is GenBroker.

In the short term, since your application knows the appropriate recovery scheme, you’re probably best off implementing your own Pmap.

andre1sk · June 7, 2016, 5:46pm

Thank you for the info!

benwilson512 · June 7, 2016, 5:53pm

Keep in mind, especially if you’re coming from JavaScript, that a lot of what JavaScript does asynchronously is done merely to avoid blocking. In Elixir, however, the BEAM’s preemptive scheduler obviates the need to do async stuff for that reason.

andre1sk · June 7, 2016, 6:02pm

My primary use case is: run these n requests to backend APIs at the same time and wait until all have returned, or race a request against these 3 API endpoints, etc. So it is somewhat similar to JS (Bluebird, for example):
Promise.all
Promise.props
Promise.any
Promise.some
Promise.map
Promise.reduce
Promise.filter
Promise.each
Promise.race
It obviously can be done as is in Elixir but will be a bit more verbose.

Qqwy · June 7, 2016, 9:26pm

The reason that a library that does this doesn’t exist is twofold:

It is quite simple to create asynchronous collection-handling systems in Elixir.
The actual internals of such a system (such as what should happen when something goes wrong) are often very specific to your use case. Other languages’ multithreading/asynchronous facilities offer little to no support for custom fallback strategies, but Elixir gives you this possibility.

Most of the systems that consume a collection do one of the following things:

Take a collection |> map Task.async(fn -> your_function(item) end) or Task.async(YourModule, :your_function, [params]) over it, so the function you want runs asynchronously for each item |> maybe do some other things in the meantime |> map Task.await/1 over the resulting tasks until you have the answers.

If you’re not interested in the answers, take the collection and map (or plain iterate) with Task.start over them. Note that Task.start will start tasks that are not linked to the current process, so if one or multiple of them fail, the current process does not care.
Use a more sophisticated approach where you have work in a queue and a limited number of workers (usually limited because of some external constraint, such as the maximum number of parallel db-connections, socket-connections, etc.) that take an item from the queue whenever they are idle, process it, and then take the next item. If one of them crashes, a new worker process is created. Optionally, the item that made the worker crash is either retried or skipped.

This advanced pooling approach is made easier by libraries such as poolboy.
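
As a much cruder stand-in for a real pool, concurrency can also be capped by working through the collection in fixed-size batches. This is only a sketch: BatchedPmap and its pmap/4 signature are hypothetical, it is not how poolboy works, and it has no queue, no idle workers picking up new items, and no restart or retry logic. One slow item holds up its whole batch.

```elixir
defmodule BatchedPmap do
  # Hypothetical helper: at most `max_concurrency` tasks run at once.
  def pmap(collection, fun, max_concurrency, timeout \\ 5_000) do
    collection
    # Split the work into batches no larger than the concurrency limit.
    |> Enum.chunk_every(max_concurrency)
    # Finish one batch completely before starting the next.
    |> Enum.flat_map(fn batch ->
      batch
      |> Enum.map(fn item -> Task.async(fn -> fun.(item) end) end)
      |> Enum.map(fn task -> Task.await(task, timeout) end)
    end)
  end
end
```

BatchedPmap.pmap(items, fun, 2) would never have more than two tasks in flight, and results come back in input order.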

Qqwy · June 7, 2016, 9:29pm

If my understanding of promises is correct, you can very much treat tasks as promises:

First, you create the separate Task and tell it to fetch something that might take some time.
Then, you pass the Task struct around wherever the return value is not yet necessary (within the process that started it, since only that process can await the task).
Finally, when you need the value that was promised, you call Task.await to wait for an answer.
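
The three steps above can be sketched roughly as follows. PromiseDemo and run/0 are hypothetical names, and the sleep merely stands in for a slow fetch.

```elixir
defmodule PromiseDemo do
  # Hypothetical demo module illustrating a task used like a promise.
  def run do
    # 1. Start the slow work right away; Task.async returns a %Task{} struct.
    task =
      Task.async(fn ->
        Process.sleep(50) # stand-in for a slow request
        :fetched
      end)

    # 2. Do unrelated work while the task runs in its own process.
    other = Enum.sum(1..10)

    # 3. Block only at the point where the promised value is needed.
    value = Task.await(task, 1_000)

    {other, value}
  end
end
```

Unlike a JS promise, the task can only be awaited once, and only by the process that created it.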

andre1sk · June 7, 2016, 9:45pm

Thanks guys for all the great info. It’s very straightforward to accomplish using tasks or manually spawning processes; it’s just a bit more verbose than something like:
["endpoint1", "endpoint2", "endpoint3"] |> Async.race(&Blah.rpc_call/1) |> doSomething

uranther · June 8, 2016, 1:51am

Very helpful post! Thanks for breaking it down this way. I referred a fellow in IRC here and he found this great example of using poolboy (an Erlang module) with Elixir: https://github.com/thestonefox/elixir_poolboy_example

gregvaughn · June 8, 2016, 3:09pm

Can you explain what Async.race should do? I’m not familiar enough with modern JS to be sure. But based upon a guess, it’s probably not hard to write something like that as a utility function in your project.

andre1sk · June 8, 2016, 4:31pm

Well, in ES6 it will “return” (reject or resolve) the value or error of the first Promise that resolves or rejects, which is not that useful. So in JS terms I guess the closest thing would be Promise.any (from Bluebird). The use case: you call several endpoints with the same request, return the first one that returns a value, and drop the rest. Bonus points if there is a convention for canceling the remaining requests. It’s not hard at all, but for a general-purpose module I am guessing the devil is in the details as far as handling errors and various corner cases. Might be a useful module to build, though.

gregvaughn · June 8, 2016, 4:52pm

Oh! The name race makes more sense now – “the first one wins”. I would recommend you not try to solve it in general on a first pass, but do it for the needs of your app specifically. I can kind of, in a hand-wavey way (I’m sure there are details to work through) see a high level approach.

create N Tasks under a supervisor, which will perform the endpoint call and send a message back to the manager
the manager goes into a receive block (with a timeout) and once the first message is received, tells the TaskSupervisor to die (perhaps in a controlled way to allow cancellation) which kills any remaining child tasks
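
A rough sketch of those two steps, under a few assumptions: the Race module and race/3 are hypothetical names, the calling process plays the manager, and a crashed task is simply ignored. Stopping the supervisor is the blunt form of cancellation; a controlled version would need more work.

```elixir
defmodule Race do
  # Hypothetical helper: returns the first result to arrive, or a timeout.
  def race(items, fun, timeout \\ 5_000) do
    {:ok, sup} = Task.Supervisor.start_link()
    manager = self()
    # A unique ref so we only match messages from this particular race.
    ref = make_ref()

    # Start one supervised task per item; each reports back to the manager.
    Enum.each(items, fn item ->
      Task.Supervisor.start_child(sup, fn ->
        send(manager, {ref, fun.(item)})
      end)
    end)

    # The first message wins; give up after `timeout` milliseconds.
    result =
      receive do
        {^ref, value} -> {:ok, value}
      after
        timeout -> {:error, :timeout}
      end

    # Stopping the supervisor terminates any tasks that are still running.
    # A reply sent just before this point may remain in the mailbox.
    Supervisor.stop(sup)
    result
  end
end
```

With this in place, something like Race.race(endpoints, &Blah.rpc_call/1) would return {:ok, value} for whichever call answers first.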

It’s interesting to think about. However, stepping back, I don’t immediately see a use case for it. Why would you want to send the same request to multiple endpoints?

andre1sk · June 8, 2016, 5:04pm

To reduce average latency. That’s how Google often structures their stuff: you do not want to wait until a timeout to resubmit the request to a different endpoint, or to be waiting on a reply from an overloaded node. They require requests to be cancelable, though, so as not to cause too much extra load. .Race is probably the least used, though; others are used more often, such as .All (you have a list of requests and get back a list of results in proper order). Task.yield_many works fine for this, but you have to write more code, etc.
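
For what it’s worth, the extra code for an .All-style helper on top of Task.yield_many is fairly small. The sketch below is only one possible shape: AwaitAll and all/3 are hypothetical names, and tagging results as {:ok, _} or {:error, _} is an assumption about what the caller wants.

```elixir
defmodule AwaitAll do
  # Hypothetical helper: runs fun on every item concurrently and returns
  # one tagged result per item, in the same order as the input.
  def all(items, fun, timeout \\ 5_000) do
    items
    |> Enum.map(fn item -> Task.async(fn -> fun.(item) end) end)
    # yield_many waits up to `timeout` for all tasks and keeps their order.
    |> Task.yield_many(timeout)
    |> Enum.map(fn {task, result} ->
      # A nil result means the task has not finished yet: shut it down,
      # which may still return a reply that arrived in the meantime.
      case result || Task.shutdown(task, :brutal_kill) do
        {:ok, value} -> {:ok, value}
        # Seen only if the caller traps exits, since Task.async links
        # each task to the caller.
        {:exit, reason} -> {:error, reason}
        nil -> {:error, :timeout}
      end
    end)
  end
end
```

Requests that do not finish in time are killed rather than left running, which is a crude form of the cancellation mentioned above.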