Best pattern/approach for parallelizing I/O-bound requests?

Hi there! I'm new to the language (a Ruby/JS/Go dev by trade) and wondering what the “Elixir way” is to achieve the following.

For context:

  • A geocoding microservice that takes an array/list of zip code strings as input (e.g. [“11201, US”, “10036, US”]) and transforms those zip codes into an array of geocoded hashes/objects via a third-party API (in our case, Mapquest). The service accepts other inputs too, but for the purposes of this discussion, assume that's the only kind of input we can receive.

  • We usually receive requests to transform fewer than five zip codes at a time, but have seen as many as 4089. Mapquest limits us to 100 locations per batch request, but we've yet to run into rate limiting on those batched requests. What I'm looking for is a way to take the strings, transform them into Mapquest-valid JSON, send parallelized requests in batches of 100, combine the results of those requests back into a single list, and return that complete list as a single response.
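
The batching step described above needs nothing beyond Enum. A minimal sketch: the GeocodeBatches module and the %{"locations" => [...]} payload shape are hypothetical placeholders, so check Mapquest's batch-geocoding documentation for the real request format before encoding anything to JSON:

```elixir
defmodule GeocodeBatches do
  # Per-request limit mentioned in the post above.
  @batch_size 100

  # Hypothetical helper: turns "zip, country" strings into request payloads
  # holding at most @batch_size locations each. The %{"locations" => ...}
  # shape is an assumption, not Mapquest's confirmed schema.
  def build_payloads(zips) when is_list(zips) do
    zips
    |> Enum.map(&String.trim/1)
    |> Enum.chunk_every(@batch_size)
    |> Enum.map(fn batch -> %{"locations" => batch} end)
  end
end

zips = for n <- 1..250, do: "#{10000 + n}, US"
payloads = GeocodeBatches.build_payloads(zips)
IO.inspect(Enum.map(payloads, &length(&1["locations"])))
```

With 250 inputs this yields three payloads of 100, 100 and 50 locations; a JSON library (or the built-in JSON module on newer Elixir versions) would handle the actual encoding.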

GenStage and async tasks both seem like ways of achieving this (my hunch is that there are more “primitive” ways as well), but I'm not sure which would be best. Thanks for the advice!
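
For reference, the most “primitive” version of the fan-out is plain processes with send/receive, which is roughly what Task.async/await does for you, minus Task's monitoring and error reporting. A sketch only; fake_geocode/1 is a hypothetical stand-in for the real HTTP call:

```elixir
defmodule PrimitiveFanOut do
  # Hypothetical stand-in for the HTTP call to Mapquest.
  def fake_geocode(batch), do: Enum.map(batch, &%{query: &1})

  # Spawns one linked process per batch, then collects replies in the
  # original order by waiting on each batch's unique reference in turn.
  def run(batches) do
    parent = self()

    refs =
      Enum.map(batches, fn batch ->
        ref = make_ref()
        spawn_link(fn -> send(parent, {ref, fake_geocode(batch)}) end)
        ref
      end)

    Enum.flat_map(refs, fn ref ->
      receive do
        {^ref, result} -> result
      after
        5_000 -> exit(:timeout)
      end
    end)
  end
end

results = PrimitiveFanOut.run([["11201, US"], ["10036, US", "94103, US"]])
IO.inspect(results)
```

Matching on the pinned ^ref is what keeps the combined list in input order even when replies arrive out of order.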

A bit busy, so I can't write a full response, but GenStage seems practically designed for this purpose, so maybe start by reading its introduction? 🙂


I've taken a look at GenStage, but assumed it might be an “overengineered” solution to my problem because I don't have to deal with back-pressure.

Smells like what Task.async and Task.await were made for.
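
A minimal sketch of that Task.async/Task.await approach. fetch_batch/1 is a hypothetical stand-in that only sleeps to simulate latency; a real version would POST the batch to Mapquest:

```elixir
defmodule TaskFanOut do
  # Hypothetical stand-in for one batched Mapquest request.
  def fetch_batch(batch) do
    # Simulate network latency.
    Process.sleep(50)
    Enum.map(batch, &%{query: &1, geocoded: true})
  end

  # One Task per batch of up to 100. Awaiting the tasks in the order they
  # were started keeps the combined list in input order.
  def geocode_all(zips, batch_size \\ 100) do
    zips
    |> Enum.chunk_every(batch_size)
    |> Enum.map(fn batch -> Task.async(fn -> fetch_batch(batch) end) end)
    |> Enum.flat_map(fn task -> Task.await(task, 15_000) end)
  end
end

zips = for n <- 1..250, do: "#{10000 + n}, US"
results = TaskFanOut.geocode_all(zips)
IO.puts(length(results))
```

For the usual request of fewer than five zips this is a single task; at the 4089-zip extreme it starts 41 requests at once with no cap on concurrency.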

No, it is still perfectly suited for it. 🙂

Not necessarily: those can easily overwhelm the system if there are too many jobs. Plus, he said the requests need to be in batches of at most 100, and I'm ‘guessing’ that he can't submit too many 100-item batch jobs at once either.
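
One way to address the “too many jobs” concern while staying with plain Tasks is to cap how many batch requests are in flight at once. A sketch assuming Elixir 1.11+ (for Task.await_many/2); fetch_batch/1 is a hypothetical stub, and the default cap of 4 is an arbitrary placeholder, not a known Mapquest limit:

```elixir
defmodule BoundedFanOut do
  # Hypothetical stand-in for one batched Mapquest request.
  def fetch_batch(batch), do: Enum.map(batch, &%{query: &1})

  # Groups the batches into waves of max_in_flight; each wave must finish
  # (Task.await_many/2) before the next one starts, so at most
  # max_in_flight requests run concurrently.
  def geocode_all(zips, batch_size \\ 100, max_in_flight \\ 4) do
    zips
    |> Enum.chunk_every(batch_size)
    |> Enum.chunk_every(max_in_flight)
    |> Enum.flat_map(fn wave ->
      wave
      |> Enum.map(fn batch -> Task.async(fn -> fetch_batch(batch) end) end)
      |> Task.await_many(15_000)
      |> Enum.concat()
    end)
  end
end

zips = for n <- 1..4089, do: "#{n}, US"
results = BoundedFanOut.geocode_all(zips)
IO.puts(length(results))
```

The trade-off is that one slow request stalls its whole wave; starting a new request as soon as any slot frees up avoids that.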

OP indicated that Mapquest doesn't seem to rate-limit them at their current usage, and that huge batches are rare. So a Task async/await loop sounds like a simple, “good enough” solution (if you ever do run out of memory, that's early enough, IMO, to haul out the Serious Engineering tools ;-)). Given that GenStage usually comes with Very Long Articles, I'd be hesitant to apply it to something as simple as what OP presents, unless there are compelling reasons for the additional complexity.

GenStage is actually really easy, and there's plenty of help on these forums too. 🙂

Thanks for the advice, everyone. I'll give GenStage a shot.

What about Task.async_stream/3? https://hexdocs.pm/elixir/Task.html#async_stream/3
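
A sketch of what Task.async_stream/3 could look like here. fetch_batch/1 is again a hypothetical stub, and max_concurrency: 10 is an illustrative default rather than a tuned value:

```elixir
defmodule StreamFanOut do
  # Hypothetical stand-in for one batched Mapquest request.
  def fetch_batch(batch) do
    Process.sleep(20)
    Enum.map(batch, &%{query: &1})
  end

  # Runs fetch_batch/1 over batches of 100 with at most :max_concurrency
  # tasks in flight, starting a new task as soon as one finishes.
  # Results arrive in input order (ordered: true is the default). With the
  # default on_timeout: :exit, a timeout exits the caller, so every
  # element reaching flat_map is {:ok, result}.
  def geocode_all(zips, opts \\ []) do
    max = Keyword.get(opts, :max_concurrency, 10)

    zips
    |> Enum.chunk_every(100)
    |> Task.async_stream(&fetch_batch/1, max_concurrency: max, timeout: 15_000)
    |> Enum.flat_map(fn {:ok, results} -> results end)
  end
end

zips = for n <- 1..4089, do: "#{n}, US"
results = StreamFanOut.geocode_all(zips, max_concurrency: 8)
IO.puts(length(results))
```

Note that async_stream's own default max_concurrency is System.schedulers_online/0, which suits CPU-bound work; for I/O-bound calls like these it is common to set it explicitly.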

Just out of general curiosity: are parallelization and concurrency being treated as the same thing in this context?
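
For I/O-bound work like this, concurrency (many requests waiting at once) is arguably what matters, more than parallelism (several cores computing at once). A small illustration, with Process.sleep/1 standing in for network waits: it blocks only the calling process, not a scheduler thread, so 20 waits of 200 ms overlap instead of adding up to 4 seconds:

```elixir
# 20 simulated I/O waits of 200 ms each, all allowed to run concurrently.
{micros, results} =
  :timer.tc(fn ->
    1..20
    |> Task.async_stream(fn i -> Process.sleep(200); i end, max_concurrency: 20)
    |> Enum.map(fn {:ok, i} -> i end)
  end)

IO.puts("schedulers online: #{System.schedulers_online()}")
IO.puts("elapsed: #{div(micros, 1000)} ms")
```

Running the VM with a single scheduler (the +S 1 emulator flag) should give roughly the same elapsed time, because the waiting processes do not occupy a core.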