Hey,
I’ve been brainstorming how to solve this challenge for several days and I’d really like your input.
I need to ‘process’ 20,000 events per minute. To process 1 event, I need to hit 5 separate HTTP API endpoints (XML), wait for them all to return 200 OK (or time out, or fail), then parse the bodies of only the 200 responses to find the ‘winner’, craft a message, and move on to the next batch.
These events are coming from a GenStage producer, and they will likely arrive in batches of ~50 because the consumer has a max_demand of 100.
My plan was (loosely…) to:
- Take a look at the # of events (let’s say batch #1 is 50).
- Use Task.async (or something) to start 5 processes, each in charge of doing 50 requests with buoy, using Keep-Alive and HTTP pipelining (thank you @idi527). Sadly, these endpoints don’t support HTTP/2, so I’m having to squeeze every drop of performance out of HTTP/1.1.
- So 5 endpoints * 50 requests each = 250 requests per batch * 400 batches = 100,000 requests per minute overall (20K new events get generated by the producer every minute, so all of this needs to keep up with that).
- Finally, do some kind of Task.await and zip all 5 response lists together so that I can loop through and find the ‘winner’ of each list…
Something like this:
[
  [a1, b1, c1, d1, e1],
  [a2, b2, --timeout--, d2, e2],
  [--500 response--, --timeout--, c3, d3, e3],
  [a4, b4, c4, d4, --422 response--],
  ... etc up to 50
]
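To make the shape of the plan concrete, here’s a rough sketch of the fan-out I’m picturing: nested Task.async_stream, with fetch_feed as a placeholder for the real buoy request (everything below is illustrative, not the actual endpoints):

```elixir
defmodule Batch do
  # Sketch only: fetch_feed is a 2-arity function standing in for the real
  # buoy call, (endpoint, event) -> {:ok, body} | {:error, reason}.
  def process(events, endpoints, fetch_feed) do
    events
    |> Task.async_stream(
      fn event ->
        # Hit every endpoint concurrently for this one event.
        endpoints
        |> Task.async_stream(&fetch_feed.(&1, event),
          timeout: 750,
          on_timeout: :kill_task
        )
        |> Enum.map(fn
          {:ok, {:ok, body}} -> {:ok, body}
          # Timeouts, crashes, and error tuples all collapse to :error,
          # mirroring the --timeout-- / --500 response-- slots above.
          _other -> :error
        end)
      end,
      max_concurrency: 50,
      timeout: 5_000
    )
    |> Enum.map(fn {:ok, row} -> row end)
  end
end
```

The result is the same matrix of rows as the example above, one row per event, in order.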
QUESTIONS
Sooo, I am still a beginner at Elixir, but I think buoy makes requests synchronously: https://github.com/lpgauth/buoy
So it’s not enough to just make 50 requests in a single Task.async. I think there needs to be a new Task.async for EVERY request… similar to how one might do this:
1..50
|> Enum.each(fn _n ->
  spawn(fn -> :buoy.get(url, timeout: 750) end)
end)
Now I doubt Task.async could work like that… (I’m still reading). I imagine each of the 5 tasks would instead spawn 50 more processes each… I just have no idea how the logistics of all this fit together.
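For the task-per-request idea, something like this sketch is what I have in mind: one Task.async per URL, then Task.yield_many with a deadline so stragglers just come back as nil (do_get/1 is a stand-in for the real :buoy.get call, and the URLs are made up):

```elixir
# Placeholder for the real buoy request.
do_get = fn url -> {:ok, "body for #{url}"} end
urls = ["https://one.example/xml", "https://two.example/xml"]

# One lightweight task per request.
tasks = Enum.map(urls, fn url -> Task.async(fn -> do_get.(url) end) end)

results =
  tasks
  |> Task.yield_many(750)
  |> Enum.map(fn {task, result} ->
    # result is {:ok, value}, {:exit, reason}, or nil if the 750 ms
    # budget expired; kill any straggler so it doesn't leak.
    result || Task.shutdown(task, :brutal_kill)
  end)
```

Task.yield_many keeps the results in the same order as the task list, which should make the zip step straightforward.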
The reason I wanted to use Task was so I didn’t have to keep track of all the responses flowing in; I could just wait for the buoy timeouts to expire. It’s like I need “another level” deeper of asynchronousness, though. On top of this, the # of data APIs might change, so I don’t think I should hardcode that at 5. Can I dynamically chain Task with |>, kind of like Ruby’s call method for calling function names from variables (I have a list of feeds in memory I can loop over)?
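In case it clarifies what I mean about not hardcoding the 5: I’m hoping I can just map Task.async over whatever feed list is in memory, no Ruby-style call needed. A sketch with a hypothetical feed list (names and fetch functions are placeholders for the real requests):

```elixir
# Hypothetical in-memory feed list; :fetch stands in for the real HTTP call.
feeds = [
  %{name: "feed_a", fetch: fn -> {:ok, "<xml>a</xml>"} end},
  %{name: "feed_b", fetch: fn -> {:ok, "<xml>b</xml>"} end}
]

# One task per feed, however many feeds there are.
results =
  feeds
  |> Enum.map(fn feed -> Task.async(fn -> {feed.name, feed.fetch.()} end) end)
  |> Task.await_many(1_000)
```

(Task.await_many crashes the caller if any task misses the deadline, so for the timeout-tolerant version Task.yield_many would fit better, but it shows the dynamic-list shape.)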
Any suggestions?