I was wondering if there was a more efficient way to perform several independent HTTP requests in parallel than this:
```elixir
{time, results} =
  :timer.tc(fn ->
    payloads
    |> Task.async_stream(
      fn payload -> make_request(url, payload) end,
      max_concurrency: System.schedulers_online(),
      timeout: 300_000
    )
    |> Enum.to_list()
  end)
```
It’s what I have right now, and I’m not getting very good performance out of it (my Python + asyncio + aiohttp program that performs a similar test consistently beats it), so I suspect I’m doing something wrong… For the actual `make_request` function I’m just using the Req client (`Req.post`), with Finch underneath for specifying a connection pool of 100. For further reference, `payloads` isn’t ginormous or anything: it contains 5000 elements for this test (in production we’ll probably be sitting between 3000 and 5000).
It’s hard to imagine this code being inefficient; after all, network requests are several orders of magnitude slower than anything the CPU does here.
Maybe passing each payload around slows it down? But I don’t see many ways to improve on that. Hmm, what could be an alternative to passing the payload around?
At first I thought maybe the `Enum.to_list()` part could be causing the slowdown (since I’m not sure my Python code does exactly the same thing), but I suppose it’s a necessary part of getting a “finite” result I can inspect out of the stream 🤔
It may not be appropriate in the long run if you actually need the result from `make_request/2`, but you could use `Stream.run` to eliminate the potential confounding from `Enum.to_list`.
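A minimal sketch of that suggestion, reusing `payloads`, `url`, and `make_request/2` from the snippet above (names assumed from the original post):

```elixir
# Stream.run/1 forces the stream purely for its side effects and discards
# the results, so collecting them into a list no longer affects the timing.
{time, :ok} =
  :timer.tc(fn ->
    payloads
    |> Task.async_stream(
      fn payload -> make_request(url, payload) end,
      max_concurrency: System.schedulers_online(),
      timeout: 300_000
    )
    |> Stream.run()
  end)
```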
Have you tried increasing the concurrency? You said you have a pool of 100 connections, but `max_concurrency` defaults to your scheduler count, which is a lot less than 100. If this workload is the only thing the pool is used for, I would set the concurrency to 100. Also consider passing the `ordered: false` option if you don’t care about result order.
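Concretely, that would look something like the sketch below (assuming, as in the original post, a Finch pool sized at 100):

```elixir
pool_size = 100

{time, results} =
  :timer.tc(fn ->
    payloads
    |> Task.async_stream(
      fn payload -> make_request(url, payload) end,
      # Match the connection pool size so every pooled connection stays busy,
      # instead of capping concurrency at the (much lower) scheduler count.
      max_concurrency: pool_size,
      # The work is IO-bound and order doesn't matter here, so let results
      # arrive as they complete rather than head-of-line blocking.
      ordered: false,
      timeout: 300_000
    )
    |> Enum.to_list()
  end)
```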
I have a feeling this is probably what’s going on. I’m not sure what the scheduler count is, but if it’s directly related to the number of cores, it will indeed be significantly lower than the pool’s maximum of 100 connections. Will definitely try this out. Thanks!
Yes, by default the BEAM starts one scheduler per logical CPU (so with hyperthreading, two per physical core). That’s a fine choice for a CPU-bound task, but you are IO-bound, so I would aim for parity with your pool size.
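To keep the pool size and the fan-out concurrency in sync, one option is to define the size once and reuse it in both places. A hypothetical sketch (the `MyFinch` name and `pool_size` constant are assumptions, not from the original post):

```elixir
pool_size = 100

# Start a Finch pool of `pool_size` connections; :default applies the
# configuration to every host.
children = [
  {Finch, name: MyFinch, pools: %{:default => [size: pool_size]}}
]

Supervisor.start_link(children, strategy: :one_for_one)

# The request fan-out then reuses the same constant:
#
#   Task.async_stream(payloads, &make_request(url, &1),
#     max_concurrency: pool_size,
#     ordered: false,
#     timeout: 300_000
#   )
```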