I am working on a project using Phoenix that receives requests for data, makes many external API calls, and returns the received data after some formatting. For each incoming request I’d like to make ~30+ requests to an external service.
I am hoping to accomplish this by making my external API calls concurrently, but I am running into CPU bottlenecks. Currently, I am kicking off each request in a task, awaiting all of the tasks, and then formatting and merging the received data. This greatly reduces my response time, but comes at the cost of CPU/scheduler utilization. I’m seeing CPU usage spike to 20–50% momentarily when trying to make 100+ outbound requests in a short period of time.
This is more pseudocode than reality, but here’s a snippet that demonstrates roughly how I am making my external requests and gathering the responses:
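A minimal sketch of that pattern, as later replies describe it (spawn a task per request, then await each one individually). The `FanOut` module and its simulated `fetch/1` are hypothetical stand-ins, not the actual application code:

```elixir
defmodule FanOut do
  # Simulated external call; in the real app this would be an HTTP request.
  defp fetch(id) do
    Process.sleep(50)
    {:ok, id}
  end

  # Spawn every task at once, then await each one in turn --
  # roughly the shape described in the thread.
  def run(ids) do
    ids
    |> Enum.map(fn id -> Task.async(fn -> fetch(id) end) end)
    |> Enum.map(&Task.await(&1, 15_000))
  end
end

FanOut.run(1..30)
```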
You are enqueuing all the requests at once in your loop, so it’s expected that the system will run as much as possible concurrently. If I had enough memory, I would actually want the CPU spike to hit 100%, handling as much work as it can, as fast as it can.
If you want to smooth things out, you can use Task.async_stream and tweak the :max_concurrency option to limit the number of concurrent jobs, at the cost of a higher response time.
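A sketch of that approach. The `Throttled` module and its simulated `fetch/1` are assumptions standing in for the real HTTP call:

```elixir
defmodule Throttled do
  # Simulated external call; stand-in for the real HTTP request.
  defp fetch(id) do
    Process.sleep(50)
    {:ok, id}
  end

  # At most :max_concurrency requests run at once; the rest queue,
  # trading some latency for a flatter CPU profile.
  def run(ids) do
    ids
    |> Task.async_stream(&fetch/1, max_concurrency: 10, timeout: 15_000)
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

Throttled.run(1..30)
```

Results come back in input order by default, so the downstream merge logic doesn’t need to change.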
Finch uses connection pools, so you could use that to limit the number of concurrent connections as well, though I am not sure how that translates into controlling concurrent requests. That should be available in Req as well, since it uses Finch under the hood.
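A configuration sketch, assuming Finch is a dependency and `MyFinch`/the host URL are placeholder names. A pool’s `:size` caps concurrent connections to that host; requests beyond it wait for a free connection:

```elixir
# In the application's supervision tree (a sketch, not the poster's config):
children = [
  {Finch,
   name: MyFinch,
   pools: %{
     # At most 30 concurrent connections to this host; :count is the
     # number of pools started for it.
     "https://api.example.com" => [size: 30, count: 1]
   }}
]
```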
That is a great point. I am seeing the speedup that I expected, and the tradeoff is CPU usage. However, my concern is that I’m seeing what feels like abnormally high CPU usage relative to the number of requests I’m receiving (and therefore sending to the external API).
I’m seeing huge spikes from fewer than 10 incoming requests (10 * 30 = 300 outgoing requests), and I’d like to scale this to support a much higher number of incoming requests. Obviously, I could throw money at the problem and get a beefier CPU, but I’m trying to determine if there is something I’m missing here. Ideally, I wouldn’t have to scale hardware this early in the testing phase.
Also, the response size is rather small, but the requests can take 100–300ms (maybe even 500ms). Not sure if that would have any impact on CPU if a few of the concurrent requests take longer than others to complete.
You are basically awaiting each task one by one instead of using Task.await_many or, even better, as @lud said: use Task.async_stream and keep its max_concurrency less than or equal to the maximum size of your Finch pool.
Not sure if that’s going to bring down the CPU load by a lot, but it should help somewhat, and you will get better parallelization.
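For the first option, a small sketch of collecting all tasks in one call rather than awaiting them one by one (`do_work/1` is a hypothetical stand-in for the real request):

```elixir
defmodule Gather do
  # Hypothetical stand-in for the external call.
  def do_work(id), do: id * 2

  def run(ids) do
    ids
    |> Enum.map(fn id -> Task.async(fn -> do_work(id) end) end)
    # Await the whole batch at once; the timeout applies to all tasks
    # collectively rather than resetting per task.
    |> Task.await_many(15_000)
  end
end
```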
After you apply the very useful suggestions people before me gave, you can think about caching, both in-memory and over HTTP. The first can be accomplished with ETS, and the second by leveraging the Cache-Control and ETag response headers.
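A minimal sketch of the ETS side, assuming a hypothetical `ApiCache` module; TTL handling, eviction, and race conditions between concurrent readers are left out for brevity:

```elixir
defmodule ApiCache do
  @table :api_cache

  # Create the table once at startup (e.g. from the supervision tree).
  def init do
    :ets.new(@table, [:set, :public, :named_table, read_concurrency: true])
  end

  # Return the cached value for key, or compute, store, and return it.
  def fetch(key, fun) do
    case :ets.lookup(@table, key) do
      [{^key, value}] ->
        value

      [] ->
        value = fun.()
        :ets.insert(@table, {key, value})
        value
    end
  end
end
```

Usage would look like `ApiCache.fetch({:user, id}, fn -> call_external_api(id) end)`, where the function only runs on a cache miss.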
I’m willing to bet there’s something wrong with your application code (or instrumentation! I’ve made that mistake before). I work on a service that makes hundreds of downstream requests and we definitely don’t see any big CPU bumps, so something seems off.
Also I don’t know what kind of resource allocation your application has available in terms of CPU or memory, is it seriously constrained? Still, I wouldn’t expect a 20%-50% bump. Could it be the way you’re manipulating or merging the data? Though that seems pretty unlikely given the payload size.
In short, something fishy is going on! You might need to try doing some profiling. I’m guessing you can reproduce it locally? The Observer is a great tool as well if you’re not already using it.
There are a few things that you might think about, however they don’t explain the CPU bump:
Make sure you’re passing SSL certs in as a file; here’s an example in HTTPoison, but similar options exist in Finch/Mint/etc. If you’re making enough downstream requests, it can affect memory usage.
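A sketch of what that option looks like, assuming HTTPoison; the URL and bundle path are placeholders. Pointing `:cacertfile` at a file on disk avoids carrying the certificate binaries around in memory per request:

```elixir
# Options fragment (a sketch, not the poster's code): pass the CA bundle
# as a file path via the :ssl option, which is forwarded to Erlang's
# ssl module.
HTTPoison.get(url, [],
  ssl: [cacertfile: "/etc/ssl/certs/ca-certificates.crt"]
)
```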
We switched from HTTPoison to Finch and found a big improvement in terms of reliability and speed. It looks like you’ve tried it already; just wanted to endorse it.
Your code example makes me think you could simplify it with Task.async_stream (although I would also consider Task.Supervisor.async_stream or Task.Supervisor.async_stream_nolink, depending on your needs). I’m pretty sure async_stream matches your use case exactly, but maybe I scanned your post too fast!
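For the supervised variant, a small self-contained sketch (the `id * 2` work function is a stand-in for the real request). With `async_stream_nolink`, a crashing task surfaces as an `{:exit, reason}` element instead of taking down the caller:

```elixir
# Start a Task.Supervisor (in a real app this lives in the supervision tree).
{:ok, sup} = Task.Supervisor.start_link()

results =
  1..5
  |> Task.Supervisor.async_stream_nolink(sup, fn id -> id * 2 end,
    max_concurrency: 2
  )
  |> Enum.map(fn
    {:ok, value} -> value
    {:exit, reason} -> {:error, reason}
  end)
```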