I have been reading Elixir in Action and I have noticed that in an example, the author decides to forgo Tasks and use pools.
The reasoning
The reasoning behind this is that since we were dealing with a Database, and since the DB may be overrun by multiple simultaneous queries, we need to limit its access somehow via a pool.
Thus, the idea I get from the book is that if we have a resource we want to keep from overusing, we should use a pool, otherwise, if our resource is practically unlimited, we should go for Tasks since they are much simpler.
A personal case
As an example of this lets assume I have a service A that needs to notify another service B by sending 2000 requests/second to B (which has no problem in handing it).
In this specific case, would it make sense to use a pool of processes in A for it to use as senders to B?
Opinions?
Since I don’t have any resource here that needs to be safeguarded and processes are so cheap to create I don’t think a pool would make sense.
I personally think you should always limit the concurrency. I tend to use pools specifically when the resource is expensive to create and it can be reused. Database connections fit well into that category. Pools can be used to limit concurrency, but it doesn’t need to be done with a pool. For example, you can use tasks to limit concurrency. See Task.async_stream/3 with the :max_concurrency option.
As you mentioned pools are useful as a backpressure mechanism. You can increase and decrease the pool size dynamically to allow downstream services a chance to heal if they start throwing back errors. This is generally a more robust solution than something like circuit breakers which is more akin to opening a walnut with a sledge hammer. No service has infinite resources so I prefer to use pools when I can for these reasons.
Another benefit of pools is if the initial start-up costs of the worker are expensive. This could be because you need to spin up an external process or connect to a tcp socket. In the case you describe (and setting aside the fact that hackney pools have some pretty substantial issues) its more effective to open a connection and re-use that connection for multiple requests.
Now this is interesting. How would I re-use the connection for multiple requests? This is something that would in fact improve the example I mentioned, but I don’t know how to do it with HTTPotion (the library I am using).
If the server is using HTTP 1.1 or above, then the connections should be kept alive by default. Do note however that different web servers have different timeouts for those connections though.
If so, the first thing I’d personally consider is batching, because I sometimes find that sending X requests at once results in much less processing than X time sending one request, and that can really do wonders for performance/stability.
Other than that, pooling + queueing + shedding can be used to control the load, i.e. to make sure that the target service doesn’t get overloaded. More recently I tend to bypass popular pooling libraries (poolboy & friends), and instead use my own parent library which I find has the good balance of flexibility/simplicity for these kinds of challenges.
If so, the first thing I’d personally consider is batching, because I sometimes find that sending X requests at once results in much less processing than X time sending one request, and that can really do wonders for performance/stability.
The problem with batching/queuing is that when implemented it automatically means your application is statefull. What happens to all those queued up requests if the node they are on breaks? Well, you loose them.
By not using them, you only lose the requests that are occurring now meaning your application is stateless, which means it is easier to scale (which is the hidden objective here).
While you are correct that using batching/queuing would result in a performance and stability boost, it would also mean our application would be harder to scale because each node would now have state. We want our nodes to be as dummy as possible, thus why I ask about Tasks VS pools. Either would allow for a stateless app.
I think there’s a few really important points to solidify here:
The first is that even if you’re running a Task for each request, those requests are still going to be queued in some fashion on the BEAM while they wait to be scheduled and executed. If you start 1000 tasks and the beam crashes you lost potentially 1000 requests. So I’m not sure stateless vs. stateful is a good comparison to draw here, mostly because its not an accurate comparison. A potentially more appropriate decision is to decide if you’re going to send messages “at most once” or “at least once”. The current semantics you’ve drawn up in the crashing scenario are “at most once”. That might be fine for your use case but its meaningful to make that decision consciously.
A second, and maybe more important point, is that while its potentially useful to imagine that the downstream service has infinite resources and infinite scale out, the harsh reality is that neither of those claims are true. Services have limits, networks have limits, and both of them can experience faults. This is why it’s generally useful to implement batching and pooling. You minimize the overhead required to service the number of requests you have.
Saying that the batching and queueing makes it harder to scale your app just isn’t an accurate statement, especially when you consider the system as a whole. We can think of these 2 systems as a combination of queues. Requests are queued in the first service, sent to the downstream service, the downstream service needs to handle them. It can do some of this work in parallel but given that it’s a stable queue there will always be work in progress, waiting in a queue somewhere. If the downstream service has a slowdown for whatever reason then our total time spent in the system queue (the first service queue and the second service queue combined) increases. This will cause slowdown throughout the whole system and can easily overwhelm your upstream service. There are a bunch of ways of handling this (load shedding, back-pressure, etc.).
One of the ways we’ve had the most success is to use pools for our communication to downstream services and dynamically increase and decrease the number of works available in the pool based on the number of good responses were getting. If we start overwhelming the downstream service we dramatically lower the number of workers and start load-shedding (or return cached good responses) in the upstream service. This allows our downstream services time to heal before we start sending even more traffic their way.
All of these techniques are more about allowing for more overhead in your services and gracefully handling transient failures. While its totally possible (and maybe even reasonable) to spawn a Task per request, depending on your specific needs and guarantees around message delivery you may want to consider a more robust solution.
First I want to leave this here: Queues Don't Fix Overload
I found this to be quite a nice resource not only on how to handle overload.
“requests occurring now” is essentially queueing. It’s just less explicit because we usually think of each web request as it’s own thing and cowboy does it for us. Like if each web request calls queues one request to a downstream service you essentially loose the same amount of requests with the extra queue in there or not.
Requests occurring now are request currently being made by any number of processes. A queue, by default, only has 1 active request while the others wait their turn. It is the difference between making 1 request per second and having 9 waiting their turn and making 10 simultaneous requests with no wait in between them.
I am aware that if the node crashes, in both scenarios I lose 10 requests, that is true. But it’s still not queuing. When a Node crashes, you invariantly lose some information, this cannot be avoided. The difference is if you only want to lose the information you are currently processing, or the information you are currently processing + the information you got 10 minutes ago (which happens with queues).
As for how cowboy it isn’t important for this discussion as no one uses it…
Maybe. But this is not something I care because the downstream service, for all purposes and effects, when compared to the system I am developing, has infinite resources.
Another point in which we disagree. Systems with state are notoriously harder to scale. But please, don’t buy my word, have a look at the 12 factor app standard:
If I was concerned with the response time of the receiving system, I would probably be using Flow with backpressure. However, for the purposes of this discussion, I am not
You make good points, but I am still not convinced that adding state to this specific system would result in a system easier to scale.
Maybe I should backtract to the original question here.
One reason why I introduced the pool in Elixir in Action was precisely to warn the readers about a potential overload when unlimited concurrency is used. So the message you should take from the book is that it’s definitely good to think about overload scenarios. However, I don’t mean to imply that pooling is the only, or the best option in all such cases.
Which brings me to my comment about batching. I made that comment from the standpoint of load control (which is what IMO pooling is also about). The reason why I introduced this approach to the discussion was to show that there are other ways of controlling the load.
I agree with others here that both approaches are effectively stateful. If BEAM or the underlying machine goes down, whatever you’re doing in-flight will be lost.
It is, however, true that with batching you end up with possibly larger crash effects. If the queueing process crashes, you might lose more in-flight data, compared to a single task. Likewise, if the processing of the batched items fails on the consumer side, you might end up with more failed requests. This is a trade-off of the batching approach, and something you need to account for when considering it.
Either way, I don’t see any general reasons why each technique (or any other load control technique) would prevent scaling (particular reasons might of course exist in concrete scenarios). The statefulness of the queue is local, so you can e.g. still have multiple batching queues spread across multiple machines, just like you can have multiple pools spread across multiple machines.
In any case, all my comments were made from the standpoint of the load control, not scaling. So pooling in the book is used to control the load (it’s also used for teaching purposes, as a fairly simple but realistic concurrent challenge). The batching is mentioned in this thread for the same reason.
Best example might be ecto, which I’ve never heard to make problems with scaling, but is using pooling for it’s db connections.
No matter how much you want to request things at the same time you’ll always be bound to the number of processing units doing work in parallel. Otherwise you will queue requests at one place or another. If it’s a global queue or a pooled solution, somewhere your request will need to wait if there are not more parallel resources available to handle the request.
I’m very familiar the 12 factor app and the tradeoffs of stateful and stateless services. But you’re misunderstanding me here (or I’m not conveying my point well). You have state whether you want to use a queue or a bunch of tasks. That state is ephemeral in both cases. But it exists in both cases. Using a pool does not make your service any more or less “stateful” than it was before.
None of my points care at all about response times. It’s about fault tolerance and not overwhelming your downstream system.
I think you may be conflating worker queues with a more general usage of the term queue. It turns out that all computer systems can be modeled as queues. The BEAM uses a bunch of them for scheduling work. So regardless of whether or not you’re using a queue to manage your work directly, the problem you’re describing is a queueing problem. You just have several queues being processed concurrently. Think of it like tellers at a bank.
The consequence of this statement is that you can model your entire system (the two services communicating) as one giant queue and any increase in wait time or processing time in the downstream service will be felt all the way back at the upstream service.
I’m not sure I understand this but if you’re using plug or phoenix you’re probably using cowboy under the hood. Either way the larger point still stands. It’s queues all the way down.
You have the same amount of state (in-flight requests) in either scenario.
Have you considered adding batching at a finer granularity than 10 minutes? I would think that if you batched the requests you get every second together you could drastically reduce the required resources on your elixir service and the network from your elixir service to your upstream service. It sounds like you have enough scale that batching by ever second (or even every 500ms) you could scale your elixir service more effectively without significantly increasing the amount of information that would be lost if the node crashes.