I have a production web application that handles ~1000 rps.
We use Cowboy, which creates a process per connection.
For each request we make a synchronous call to a 3rd-party library (Cachex) to read from the cache. It errors every so often, which makes that request return a 500.
I want to isolate the connection process from 3rd-party libraries, so that if my cache library crashes, I won’t return a 500.
In this situation, as I see it, there are 3 options:
1. Use try-catch blocks around misbehaving calls to the library
2. Use a Task to wrap each call to the library
3. Use a single GenServer to interact with the library
Option 3 could introduce a bottleneck into the application, since the library is called on every request.
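To make the bottleneck concrete, here is a minimal sketch of Option 3. `CacheProxy` is a hypothetical module, and the zero-arity `lookup` function stands in for the real library call (it is not Cachex’s API). A GenServer handles one message at a time, so every request’s lookup queues behind the ones ahead of it:

```elixir
defmodule CacheProxy do
  # Hypothetical Option 3: every request funnels its cache lookup through
  # this one process. handle_call/3 runs one message at a time, so each
  # lookup also waits for every lookup queued ahead of it.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # `lookup` is a zero-arity function standing in for the real library call.
  def fetch(server, lookup) when is_function(lookup, 0) do
    GenServer.call(server, {:fetch, lookup})
  end

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:fetch, lookup}, _from, state) do
    # Runs inside the proxy: if `lookup` raises, the proxy crashes and every
    # caller waiting on it gets an exit instead of a reply.
    {:reply, lookup.(), state}
  end
end
```

Note that this moves the failure rather than removing it: a crash in `lookup` now takes down the shared proxy, and its callers still see an exit.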
Option 2 means that for each connection we create another process that is created and destroyed (like in [1]). Moreover, communication between processes requires copying the message [2]; wouldn’t this have performance ramifications?
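One detail matters for Option 2: `Task.async/1` links the task to the caller, so a crash inside the task would still take the request process down. A sketch that avoids this with `Task.Supervisor.async_nolink/2`, assuming a hypothetical `IsolatedCall` module and an arbitrary 100 ms default timeout:

```elixir
defmodule IsolatedCall do
  # Hypothetical helper for Option 2. The task is started with
  # Task.Supervisor.async_nolink/2, so it is monitored but NOT linked:
  # if `fun` crashes, the caller receives a value instead of dying with it.
  # (Task.async/1 would link, and the crash would take the caller down.)
  def run(task_sup, fun, timeout \\ 100) when is_function(fun, 0) do
    task = Task.Supervisor.async_nolink(task_sup, fun)

    # Wait up to `timeout` ms; if no reply arrived, shut the task down.
    # Task.shutdown/1 still returns the reply if it arrived in the meantime.
    case Task.yield(task, timeout) || Task.shutdown(task) do
      {:ok, value} -> {:ok, value}
      # The task crashed: report it and keep the caller alive.
      {:exit, _reason} -> :error
      # Timed out and was shut down before replying.
      nil -> :error
    end
  end
end
```

In an application, `task_sup` would be a supervised child such as `{Task.Supervisor, name: MyApp.CacheTaskSup}` (a hypothetical name). The value returned by `fun` arrives as a message and is copied into the request process; binaries larger than 64 bytes are reference-counted and shared rather than copied, so if cached bodies are stored as binaries, that part of the copy is cheap.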
Option 1 isn’t the “OTP way”, but I can’t seem to find a good reason not to use it: there’s no risk of performance problems from bottlenecks or copying. However, the documentation says that try-catch / try-rescue is rarely used because OTP patterns are used instead [3].
To me, using try-catch seems like the way to go, but it seems to fly in the face of everything I’ve read. In what situations is it best to use try-catch, and is this one of them?
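A minimal sketch of Option 1. The important detail is that `rescue` only handles raised errors, while exits (for example a `GenServer.call/3` to a process that isn’t running) need `catch :exit, _`. `SafeFetch` and its zero-arity `lookup` argument are hypothetical stand-ins for your own wrapper around the library:

```elixir
defmodule SafeFetch do
  require Logger

  # Hypothetical wrapper for Option 1. `lookup` is a zero-arity function
  # standing in for the real library call; it is not Cachex's API.
  # Returns {:ok, value} on success and :error on a raise or an exit.
  def fetch(lookup) when is_function(lookup, 0) do
    try do
      {:ok, lookup.()}
    rescue
      # Raised errors: ArgumentError, MatchError, RuntimeError, ...
      exception ->
        Logger.warning("cache lookup raised: " <> Exception.message(exception))
        :error
    catch
      # Exits, e.g. GenServer.call/3 to a process that is not running.
      # Throws are deliberately not caught here.
      :exit, reason ->
        Logger.warning("cache lookup exited: " <> inspect(reason))
        :error
    end
  end
end
```

Since the call stays in the request process, there is no extra process to create and no message to copy.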
If you use Cowboy 2, it spawns a process per request, not just per connection, so each request is already isolated.
Other than that, whether to try/catch or not depends largely on the error you are dealing with. I think posting the errors you are facing would lead to better suggestions / answers.
Hi, thank you for replying. This is the error I’m getting:
Task #PID<0.32063.7888> started from #PID<0.26981.7902> terminating
** (stop) exited in: GenServer.call(:cache_locksmith, {:transaction, ["9e323b9335689400e1249ccb8a80bd84"], #Function<0.107038532/0 in Cachex.Actions.Touch.execute/3>}, :infinity)
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
(elixir 1.11.3) lib/gen_server.ex:1017: GenServer.call/3
(cachex 3.3.0) lib/cachex.ex:1296: Cachex.touch/3
(my_app 0.2.0) lib/my_app/cache/local.ex:23: MyApp.Cache.Local.fetch/2
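The `no process` line is an exit raised by `GenServer.call/3` when nothing is registered under the name, not an exception. A small reproduction, assuming nothing is registered as `:cache_locksmith` (the name from the log) in the current node, shows that `rescue` lets it through while `catch :exit` stops it. `NoprocDemo` is a hypothetical module:

```elixir
defmodule NoprocDemo do
  # Assumes nothing is registered as :cache_locksmith in this node, which
  # reproduces the "no process" exit from the log above.

  # `rescue` only handles raised errors, so the exit passes straight through.
  def call_with_rescue do
    try do
      GenServer.call(:cache_locksmith, :ping)
    rescue
      _ -> :rescued
    end
  end

  # `catch :exit` handles it; the reason is {:noproc, {GenServer, :call, args}}.
  def call_with_catch do
    try do
      GenServer.call(:cache_locksmith, :ping)
    catch
      :exit, {:noproc, {GenServer, :call, _args}} -> :noproc
    end
  end
end
```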
Sorry, what I meant to say is that I want to isolate my request process. I did read this [1], but as you say, there is one process per connection plus one process per request/response. Basically, I don’t want a cache error to bring down a request.
The application itself is a normal web server returning web pages with text/image responses; I have a cache in front because of the volume of traffic it gets. The logs show this error only about 10 times a week, so not very often.
I’m thinking more about the best way to handle a 3rd-party library playing up: try-catch or Tasks? Would copying the responses between processes (Cachex -> Task -> request process) make my application slower?
In answer to your questions:
When a call to the cache fails, we should carry on and produce a response as normal
We could retry, but failures like this happen so infrequently that it’s not worth it
If many calls fail at once, we should just produce responses for all of those as normal
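Those requirements map onto a small fallback: try the cache once, and on a miss or on any failure render the page directly, with no retry. A sketch, assuming hypothetical `lookup` (returning `{:ok, body}` on a hit and `{:ok, nil}` on a miss) and `render` functions rather than the real Cachex or Cowboy APIs:

```elixir
defmodule PageServer do
  # Hypothetical: `lookup.(key)` stands in for the cache read and is assumed
  # to return {:ok, body} on a hit and {:ok, nil} on a miss; `render.(key)`
  # stands in for building the page without the cache.
  def page(key, lookup, render) do
    cached =
      try do
        lookup.(key)
      catch
        # Any raise, exit or throw from the cache: carry on without it.
        _kind, _reason -> :cache_unavailable
      end

    case cached do
      {:ok, body} when not is_nil(body) -> body
      # Miss, error tuple or failure: render as normal, no retry.
      _miss_or_failure -> render.(key)
    end
  end
end
```

Catching every kind is deliberately broad here, to match "carry on and produce a response as normal"; logging `kind` and `reason` before falling back would keep the weekly failures visible.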
In this case I’d investigate why the GenServer process (`:cache_locksmith` in the log) is not started; maybe there is a race condition during application startup where the web endpoint is started before the cache.
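If that race is the cause, the usual fix is ordering in the supervision tree: `Supervisor.start_link/2` starts children in list order and waits for each child’s `start_link` to return before starting the next. A sketch using an `Agent` registered as `:page_cache` as a hypothetical stand-in for the cache, and a one-off `Task` standing in for the web endpoint:

```elixir
defmodule StartupOrder do
  # Hypothetical stand-ins: an Agent named :page_cache plays the cache and a
  # temporary Task plays the web endpoint that reads from it on startup.
  def start_link do
    children = [
      # 1. Cache first. Its start_link must return before the next child starts.
      %{
        id: :page_cache,
        start: {Agent, :start_link, [fn -> %{} end, [name: :page_cache]]}
      },
      # 2. Endpoint last. The cache is already registered, so this read cannot
      #    hit the "no process" exit. Swapping the entries would make it a race.
      {Task, fn -> Agent.get(:page_cache, &map_size/1) end}
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```

In the real application the same rule applies: the cache child goes before the endpoint (or Cowboy listener) in the children list.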
Would the copying of the responses from the processes Cachex -> Task -> request process make my application slower?
You can benchmark plain Cachex calls against Cachex -> Task with your own data using benchee to find out. It will be slower, since more work is involved, but by how much is hard to say: it could be 1%, it could be more.
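benchee (`Benchee.run/2` with a map of named jobs) is the right tool for real numbers. As a rough, dependency-free first look, a sketch with `:timer.tc/1` can compare a direct call with the same call routed through a task. `CopyCost` and the `lookup` function are hypothetical, and the numbers depend entirely on your data:

```elixir
defmodule CopyCost do
  # Rough, dependency-free comparison (not a substitute for benchee):
  # time `n` direct calls to `lookup` vs. the same calls made inside a Task,
  # whose result is sent back to, and copied into, the calling process.
  def compare(lookup, n \\ 10_000) when is_function(lookup, 0) do
    {direct_us, :ok} =
      :timer.tc(fn -> Enum.each(1..n, fn _ -> lookup.() end) end)

    {via_task_us, :ok} =
      :timer.tc(fn ->
        Enum.each(1..n, fn _ -> lookup |> Task.async() |> Task.await() end)
      end)

    # Both totals are in microseconds.
    %{direct_us: direct_us, via_task_us: via_task_us}
  end
end
```

This measures spawning the task as well as copying its result, which together are the per-call cost of the Task option. For example, with a hypothetical ETS table `:pages`, you could run `CopyCost.compare(fn -> :ets.lookup(:pages, "home") end)`.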