I’m working to isolate the issue in a reproducible way. At the moment, I only have firm reproduction steps for my company’s codebase, which isn’t that helpful for a bug report. The strange thing is that SASL seems to have access to a full stacktrace, since this is the output produced on the console:
[error] GenServer #PID<0.395.0> terminating
** (stop) exited in: :gen_server.call(MyApp.Redis.Pool, {:checkout, #Reference<0.0.2.176>, true}, 5000)
** (EXIT) no process
:erlang.send(MyApp.Redis.Pool, {:"$gen_cast", {:cancel_waiting, #Reference<0.0.2.176>}}, [:noconnect])
(stdlib) gen_server.erl:416: :gen_server.do_send/2
(stdlib) gen_server.erl:232: :gen_server.do_cast/2
(poolboy) src/poolboy.erl:58: :poolboy.checkout/3
(poolboy) src/poolboy.erl:74: :poolboy.transaction/3
(my_app) lib/my_app/redis/cache.ex:76: MyApp.Redis.Cache.delete/1
(verk) lib/verk/worker.ex:27: Verk.Worker.handle_cast/2
(stdlib) gen_server.erl:615: :gen_server.try_dispatch/4
Last message: {:"$gen_cast", {:perform, "MyApp.Workers.RedisCacheInvalidationWorker", [1341], "6d3d459df292e6e1611dc286", #PID<0.396.0>}}
State: nil
I’m going to see how far I can isolate it. Based on the trace above, and after tracing the failure with erlyberly, I’ve been trying to trigger it in isolation by sending messages to non-existing processes.
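To make that concrete, this is the shape of the minimal experiment I’ve been running (the module and atom names here are mine for illustration, not from our codebase): call a GenServer that isn’t registered and inspect what the caught exit reason looks like.

```elixir
defmodule NoprocRepro do
  @moduledoc "Sketch: provoke the :noproc exit seen in the trace above."

  def run do
    try do
      # No process is registered under this name, so the call exits.
      GenServer.call(:does_not_exist, :ping, 100)
    catch
      :exit, reason ->
        # The exit reason carries the MFA of the failed call,
        # e.g. {:noproc, {GenServer, :call, [:does_not_exist, :ping, 100]}}
        IO.inspect(reason, label: "caught exit")
    end
  end
end
```

The interesting part is that the caught reason contains an MFA tuple describing the call, not a stacktrace, which seems related to the crash shown below.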
Quite honestly, if I hadn’t seen it happen with my own eyes, I would think it impossible. But the third-party library we use in this codebase is definitely just calling System.stacktrace/0 and then crashing when it passes the value to Exception.format_stacktrace/1:
[error] Manager terminating, reason: {{%Protocol.UndefinedError{description: nil, protocol: Enumerable, value: {:gen_server, :call, [MyApp.Redis.Pool, {:checkout, #Reference<0.0.1.138>, true}, 5000]}}, [{Enumerable, :impl_for!, 1, [file: 'lib/enum.ex', line: 1]}, {Enumerable, :reduce, 3, [file: 'lib/enum.ex', line: 116]}, {Enum, :reduce, 3, [file: 'lib/enum.ex', line: 1486]}, {Enum, :map_join, 3, [file: 'lib/enum.ex', line: 1118]}, {Exception, :format_stacktrace, 1, [file: 'lib/exception.ex', line: 388]}, {Verk.QueueManager, :build_retry_job, 5, [file: 'lib/verk/queue_manager.ex', line: 135]}, {Verk.QueueManager, :handle_call, 3, [file: 'lib/verk/queue_manager.ex', line: 117]}, {:gen_server, :try_handle_call, 4, [file: 'gen_server.erl', line: 629]}]}, {GenServer, :call, [:"elixir.queue_manager", {:retry, %Verk.Job{args: [1341], class: "MyApp.Workers.RedisCacheInvalidationWorker", enqueued_at: 1462511589.706, error_backtrace: nil, error_message: nil, failed_at: nil, finished_at: nil, jid: "58592642725653573b8e8fdb", original_json: "{\"queue\":\"elixir\",\"class\":\"MyApp.Workers.RedisCacheInvalidationWorker\",\"args\":[1341],\"retry\":true,\"jid\":\"58592642725653573b8e8fdb\",\"created_at\":1462511589.705,\"enqueued_at\":1462511589.706}", queue: "elixir", retried_at: nil, retry_count: 0}, 1462511592, %RuntimeError{message: ":noproc"}, {:gen_server, :call, [MyApp.Redis.Pool, {:checkout, #Reference<0.0.1.138>, true}, 5000]}}, 5000]}}
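Reading the Protocol.UndefinedError above, the failure mode appears to be that Exception.format_stacktrace/1 receives a bare MFA tuple (the `{:gen_server, :call, [...]}` value from the exit reason) instead of a list of stacktrace entries, so Enum.map_join fails with no Enumerable implementation for a tuple. A standalone sketch of that failure (not taken from the library source):

```elixir
# Exception.format_stacktrace/1 expects a list of stacktrace entries.
# Feeding it the MFA tuple from an exit reason reproduces the
# Protocol.UndefinedError for Enumerable seen in the log above.
not_a_stacktrace =
  {:gen_server, :call, [MyApp.Redis.Pool, {:checkout, make_ref(), true}, 5000]}

try do
  Exception.format_stacktrace(not_a_stacktrace)
rescue
  e in Protocol.UndefinedError ->
    IO.inspect(e.protocol, label: "undefined protocol")  # Enumerable
end
```

If that reading is right, the bug is less about System.stacktrace/0 returning garbage and more about an exit-reason tuple being passed where a stacktrace list is expected.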
All that is to say: I’m still working toward a clearer-cut case that can be reproduced outside our codebase.