We are using Elixir 1.7.4 with Ecto 3.0.0 and OTP 19, with PostgreSQL 9.6 hosted on Azure (Azure Database for PostgreSQL).
We face frequent database connection issues: after a while the database stops accepting connections and the application becomes unresponsive. Some of the errors that get logged are shown below.
GenServer Cluster.Strategy.Postgres.Notifications terminating
** (stop) %DBConnection.ConnectionError{message: "ssl async recv: closed", severity: :error}
Last message: {:ssl_closed, {:sslsocket, {:gen_tcp, #Port<0.121>, :tls_connection, :undefined}, [#PID<0.3474.0>, #PID<0.3473.0>]}}
State: %Postgrex.Notifications{idle_timeout: 5000, listener_channels: %{"cluster" => %{#Reference<0.3015757464.3877371905.67662> => #PID<0.3481.0>}}, listeners: %{#Reference<0.3015757464.3877371905.67662> => {"cluster", #PID<0.3481.0>}}, parameters: nil, protocol: %Postgrex.Protocol{buffer: :active_once, connection_id: 585712, connection_key: 419216344, disconnect_on_error_codes: [], null: nil, parameters: #Reference<0.3015757464.3877371905.67659>, peer: {{191, 238, 6, 43}, 5432}, postgres: :idle, queries: nil, sock: {:ssl, {:sslsocket, {:gen_tcp, #Port<0.121>, :tls_connection, :undefined}, [#PID<0.3474.0>, #PID<0.3473.0>]}}, timeout: 15000, transactions: :naive, types: nil}}
[libcluster:titan] notify failed: %Protocol.UndefinedError{description: "", protocol: String.Chars, value: %DBConnection.ConnectionError{message: "ssl send: closed", severity: :error}}
Postgrex.Protocol (#PID<0.3470.0>) disconnected: ** (DBConnection.ConnectionError) ssl send: closed
GenServer #PID<0.3481.0> terminating
** (stop) exited in: :gen_server.call(Cluster.Strategy.Postgres.Notifications, {:listen, "sync"}, 5000)
** (EXIT) %DBConnection.ConnectionError{message: "ssl recv: closed", severity: :error}
(stdlib) gen_server.erl:223: :gen_server.call/3
(titan) lib/libcluster/strategy/postgres/worker.ex:48: Cluster.Strategy.Postgres.Worker.handle_info/2
(stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:711: :gen_server.handle_msg/6
(stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: :sync
State: %Cluster.Strategy.State{config: [hostname: "titan.postgres.database.azure.com", database: "titan", username: "xxx", password: "xxx", port: 5432, ssl: true], connect: {:net_kernel, :connect_node, []}, disconnect: {:erlang, :disconnect_node, []}, list_nodes: {:erlang, :nodes, [:connected]}, meta: %{channel: "cluster", connection: Cluster.Strategy.Postgres.Connection, notifications: Cluster.Strategy.Postgres.Notifications, subscription: #Reference<0.982578244.1168113666.181066>}, topology: :titan}
GenServer #PID<0.3481.0> terminating
** (stop) exited in: :gen_server.call(Cluster.Strategy.Postgres.Notifications, {:listen, "sync"}, 5000)
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
(stdlib) gen_server.erl:223: :gen_server.call/3
(titan) lib/libcluster/strategy/postgres/worker.ex:48: Cluster.Strategy.Postgres.Worker.handle_info/2
(stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:711: :gen_server.handle_msg/6
(stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: :sync
State: %Cluster.Strategy.State{config: [hostname: "titan.postgres.database.azure.com", database: "titan", username: "xxx", password: "xxx", port: 5432, ssl: true], connect: {:net_kernel, :connect_node, []}, disconnect: {:erlang, :disconnect_node, []}, list_nodes: {:erlang, :nodes, [:connected]}, meta: %{channel: "cluster", connection: Cluster.Strategy.Postgres.Connection, notifications: Cluster.Strategy.Postgres.Notifications, subscription: #Reference<0.3015757464.3877371905.67662>}, topology: :titan}
This has been happening too frequently, and we are not sure what is going wrong. Is it an issue with the software versions we are using (they have not been updated in a while), the connection pool settings, or something else? Can someone advise on how to debug and resolve such issues?
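For context, the connection pool settings in question would normally sit in the repo configuration. The following is only a minimal sketch of where those knobs live, not our actual config: the repo module name `Titan.Repo` is an assumption, the connection details are copied from the log state above, and the numeric values are illustrative rather than recommendations.

```elixir
# config/prod.exs (sketch; Titan.Repo is an assumed module name)
use Mix.Config

config :titan, Titan.Repo,
  hostname: "titan.postgres.database.azure.com",
  database: "titan",
  username: "xxx",
  password: "xxx",
  port: 5432,
  ssl: true,
  # Number of connections the pool keeps open (illustrative value).
  pool_size: 10,
  # Interval in ms at which idle connections are pinged.
  idle_interval: 1_000,
  # Queue-time thresholds in ms used by the pool to decide when to shed load.
  queue_target: 50,
  queue_interval: 1_000,
  # Options passed to the underlying TCP socket; enables TCP keepalive probes.
  socket_options: [keepalive: true]
```

Whether any of these settings is related to the "closed" errors above is exactly what we are unsure about.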