Telemetry not working with postgrex ([:db_connection, :connection_error] event)

Hi folks,

I cannot make telemetry (version 1.0.0) work with postgrex (version 0.15.11). Per the db_connection doc (version 2.4.0), a telemetry event is generated when there is a connection error :

A [:db_connection, :connection_error] event is published whenever a connection checkout receives a %DBConnection.ConnectionError{} .

The output of my test is copy/pasted below and as you can see the “Connected to DB” and “Disconnected from DB” messages are displayed because I installed a connection listener with the :connection_listeners option (I am starting/stopping the PostgreSQL instance). I do not get the telemetry event I have attached to. Any idea to make it work ? TIA

$ iex -S mix
Erlang/OTP 24 [erts-12.1] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]
Interactive Elixir (1.12.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> {:ok, pid} = TestPg.start_link([])
{:ok, #PID<0.235.0>}
08:36:05.840 [error] Postgrex.Protocol (#PID<0.239.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (xxx:5433): connection refused - :econnrefused
08:36:17.848 [info]  Connected to DB
08:36:25.207 [error] Postgrex.Protocol (#PID<0.239.0>) disconnected: ** (Postgrex.Error) FATAL 57P01 (admin_shutdown) terminating connection due to administrator command
08:36:25.208 [error] Postgrex.Protocol #PID<0.239.0> could not cancel backend: tcp connect: connection refused - :econnrefused
08:36:25.209 [info]  Disconnected from DB
08:36:25.212 [error] Postgrex.Protocol (#PID<0.239.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (xxx:5433): connection refused - :econnrefused
08:36:35.158 [info]  Connected to DB
defmodule TestPg do

  require Logger

  use GenServer

  @db_host     "xxx"
  @db_port     5433
  @db_name     "test"
  @db_username "test"
  @db_password "dummy"

  defstruct [:pid, :connected]

  def start_link(args) do
    GenServer.start_link(__MODULE__, args, name: :db_test)
  end

  def init(_args) do
    send(self(), :connect)
    :ok = :telemetry.attach("dummy", [:db_connection, :connection_error], &TestPg.handle_event/4, nil)
    {:ok, %__MODULE__{connected: false}}
  end

  # def handle_event(event, measurements, metadata, _config) do
  #   IO.inspect event, label: "event"
  #   IO.inspect measurements, label: "measurements"
  #   IO.inspect metadata, label: "metadata"
  # end

  def handle_event([:db_connection, :connection_error], measurements, metadata, _config) do
    IO.inspect measurements, label: "measurements"
    IO.inspect metadata, label: "metadata"
    # IO.inspect config, label: "config"
  end

  def handle_info(:connect, state) do
    res = Postgrex.start_link(
      hostname: @db_host,
      port:     @db_port,
      database: @db_name,
      username: @db_username,
      password: @db_password,
      connection_listeners: [:db_test]
    )
    case res do
      {:ok, pid} ->
        {:noreply, %__MODULE__{state | pid: pid}}
      {:error, reason} ->
        Logger.error "Postgrex.start_link error for reason #{inspect(reason)}"
        {:stop, reason, state}
    end
  end

  def handle_info({:connected, _pid}, state) do
    Logger.info "Connected to DB"
    {:noreply, %__MODULE__{state | connected: true}}
  end

  def handle_info({:disconnected, _pid}, state) do
    Logger.info "Disconnected from DB"
    {:noreply, %__MODULE__{state | connected: false}}
  end

  def handle_info(msg, state) do
    Logger.warning "Unknown handle_info(): #{inspect(msg)}"
    {:noreply, state}
  end

  def handle_cast(msg, state) do
    Logger.warning "Unknown handle_cast(): #{inspect(msg)}"
    {:noreply, state}
  end

  def handle_call(request, _from, state) do
    Logger.warning "Unknown handle_call(): #{inspect(request)}"
    {:reply, nil, state}
  end

end

Well, this is embarassing, but I eventually got the telemetry event. It looks like I get it when I am trying to do a query on the database (an INSERT in this case) but not when I am idle and stop the database. I guess this is normal behavior since it is written “(…) whenever a connection checkout receives a %DBConnection.ConnectionError{} .” That means we have to need to checkout a connection from the pool and that only happens when we try to run a query.

Idle (no telemetry event):

11:02:20.448 [error] Postgrex.Protocol (#PID<0.242.0>) disconnected: ** (Postgrex.Error) FATAL 57P01 (admin_shutdown) terminating connection due to administrator command
11:02:20.449 [error] Postgrex.Protocol #PID<0.242.0> could not cancel backend: tcp connect: connection refused - :econnrefused
11:02:20.449 [info]  Disconnected from DB for pid: #PID<0.242.0>, state.pid: #PID<0.240.0>
11:02:20.454 [error] Postgrex.Protocol (#PID<0.242.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (xxx:5433): connection refused - :econnrefused

Trying to do an INSERT (transaction and query):

11:02:46.448 [error] Postgrex.Protocol (#PID<0.242.0>) disconnected: ** (Postgrex.Error) FATAL 57P01 (admin_shutdown) terminating connection due to administrator command
11:02:46.448 [error] Postgrex.Protocol #PID<0.242.0> could not cancel backend: tcp connect: connection refused - :econnrefused
11:02:46.453 [error] Postgrex.Protocol (#PID<0.242.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (xxx:5433): connection refused - :econnrefused
11:02:49.146 [error] Postgrex.Protocol (#PID<0.242.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (xxx:5433): connection refused - :econnrefused
measurements: %{count: 1}
metadata: %{
  error: %DBConnection.ConnectionError{
    message: "connection not available and request was dropped from queue after 2045ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:\n\n  1. Ensuring your database is available and that you can connect to it\n  2. Tracking down slow queries and making sure they are running fast enough\n  3. Increasing the pool_size (albeit it increases resource consumption)\n  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval\n\nSee DBConnection.start_link/2 for more information\n",
    reason: :queue_timeout,
    severity: :error
  },
  opts: []
}
catch transaction error: :error
catch transaction reason: %DBConnection.ConnectionError{
  message: "connection not available and request was dropped from queue after 2045ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:\n\n  1. Ensuring your database is available and that you can connect to it\n  2. Tracking down slow queries and making sure they are running fast enough\n  3. Increasing the pool_size (albeit it increases resource consumption)\n  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval\n\nSee DBConnection.start_link/2 for more information\n",
  reason: :queue_timeout,
  severity: :error
}
1 Like