GenServer #PID<0.7281.0> terminating tcp_closed error messages

Phoenix app in a Docker container is filling log with this in production on an Azure Web App.

Another post suggested hackney and httpoison is causing it.

How can I find the cause of the errors?

I can ssh to the container and run /app/bin/app_name remote to start iex.

Do I need to create a ssh tunnel to remote debug the node?

In the container I can run: /app/erts-13.2.2/bin/epmd -names to find the remote port
epmd: up and running on port 4369 with data:
name app_name at port 40699

2023-08-27T11:00:31.165568384Z 11:00:31.165 [error] GenServer #PID<0.7281.0> terminating

2023-08-27T11:00:31.165607985Z ** (stop) :tcp_closed

2023-08-27T11:00:31.165614385Z Last message: {:tcp_closed, #Port<0.1783>}

2023-08-27T11:00:44.382799422Z 11:00:44.382 [error] GenServer #PID<0.7294.0> terminating

2023-08-27T11:00:44.382851424Z ** (stop) :tcp_closed

2023-08-27T11:00:44.382857724Z Last message: {:tcp_closed, #Port<0.1787>}

2023-08-27T11:00:50.868232946Z 11:00:50.867 [error] GenServer #PID<0.7306.0> terminating

2023-08-27T11:00:50.868293847Z ** (stop) :tcp_closed

2023-08-27T11:00:50.868302347Z Last message: {:tcp_closed, #Port<0.1791>}

2023-08-27T11:00:55.510507526Z 11:00:55.509 [error] GenServer #PID<0.7318.0> terminating

2023-08-27T11:00:55.510555327Z ** (stop) :tcp_closed

2023-08-27T11:00:55.510560527Z Last message: {:tcp_closed, #Port<0.1795>}

I did :sys.trace pid("0.277.3"), true to get this log from Debugging an unexpected process exit - #4 by hkrutzer

I got the process id using Behold.process_by_memory from Elixir Processes: Observability - Samuel Mullen

2023-08-28T04:42:49.047272402Z: [INFO]  ** (stop) :tcp_closed
2023-08-28T04:42:49.047283902Z: [INFO]  Last message: {:tcp_closed, #Port<0.34294>}
2023-08-28T04:42:49.053868256Z: [INFO]  *DBG* <0.282.3> got {'EXIT',<0.283.3>,
2023-08-28T04:42:49.053900657Z: [INFO]                          {{badmatch,nil},
2023-08-28T04:42:49.053906857Z: [INFO]                           [{'Elixir.Tds.Tls','assert_connected!',1,
2023-08-28T04:42:49.053911557Z: [INFO]                                [{file,"lib/tds/tls.ex"},{line,86}]},
2023-08-28T04:42:49.053916558Z: [INFO]                            {'Elixir.Tds.Tls',send,2,
2023-08-28T04:42:49.053921158Z: [INFO]                                [{file,"lib/tds/tls.ex"},{line,41}]},
2023-08-28T04:42:49.053925458Z: [INFO]                            {tls_sender,send_tls_alert,2,
2023-08-28T04:42:49.053929858Z: [INFO]                                [{file,"tls_sender.erl"},{line,489}]},
2023-08-28T04:42:49.053934358Z: [INFO]                            {tls_sender,connection,3,
2023-08-28T04:42:49.053938858Z: [INFO]                                [{file,"tls_sender.erl"},{line,277}]},
2023-08-28T04:42:49.053959159Z: [INFO]                            {gen_statem,loop_state_callback,11,
2023-08-28T04:42:49.053963759Z: [INFO]                                [{file,"gen_statem.erl"},{line,1426}]},
2023-08-28T04:42:49.053968459Z: [INFO]                            {proc_lib,init_p_do_apply,3,
2023-08-28T04:42:49.053973059Z: [INFO]                                [{file,"proc_lib.erl"},{line,240}]}]}}
2023-08-28T04:42:49.053977759Z: [INFO]  *DBG* <0.282.3> new state {state,
2023-08-28T04:42:49.053981959Z: [INFO]                             {<0.282.3>,tls_dyn_connection_sup},
2023-08-28T04:42:49.053986259Z: [INFO]                             one_for_all,
2023-08-28T04:42:49.053990459Z: [INFO]                             {[receiver],
2023-08-28T04:42:49.053994659Z: [INFO]                              #{receiver =>
2023-08-28T04:42:49.053998859Z: [INFO]                                 {child,<0.284.3>,receiver,
2023-08-28T04:42:49.054003560Z: [INFO]                                  {ssl_gen_statem,start_link,undefined},
2023-08-28T04:42:49.054008560Z: [INFO]                                  temporary,true,5000,worker,
2023-08-28T04:42:49.054013260Z: [INFO]                                  [ssl_gen_statem,tls_connection,
2023-08-28T04:42:49.054017960Z: [INFO]                                   tls_connection_1_3]}}},
2023-08-28T04:42:49.054023060Z: [INFO]                             undefined,0,3600,[],0,any_significant,
2023-08-28T04:42:49.054027860Z: [INFO]                             tls_dyn_connection_sup,[]}
2023-08-28T04:42:49.054032760Z: [INFO]  *DBG* <0.282.3> got {'EXIT',<0.284.3>,{shutdown,normal}}
2023-08-28T04:42:50.308597299Z: [INFO]  04:42:50.308 [error] GenServer #PID<0.264.3> terminating
2023-08-28T04:42:50.308645800Z: [INFO]  ** (stop) :tcp_closed
2023-08-28T04:42:50.308653600Z: [INFO]  Last message: {:tcp_closed, #Port<0.34293>}
2023-08-28T04:42:50.315401459Z: [INFO]  *DBG* <0.277.3> got {'EXIT',<0.278.3>,
2023-08-28T04:42:50.315452760Z: [INFO]                          {{badmatch,nil},
2023-08-28T04:42:50.315460460Z: [INFO]                           [{'Elixir.Tds.Tls','assert_connected!',1,
2023-08-28T04:42:50.315465160Z: [INFO]                                [{file,"lib/tds/tls.ex"},{line,86}]},
2023-08-28T04:42:50.315470260Z: [INFO]                            {'Elixir.Tds.Tls',send,2,
2023-08-28T04:42:50.315475360Z: [INFO]                                [{file,"lib/tds/tls.ex"},{line,41}]},
2023-08-28T04:42:50.315480460Z: [INFO]                            {tls_sender,send_tls_alert,2,
2023-08-28T04:42:50.315484861Z: [INFO]                                [{file,"tls_sender.erl"},{line,489}]},
2023-08-28T04:42:50.315489961Z: [INFO]                            {tls_sender,connection,3,
2023-08-28T04:42:50.315494761Z: [INFO]                                [{file,"tls_sender.erl"},{line,277}]},
2023-08-28T04:42:50.315499561Z: [INFO]                            {gen_statem,loop_state_callback,11,
2023-08-28T04:42:50.315518161Z: [INFO]                                [{file,"gen_statem.erl"},{line,1426}]},
2023-08-28T04:42:50.315522561Z: [INFO]                            {proc_lib,init_p_do_apply,3,
2023-08-28T04:42:50.315526662Z: [INFO]                                [{file,"proc_lib.erl"},{line,240}]}]}}
2023-08-28T04:42:50.315539562Z: [INFO]  *DBG* <0.277.3> new state {state,
2023-08-28T04:42:50.315546362Z: [INFO]                             {<0.277.3>,tls_dyn_connection_sup},
2023-08-28T04:42:50.315549562Z: [INFO]                             one_for_all,
2023-08-28T04:42:50.315552462Z: [INFO]                             {[receiver],
2023-08-28T04:42:50.315555262Z: [INFO]                              #{receiver =>
2023-08-28T04:42:50.315558262Z: [INFO]                                 {child,<0.279.3>,receiver,
2023-08-28T04:42:50.315561262Z: [INFO]                                  {ssl_gen_statem,start_link,undefined},
2023-08-28T04:42:50.315564362Z: [INFO]                                  temporary,true,5000,worker,
2023-08-28T04:42:50.315567362Z: [INFO]                                  [ssl_gen_statem,tls_connection,
2023-08-28T04:42:50.315570363Z: [INFO]                                   tls_connection_1_3]}}},
2023-08-28T04:42:50.315573363Z: [INFO]                             undefined,0,3600,[],0,any_significant,
2023-08-28T04:42:50.315576163Z: [INFO]                             tls_dyn_connection_sup,[]}
2023-08-28T04:42:50.315579563Z: [INFO]  *DBG* <0.277.3> got {'EXIT',<0.279.3>,{shutdown,normal}}
2023-08-28T04:42:53.071126125Z: [INFO]  04:42:53.070 [error] GenServer #PID<0.309.3> terminating

Logger.configure(level: :debug, truncate: :infinity) in iex

2023-08-28T05:43:46.619701740Z 05:43:46.603 [error] GenServer #PID<0.3123.0> terminating
2023-08-28T05:43:46.619757642Z ** (stop) :tcp_closed
2023-08-28T05:43:46.619767042Z Last message: {:tcp_closed, #Port<0.350>}
2023-08-28T05:43:46.619783042Z State: %Tds.Tls{socket: #Port<0.350>, ssl_opts: [cacerts: [{:cert, <<48, 130, 3, 148, 48, 130, 
%Tds.Tls{
  socket: Port<0.124>,
  ssl_opts: [cacerts: 
    [{:cert, <<48, 130, 3, 148, 48, 130, 2, 124, 160, 3, 2, 1, 2, 2, 10, 49, 245, 228, 98, 12, 108, 88, 237, 214, 216, 48, 13, 6, 9, 42, 134, 72, 134, 247, 13, 1, 1, 11, 5, 0, 48, 103, 49, 11, ...>>],
    customize_hostname_check: [match_fun: Function<41.3316493/2 in :erl_eval.expr/6>], 
    verify_fun: {&:ssl_verify_hostname.verify_fun/3, []}, 
    verify: :verify_peer, 
    active: false, 
    cb_info: {Tds.Tls, :tcp, :tcp_closed, :tcp_error}
  ],
  owner_pid: PID<0.2478.0>, handshake?: false, buffer: nil
}

How many connections are you opening? Azure has a limit on the amount of outgoing connections you can have. You need to check the limit for your instance size. You should also have a look around in the “Diagnose and solve problems” menu, it sometimes actually contains useful data.

Personally the next steps I would take are:

1 Like

It’s just a new Phoenix app with the Repo configured to a SQL Server on Azure SQL. It’s running in a Docker container on Azure Web App or Web Service for Linux.

There should be no users on it. The errors are logged less frequently if I set the pool_size to 1 instead of 10. Otherwise, the tcp_closed errors are logged multiple times per minute forever.

ELIXIR_VERSION=1.14.5
OTP_VERSION=25.3.2.5
DEBIAN_VERSION=bullseye-20230612-slim

You are getting less errors because there are less database connections that are getting closed. I would still try to connect with the TDS driver directly, maybe even from your own machine to the Azure DB.

1 Like

I’m able to connect to the Azure SQL database and run queries, but intermittently I get these errors.

2023-08-29T14:54:56.513337330Z ** (exit) an exception was raised:
2023-08-29T14:54:56.513342430Z     ** (DBConnection.ConnectionError) bad return value: {:error, :closed}
2023-08-29T14:54:56.513348130Z         (db_connection 2.5.0) lib/db_connection.ex:1286: DBConnection.handle_common_result/3
2023-08-29T14:54:56.513353330Z         (db_connection 2.5.0) lib/db_connection.ex:1410: DBConnection.run_prepare/4
2023-08-29T14:54:56.513359630Z         (db_connection 2.5.0) lib/db_connection.ex:1422: DBConnection.run_prepare_execute/5
2023-08-29T14:54:56.513365530Z         (db_connection 2.5.0) lib/db_connection.ex:1527: DBConnection.run/6
2023-08-29T14:54:56.513386731Z         (db_connection 2.5.0) lib/db_connection.ex:656: DBConnection.parsed_prepare_execute/5
2023-08-29T14:54:56.513392331Z         (db_connection 2.5.0) lib/db_connection.ex:648: DBConnection.prepare_execute/4
2023-08-29T14:54:56.513397631Z         (tds 2.3.2) lib/tds.ex:70: Tds.query/4
2023-08-29T14:54:56.513402231Z         (ecto_sql 3.10.1) lib/ecto/adapters/sql.ex:955: Ecto.Adapters.SQL.struct/10
  defp handle_common_result(return, conn, meter) do
    case return do
      {:ok, result, _conn_state} ->
        {:ok, result, meter}

      {:error, err, _conn_state} ->
        {:error, err, meter}

      {:disconnect, err, _conn_state} ->
        disconnect(conn, err)
        {:error, err, meter}

      {:catch, kind, reason, stack} ->
        stop(conn, kind, reason, stack)
        {kind, reason, stack, meter}

      other ->
        bad_return!(other, conn, meter)
    end
  end

  @compile {:inline, bad_return!: 3}

  defp bad_return!(other, conn, meter) do
    try do
      raise DBConnection.ConnectionError, "bad return value: #{inspect(other)}"
    catch
      :error, reason ->
        stack = __STACKTRACE__
        stop(conn, :error, reason, stack)
        {:error, reason, stack, meter}
    end
  end

I was able to connect to the database with Tds on iex in the container and run queries. I get the DBConnection.ConnectionError from above, 1 out of 100 times.

Maybe try to execute some queries from a container, but using mssql-cli?

I changed the pool size to 1 and narrowed the TCP close errors to about every 90 to 100 seconds.

I need to check if DBConnection is keeping the TCP connection alive or if Docker and Debian close connections in that time.

The application makes two TCP connections to the database on two different IP and ports. One connects on port 1433 and the other connects to a random port like 11061.

I have the same Docker container running on my local and on Azure.

On my local, traffic on both ports are constant every couple of seconds.
The packet length from 1433 to my local alternates between 103 and 81.

On the Azure container, traffic on the random port is constant every couple of seconds and the TCP traffic on port 1433 is every 30 seconds.
The packet length from 1433 to Azure container is always 1.

// connects
15:09:23.934147 IP (tos 0x0, ttl 64, id 34719, offset 0, flags [DF], proto TCP (6), length 60)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [S], cksum 0xc611 (incorrect -> 0x726f), seq 2492617959, win 64240, options [mss 1460,sackOK,TS val 3402923976 ecr 0,nop,wscale 7], length 0
15:09:23.943066 IP (tos 0x0, ttl 115, id 33344, offset 0, flags [DF], proto TCP (6), length 52)
    [AZURESQLIP].1433 > [app_ip].56076: Flags [S.], cksum 0xbbfa (correct), seq 561512862, ack 2492617960, win 65535, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
15:09:23.943094 IP (tos 0x0, ttl 64, id 34720, offset 0, flags [DF], proto TCP (6), length 40)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [.], cksum 0xc5fd (incorrect -> 0xfac3), seq 1, ack 1, win 502, length 0
15:09:23.943376 IP (tos 0x0, ttl 64, id 34721, offset 0, flags [DF], proto TCP (6), length 87)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [P.], cksum 0xc62c (incorrect -> 0xca06), seq 1:48, ack 1, win 502, length 47
15:09:23.948163 IP (tos 0x0, ttl 115, id 33345, offset 0, flags [DF], proto TCP (6), length 83)
    [AZURESQLIP].1433 > [app_ip].56076: Flags [P.], cksum 0x268b (correct), seq 1:44, ack 48, win 16425, length 43
15:09:23.948185 IP (tos 0x0, ttl 64, id 34722, offset 0, flags [DF], proto TCP (6), length 40)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [.], cksum 0xc5fd (incorrect -> 0xfa69), seq 48, ack 44, win 502, length 0
15:09:23.957238 IP (tos 0x0, ttl 64, id 34723, offset 0, flags [DF], proto TCP (6), length 349)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [P.], cksum 0xc732 (incorrect -> 0x6382), seq 48:357, ack 44, win 502, length 309
15:09:23.963137 IP (tos 0x0, ttl 115, id 33346, offset 0, flags [DF], proto TCP (6), length 4136)
    [AZURESQLIP].1433 > [app_ip].56076: Flags [P.], cksum 0xd5fd (incorrect -> 0xcb2a), seq 44:4140, ack 357, win 16423, length 4096
15:09:23.963139 IP (tos 0x0, ttl 115, id 33349, offset 0, flags [DF], proto TCP (6), length 649)
    [AZURESQLIP].1433 > [app_ip].56076: Flags [P.], cksum 0x4d1c (correct), seq 4140:4749, ack 357, win 16423, length 609
15:09:23.963227 IP (tos 0x0, ttl 64, id 34724, offset 0, flags [DF], proto TCP (6), length 40)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [.], cksum 0xc5fd (incorrect -> 0xe6d4), seq 357, ack 4749, win 501, length 0
15:09:23.969178 IP (tos 0x0, ttl 64, id 34725, offset 0, flags [DF], proto TCP (6), length 206)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [P.], cksum 0xc6a3 (incorrect -> 0xa13d), seq 357:523, ack 4749, win 501, length 166
15:09:23.975267 IP (tos 0x0, ttl 115, id 33350, offset 0, flags [DF], proto TCP (6), length 99)
    [AZURESQLIP].1433 > [app_ip].56076: Flags [P.], cksum 0xdbc7 (correct), seq 4749:4808, ack 523, win 16423, length 59
15:09:23.975870 IP (tos 0x0, ttl 64, id 34726, offset 0, flags [DF], proto TCP (6), length 401)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [P.], cksum 0xc766 (incorrect -> 0x1533), seq 523:884, ack 4808, win 501, length 361
15:09:23.980933 IP (tos 0x0, ttl 115, id 33351, offset 0, flags [DF], proto TCP (6), length 294)
    [AZURESQLIP].1433 > [app_ip].56076: Flags [P.], cksum 0x4089 (correct), seq 4808:5062, ack 884, win 16421, length 254
15:09:24.023148 IP (tos 0x0, ttl 64, id 34727, offset 0, flags [DF], proto TCP (6), length 40)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [.], cksum 0xc5fd (incorrect -> 0xe38c), seq 884, ack 5062, win 501, length 0
15:09:24.023735 IP (tos 0x0, ttl 115, id 33352, offset 0, flags [DF], proto TCP (6), length 294)
    [AZURESQLIP].1433 > [app_ip].56076: Flags [P.], cksum 0x4089 (correct), seq 4808:5062, ack 884, win 16421, length 254
15:09:24.023747 IP (tos 0x0, ttl 64, id 34728, offset 0, flags [DF], proto TCP (6), length 52)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [.], cksum 0xc609 (incorrect -> 0x40bb), seq 884, ack 5062, win 501, options [nop,nop,sack 1 {4808:5062}], length 0

// 30 seconds since connecting
15:09:54.040202 IP (tos 0x0, ttl 115, id 33353, offset 0, flags [DF], proto TCP (6), length 41)
    [AZURESQLIP].1433 > [app_ip].56076: Flags [.], cksum 0xa55c (correct), seq 5061:5062, ack 884, win 16421, length 1
15:09:54.040226 IP (tos 0x0, ttl 64, id 34729, offset 0, flags [DF], proto TCP (6), length 52)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [.], cksum 0xc609 (incorrect -> 0x3fbe), seq 884, ack 5062, win 501, options [nop,nop,sack 1 {5061:5062}], length 0

// 60 seconds
15:10:24.055763 IP (tos 0x0, ttl 115, id 33354, offset 0, flags [DF], proto TCP (6), length 41)
    [AZURESQLIP].1433 > [app_ip].56076: Flags [.], cksum 0xa55c (correct), seq 5061:5062, ack 884, win 16421, length 1
15:10:24.055781 IP (tos 0x0, ttl 64, id 34730, offset 0, flags [DF], proto TCP (6), length 52)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [.], cksum 0xc609 (incorrect -> 0x3fbe), seq 884, ack 5062, win 501, options [nop,nop,sack 1 {5061:5062}], length 0

// 90 seconds
15:10:54.071833 IP (tos 0x0, ttl 115, id 33355, offset 0, flags [DF], proto TCP (6), length 41)
    [AZURESQLIP].1433 > [app_ip].56076: Flags [.], cksum 0xa55c (correct), seq 5061:5062, ack 884, win 16421, length 1
15:10:54.071853 IP (tos 0x0, ttl 64, id 34731, offset 0, flags [DF], proto TCP (6), length 52)
    [app_ip].56076 > [AZURESQLIP].1433: Flags [.], cksum 0xc609 (incorrect -> 0x3fbe), seq 884, ack 5062, win 501, options [nop,nop,sack 1 {5061:5062}], length 0
    
// connection drops
15:11:21.269931 IP (tos 0x0, ttl 115, id 33356, offset 0, flags [DF], proto TCP (6), length 40)
    [AZURESQLIP].1433 > [app_ip].56076: Flags [R.], cksum 0xe57d (correct), seq 5062, ack 884, win 0, length 0

I think the other port is the one you’re connecting from on the client?

Both ports connect to different Azure SQL IP address. The application only connects to one SQL Server, but it creates two connections. I guess it’s part of the protocol.

The local container only connects to 1433.

The Azure container connects to 1433 and another port.

Does TDS need to anything for Connectivity from within Azure?

Connectivity from within Azure
If you are connecting from within Azure your connections have a connection policy of Redirect by default. A policy of Redirect means that after the TCP session is established to Azure SQL Database, the client session is then redirected to the right database cluster with a change to the destination virtual IP from that of the Azure SQL Database gateway to that of the cluster. Thereafter, all subsequent packets flow directly to the cluster, bypassing the Azure SQL Database gateway. The following diagram illustrates this traffic flow.

Connectivity from outside of Azure
If you are connecting from outside Azure, your connections have a connection policy of Proxy by default. A policy of Proxy means that the TCP session is established via the Azure SQL Database gateway and all subsequent packets flow via the gateway. The following diagram illustrates this traffic flow.

My connection to the SQL server doesn’t disconnect after I changed the SQL server Connection Policy from Default to Proxy.

The TCP traffic on port 1433 on the Azure container is constant every 2 seconds like my local container.

The Azure container only connects to 1433 after the change.

Maybe it will help if you can look at the decrypted traffic. That would require some work but I’m not sure how else to find out more. It’s the Erlang -- ssl option on SSL. Some stuff is linked from Support SSLKEYLOGFILE environment variable. · Issue #132 · sneako/finch · GitHub that might show you how to implement it.

When the Azure container is connected to the SQL server using the Redirect connection policy, it doesn’t need the connection on port 1433. It uses the other port in range of 11000-11999 to communicate with the database.

When the connection on port 1433 closes, it also closes the connection on port 11000.

TDS in deps\tds\lib\tds\tls.ex asserts the connection to 11000 when it calls send/2 for something.

  def send(socket, payload) do
    socket
    |> assert_connected!()
    |> GenServer.call({:send, payload})
  end

  # Asserts that the port / socket is still open and returns its `pid`
  defp assert_connected!(socket) do
    {:connected, pid} = Port.info(socket, :connected)
    pid
  end

Submitted PR to TDS

Are you confident that change isn’t going to result in processes sticking around when they shouldn’t?

I’m not confident. I’m hoping the maintainers can review the change.