Handling Phoenix Presence during testing

nickewing · March 22, 2017, 4:53pm

I have run into an issue while testing a simple Phoenix channel and using Phoenix.Presence.

It looks to me like the test completes and the Presence process notices that a user has disconnected as a result. It then attempts to fetch the new user list, but at that point the database connection has already been closed for that test. Is there any way to handle this sort of situation?

I tried calling Presence.unlink from within the test but was unable to get any other results.

I’m seeing the following error:

09:32:22.414 [error] Task #PID<0.580.0> started from MyApp.Presence terminating
** (stop) exited in: GenServer.call(#PID<0.577.0>, {:checkout, #Reference<0.0.5.2491>, true, 15000}, 5000)
    ** (EXIT) shutdown: "owner #PID<0.576.0> exited while client #PID<0.579.0> is still running with: shutdown"
    (db_connection) lib/db_connection/ownership/proxy.ex:32: DBConnection.Ownership.Proxy.checkout/2
    (db_connection) lib/db_connection.ex:919: DBConnection.checkout/2
    (db_connection) lib/db_connection.ex:741: DBConnection.run/3
    (db_connection) lib/db_connection.ex:584: DBConnection.prepare_execute/4
    (ecto) lib/ecto/adapters/postgres/connection.ex:80: Ecto.Adapters.Postgres.Connection.prepare_execute/5
    (ecto) lib/ecto/adapters/sql.ex:243: Ecto.Adapters.SQL.sql_call/6
    (ecto) lib/ecto/adapters/sql.ex:431: Ecto.Adapters.SQL.execute_and_cache/7
    (ecto) lib/ecto/repo/queryable.ex:130: Ecto.Repo.Queryable.execute/5
    (ecto) lib/ecto/repo/queryable.ex:35: Ecto.Repo.Queryable.all/4
    (my_app) lib/my_app/presence.ex:17: MyApp.Presence.fetch/2
    (phoenix) lib/phoenix/presence.ex:199: anonymous fn/5 in Phoenix.Presence.handle_diff/5
    (stdlib) lists.erl:1263: :lists.foldl/3
    (phoenix) lib/phoenix/presence.ex:197: anonymous fn/4 in Phoenix.Presence.handle_diff/5
    (elixir) lib/task/supervised.ex:85: Task.Supervised.do_apply/2
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Function: #Function<1.96140178/0 in Phoenix.Presence.handle_diff/5>
    Args: []

Using this channel module:

defmodule MyApp.ExampleChannel do
  use MyApp.Web, :channel

  def join(_channel_name, _params, socket) do
    send self(), :after_join

    {:ok, socket}
  end

  def handle_info(:after_join, socket) do
    current_user = socket.assigns.current_user

    {:ok, _} = Presence.track(socket, current_user.id, %{})

    {:noreply, socket}
  end
end

this presence module:

defmodule MyApp.Presence do
  import Ecto.Query
  alias MyApp.User
  alias MyApp.Repo

  use Phoenix.Presence, otp_app: :my_app,
                        pubsub_server: MyApp.PubSub

  def fetch(_topic, entries) do
    query =
      from u in User,
        where: u.id in ^Map.keys(entries),
        select: {u.id, u}

    users = query |> Repo.all |> Enum.into(%{})

    for { key, %{ metas: metas } } <- entries, into: %{} do
      int_key = String.to_integer(key)
      { key, %{ metas: metas, user: users[int_key] } }
    end
  end
end

and this channel test:

defmodule MyApp.ExampleChannelTest do
  use MyApp.ChannelCase

  alias MyApp.ExampleChannel

  test "example" do
    user = create_a_user()

    params = %{ current_user: user }
    {:ok, _, socket} = subscribe_and_join(socket("", params), ExampleChannel, "control")
  end
end

Any help is greatly appreciated. Thanks!

luizpvasc · September 25, 2017, 2:23am

Hi there. Did you find a solution to this? Has anyone else had this problem?

(Sorry to bring back an old post, but it describes exactly the same issue I’m having.)

luizpvasc · September 25, 2017, 3:22pm

As a temporary solution, adding this code at the end of the channel test

ref = leave(socket)
assert_reply ref, :ok
IO.puts "socket leave"
:timer.sleep(200)

makes the fetch function in Presence run before the test finishes, which works with Ecto’s sandbox in shared mode. If I remove the :timer.sleep call, the error persists:

** (stop) exited in: GenServer.call(#PID<0.384.0>, {:checkout, 
   #Reference<0.1150698256.1547436035.131411>, true, 15000}, 5000)
** (EXIT) shutdown: "owner #PID<0.383.0> exited with: shutdown"

nickewing · September 26, 2017, 9:50am

I never actually found a solution for this, but with our app we ended up removing the fetch function entirely. Now we just return user IDs and match them up with user data sent via other channels.

sorentwo · September 27, 2017, 4:45am

This is the correct solution in my experience. You must leave before the test is complete while the sandbox connection is still checked out—it also has to be synchronous to enable automatic sharing, as there isn’t any way to allow the presence process access to the test’s db connection.

The sleep doesn’t have to be nearly as long as 200ms though. The helper function I have uses a 10ms delay and doesn’t flicker on low-powered systems like CI.

luizpvasc · September 27, 2017, 1:35pm

That’s great to hear, thanks. Just for fun, sleeping for 4ms to 5ms works about 50% of the time on my machine.

I’ll try to submit a pull request for the Phoenix Presence docs with the info from this thread.

rhcarvalho · May 27, 2025, 7:48am

Reviving the thread because a) there have been more recent advancements that can help anyone landing here and b) I still have trouble making tests involving Phoenix.Presence deterministic.

In my particular case, my fetcher doesn’t use the DB but calls an external API, which I’m trying to mock with TestServer (for context, see post #6 in the thread "TestServer - No fuzz mocking of third-party services"), but I believe the underlying trouble is the same.

First, summarizing the knowledge from the thread, in 2017 @luizpvasc and @sorentwo suggested something like:

ref = leave(socket)
assert_reply ref, :ok
:timer.sleep(10)

Between late 2019 and 2020, this GitHub issue shed more light on the problem:

github.com/phoenixframework/phoenix

Tests for the Rumbl application end with error

opened 02:37PM - 23 Nov 19 UTC

closed 05:27PM - 10 Jun 20 UTC

stefanchrobot

### Environment

* Elixir version (elixir -v):
```
Erlang/OTP 22 [erts-10.4.…4] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe] [dtrace]
Elixir 1.9.0 (compiled with Erlang/OTP 22)
```
* Phoenix version (mix deps): `phoenix 1.4.11 (Hex package) (mix)`
* NodeJS version (node -v): `0.16.2`
* NPM version (npm -v): `6.9.0`
* Operating system: macOS Mojave

### Actual behavior

Running `mix test` on the Rumbl application from Programming Phoenix 1.4 ends up with an error:

```
15:18:54.721 [error] Postgrex.Protocol (#PID<0.291.0>) disconnected: ** (DBConnection.ConnectionError) owner #PID<0.578.0> exited

Client #PID<0.581.0> is still using a connection from owner at location:

    :prim_inet.recv0/3
    (postgrex) lib/postgrex/protocol.ex:2837: Postgrex.Protocol.msg_recv/4
    (postgrex) lib/postgrex/protocol.ex:2553: Postgrex.Protocol.recv_transaction/4
    (postgrex) lib/postgrex/protocol.ex:1858: Postgrex.Protocol.rebind_execute/4
    (ecto_sql) lib/ecto/adapters/sql/sandbox.ex:370: Ecto.Adapters.SQL.Sandbox.Connection.proxy/3
    (db_connection) lib/db_connection/holder.ex:293: DBConnection.Holder.holder_apply/4
    (db_connection) lib/db_connection.ex:1255: DBConnection.run_execute/5
    (db_connection) lib/db_connection.ex:1342: DBConnection.run/6
    (db_connection) lib/db_connection.ex:596: DBConnection.execute/4
    (ecto_sql) lib/ecto/adapters/postgres/connection.ex:80: Ecto.Adapters.Postgres.Connection.execute/4
    (ecto_sql) lib/ecto/adapters/sql.ex:580: Ecto.Adapters.SQL.execute!/4
    (ecto_sql) lib/ecto/adapters/sql.ex:562: Ecto.Adapters.SQL.execute/5
    (ecto) lib/ecto/repo/queryable.ex:177: Ecto.Repo.Queryable.execute/4
    (ecto) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
    (rumbl_web) lib/rumbl_web/channels/presence.ex:78: RumblWeb.Presence.fetch/2
    (phoenix) lib/phoenix/presence.ex:318: anonymous fn/5 in Phoenix.Presence.handle_diff/5
    (stdlib) maps.erl:232: :maps.fold_1/3
    (phoenix) lib/phoenix/presence.ex:316: anonymous fn/4 in Phoenix.Presence.handle_diff/5
    (elixir) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

The connection itself was checked out by #PID<0.581.0> at location:

    (ecto_sql) lib/ecto/adapters/postgres/connection.ex:80: Ecto.Adapters.Postgres.Connection.execute/4
    (ecto_sql) lib/ecto/adapters/sql.ex:580: Ecto.Adapters.SQL.execute!/4
    (ecto_sql) lib/ecto/adapters/sql.ex:562: Ecto.Adapters.SQL.execute/5
    (ecto) lib/ecto/repo/queryable.ex:177: Ecto.Repo.Queryable.execute/4
    (ecto) lib/ecto/repo/queryable.ex:17: Ecto.Repo.Queryable.all/3
    (rumbl_web) lib/rumbl_web/channels/presence.ex:78: RumblWeb.Presence.fetch/2
    (phoenix) lib/phoenix/presence.ex:318: anonymous fn/5 in Phoenix.Presence.handle_diff/5
    (stdlib) maps.erl:232: :maps.fold_1/3
    (phoenix) lib/phoenix/presence.ex:316: anonymous fn/4 in Phoenix.Presence.handle_diff/5
    (elixir) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
```

Not sure if this was reported, but I still hit this bug with Phoenix 1.4.11. In my setup it happens every time. So I'm opening this as noted by @josevalim:

> Good catch! Basically the Presence wants to query the DB and send updates but the test has shut down and there are no database connections. To fix this consistently, we would need to query the presence supervisor and ask all of its children to shutdown at the end of each test. But to do so, we will need to add a new API to Phoenix. If you can open up an issue in Phoenix issues tracker, it would be extra helpful. Thank you!

### Reproduction

Download and extract https://pragprog.com/titles/phoenix14/source_code. The relevant app is in `code/testing_otp/rumbl_umbrella`.

Strangely enough I can't seem to even run the tests (the error report above is from my follow-along version):

```
** (Mix) Could not start application rumbl_web: RumblWeb.Application.start(:normal, []) returned an error: shutdown: failed to start child: RumblWeb.Presence
    ** (EXIT) shutdown: failed to start child: Phoenix.Tracker
        ** (EXIT) shutdown: failed to start child: RumblWeb.Presence_shard0
            ** (EXIT) an exception was raised:
                ** (ArgumentError) argument error
                    (stdlib) :ets.lookup(RumblWeb.PubSub, :node_name)
                    (phoenix_pubsub) lib/phoenix/pubsub.ex:288: Phoenix.PubSub.call/3
                    (rumbl_web) lib/rumbl_web/channels/presence.ex:10: RumblWeb.Presence.init/1
                    (phoenix_pubsub) lib/phoenix/tracker/shard.ex:120: Phoenix.Tracker.Shard.init/1
                    (stdlib) gen_server.erl:374: :gen_server.init_it/2
                    (stdlib) gen_server.erl:342: :gen_server.init_it/6
                    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
```

### Expected behavior

The tests should pass without error.

Quoting José Valim:

The issue is that once the test terminates, the channel process will terminate, the presence process will notice the channel termination, and then invoke the callbacks without the database.

All of this happens async, so it is hard to make it sync. I am not sure at the moment how to fix those.
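
The ordering described in that quote can be reproduced without Phoenix or Ecto. What follows is only a minimal sketch in plain Elixir, not the actual Presence code: an Agent stands in for the sandbox connection owner, and a spawned process stands in for the Presence fetcher. The names `RaceSketch` and `run/1`, the delay, and the sample data are all hypothetical and chosen for illustration.

```elixir
defmodule RaceSketch do
  # Simulates a test that finishes (and releases its "connection")
  # before a background fetcher gets around to using it.
  def run(fetch_delay_ms \\ 20) do
    # Stand-in for the sandbox connection owner (assumption: any process-backed resource).
    {:ok, owner} = Agent.start(fn -> %{1 => "alice"} end)
    test_pid = self()

    # Stand-in for the asynchronous Presence fetcher.
    spawn(fn ->
      Process.sleep(fetch_delay_ms)

      result =
        try do
          {:ok, Agent.get(owner, & &1)}
        catch
          # Calling a dead process makes the caller exit; capture the reason instead.
          :exit, reason -> {:exit, reason}
        end

      send(test_pid, {:fetch_result, result})
    end)

    # The "test" ends here and the owner goes away before the fetcher runs.
    :ok = Agent.stop(owner)

    receive do
      {:fetch_result, result} -> result
    after
      1_000 -> :timeout
    end
  end
end
```

Calling `RaceSketch.run()` returns a tuple of the shape `{:exit, {:noproc, _}}`, because the owner is already gone when the late fetch runs. That mirrors the "owner exited" errors in the logs above, although the real failure goes through DBConnection's ownership checks rather than a plain Agent call.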

The conversation goes on, and a new API, Presence.fetchers_pids(), was added along with some docs.

In 2021, @ruslandoga noticed something I have also experienced in practice: the new Presence.fetchers_pids() might return an empty list. The suggested workaround adds some sleep time before calling it:

on_exit(fn ->
  # Wait for fetcher processes to be started
  :timer.sleep(10)

  for pid <- RumblWeb.Presence.fetchers_pids() do
    ref = Process.monitor(pid)
    assert_receive {:DOWN, ^ref, _, _, _}, 1000
  end
end)
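
To see why an empty list matters, here is a minimal sketch of the monitor-and-wait pattern in plain Elixir, with ordinary spawned processes standing in for Presence fetchers. `WaitSketch` and `await_down/2` are hypothetical names, and the sketch uses a bare `receive` instead of ExUnit's `assert_receive` so that it runs outside a test.

```elixir
defmodule WaitSketch do
  # Blocks until every pid in `pids` has terminated.
  # Raises if a process is still alive `timeout` ms after it was monitored.
  def await_down(pids, timeout \\ 1_000) do
    for pid <- pids do
      ref = Process.monitor(pid)

      receive do
        {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
      after
        timeout -> raise "process #{inspect(pid)} still alive after #{timeout}ms"
      end
    end

    :ok
  end
end

# A stand-in for a fetcher that takes a while to finish.
slow = spawn(fn -> Process.sleep(50) end)
:ok = WaitSketch.await_down([slow])

# Monitoring a process that is already dead still delivers a :DOWN message,
# so waiting on it again returns right away.
:ok = WaitSketch.await_down([slow])

# With an empty list there is nothing to wait for: if the lookup runs before
# any fetcher has been spawned, the loop is a no-op.
:ok = WaitSketch.await_down([])
```

The last call is the case the sleep is meant to cover: the wait itself is reliable for every pid it is given, but it cannot wait for a process that does not exist yet.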

I used GitHub code search to see what people are doing in the open.

The first hit for me is LiveBeats, which does use a 100ms sleep:

github.com/fly-apps/live_beats

test/support/conn_case.ex

ac9780472

defp wait_for_children(children_lookup) when is_function(children_lookup) do
  Process.sleep(100)

  for pid <- children_lookup.() do
    ref = Process.monitor(pid)
    assert_receive {:DOWN, ^ref, _, _, _}, 1000
  end
end

setup tags do
  pid = Ecto.Adapters.SQL.Sandbox.start_owner!(LiveBeats.Repo, shared: not tags[:async])
  on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)

  on_exit(fn ->
    wait_for_children(fn -> LiveBeatsWeb.Presence.fetchers_pids() end)
  end)

The second hit is NervesHub, which just follows the documentation and has no sleep:

github.com/nerves-hub/nerves_hub_web

test/nerves_hub_web/live/new_ui/devices/show_test.exs

2a8cd770e

on_exit(fn ->
  for pid <- NervesHubWeb.Presence.fetchers_pids() do
    ref = Process.monitor(pid)
    assert_receive {:DOWN, ^ref, _, _, _}, 1000
  end
end)

Going down the list, I find cases both with and without sleep, and with different amounts of sleep. I also found this commit message from @gpreston, which reinforces that I don’t know what to do… the sleep time still feels non-deterministic, racy.

TODO:

I’m tempted to suggest updating the docs with the sleep before calling fetchers_pids().
Discuss what else we can do. Could tests reasonably synchronize with Presence fetchers? Can we write tests knowing that a certain fetcher will be called exactly N times? Is that a bad idea to begin with?

I would love to learn more about this corner of Elixir. There are so many pieces involved in making Presence work that understanding how everything fits together (and where the points of synchronization are) is no easy feat.

rhcarvalho · May 27, 2025, 8:22am

After applying what LiveBeats is doing (sleep, then fetchers_pids(), then wait for termination) and running tests with --repeat-until-failure, I get the feeling that it works consistently: the number of fetcher calls I observe stays the same across runs.

(Also starting the BEAM multiple times with a shell loop in case that would make a difference)

for i in {1..1000}; do
  echo "Run $i"
  mix test --repeat-until-failure 1000 --max-failures 1 || break
done

If I comment out the Process.sleep(100) and run the same as above, I get the same behavior.

So I don’t know if the sleep is actually helpful. I had an old comment in my code base documenting that previous attempts at waiting for fetchers_pids() without the sleep didn’t work consistently.

FYI I’m on Elixir 1.18.4, Phoenix 1.7.21.