Handling Phoenix Presence during testing

I have run into an issue while testing a simple Phoenix channel that uses Phoenix.Presence.

It looks to me like the test completes, and as a result the Presence process notices that a user has disconnected. It then attempts to fetch the new user list, but by that point the database connection for that test has already been closed. Is there any way to handle this sort of situation?

I tried calling Presence.unlink from within the test, but it didn’t change anything.

I’m seeing the following error:

09:32:22.414 [error] Task #PID<0.580.0> started from MyApp.Presence terminating
** (stop) exited in: GenServer.call(#PID<0.577.0>, {:checkout, #Reference<0.0.5.2491>, true, 15000}, 5000)
    ** (EXIT) shutdown: "owner #PID<0.576.0> exited while client #PID<0.579.0> is still running with: shutdown"
    (db_connection) lib/db_connection/ownership/proxy.ex:32: DBConnection.Ownership.Proxy.checkout/2
    (db_connection) lib/db_connection.ex:919: DBConnection.checkout/2
    (db_connection) lib/db_connection.ex:741: DBConnection.run/3
    (db_connection) lib/db_connection.ex:584: DBConnection.prepare_execute/4
    (ecto) lib/ecto/adapters/postgres/connection.ex:80: Ecto.Adapters.Postgres.Connection.prepare_execute/5
    (ecto) lib/ecto/adapters/sql.ex:243: Ecto.Adapters.SQL.sql_call/6
    (ecto) lib/ecto/adapters/sql.ex:431: Ecto.Adapters.SQL.execute_and_cache/7
    (ecto) lib/ecto/repo/queryable.ex:130: Ecto.Repo.Queryable.execute/5
    (ecto) lib/ecto/repo/queryable.ex:35: Ecto.Repo.Queryable.all/4
    (my_app) lib/my_app/presence.ex:17: MyApp.Presence.fetch/2
    (phoenix) lib/phoenix/presence.ex:199: anonymous fn/5 in Phoenix.Presence.handle_diff/5
    (stdlib) lists.erl:1263: :lists.foldl/3
    (phoenix) lib/phoenix/presence.ex:197: anonymous fn/4 in Phoenix.Presence.handle_diff/5
    (elixir) lib/task/supervised.ex:85: Task.Supervised.do_apply/2
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Function: #Function<1.96140178/0 in Phoenix.Presence.handle_diff/5>
    Args: []

Using this channel module:

defmodule MyApp.ExampleChannel do
  use MyApp.Web, :channel

  alias MyApp.Presence

  def join(_channel_name, _params, socket) do
    send self(), :after_join

    {:ok, socket}
  end

  def handle_info(:after_join, socket) do
    current_user = socket.assigns.current_user

    {:ok, _} = Presence.track(socket, current_user.id, %{})

    {:noreply, socket}
  end
end

this presence module:

defmodule MyApp.Presence do
  import Ecto.Query
  alias MyApp.User
  alias MyApp.Repo

  use Phoenix.Presence, otp_app: :my_app,
                        pubsub_server: MyApp.PubSub

  def fetch(_topic, entries) do
    query =
      from u in User,
        where: u.id in ^Map.keys(entries),
        select: {u.id, u}

    users = query |> Repo.all() |> Enum.into(%{})

    for { key, %{ metas: metas } } <- entries, into: %{} do
      int_key = String.to_integer(key)
      { key, %{ metas: metas, user: users[int_key] } }
    end
  end
end

and this channel test:

defmodule MyApp.ExampleChannelTest do
  use MyApp.ChannelCase

  alias MyApp.ExampleChannel

  test "example" do
    user = create_a_user()

    params = %{ current_user: user }
    {:ok, _, socket} = subscribe_and_join(socket("", params), ExampleChannel, "control")
  end
end

Any help is greatly appreciated. Thanks!

Hi there! Did you find a solution to this? Has anyone else run into this problem?

(Sorry to bring back an old post, but it describes exactly the same issue I’m having.)

As a temporary solution, adding this code at the end of the channel test

ref = leave(socket)
assert_reply ref, :ok
IO.puts "socket leave"
:timer.sleep(200)

makes Presence call the fetch function before the test finishes, which works with Ecto’s sandbox in shared mode. If I remove the :timer.sleep call, the error persists:

** (stop) exited in: GenServer.call(#PID<0.384.0>, {:checkout, #Reference<0.1150698256.1547436035.131411>, true, 15000}, 5000)
** (EXIT) shutdown: "owner #PID<0.383.0> exited with: shutdown"

I never actually found a solution for this, but in our app we ended up removing the fetch function entirely. Now we just return user IDs and match them up with user data sent via other channels.

This is the correct solution in my experience. You must leave before the test completes, while the sandbox connection is still checked out. The test also has to be synchronous (not async) so that automatic sharing works, as there isn’t any way to allow the presence process access to the test’s DB connection.

The sleep doesn’t have to be nearly as long as 200ms, though. The helper function I use has a 10ms delay and isn’t flaky even on low-powered systems like CI.

That’s great to hear, thanks. Just for fun: sleeping for 4–5ms works about 50% of the time on my machine.

I’ll try to submit a pull request to the Phoenix Presence docs with the info from this thread.

Reviving the thread because a) there have been more recent advancements that can help anyone landing here, and b) I still have trouble making tests involving Phoenix.Presence deterministic.

In my particular case, my fetcher doesn’t use the DB but calls an external API, which I’m trying to mock with TestServer (for context, see the thread "TestServer - No fuzz mocking of third-party services", post #6 by rhcarvalho). Still, I believe the underlying problem is the same.


First, to summarize the thread so far: in 2017, @luizpvasc and @sorentwo suggested something like this:

ref = leave(socket)
assert_reply ref, :ok
:timer.sleep(10)

In late 2019 and 2020, this GitHub issue sheds more light on the problem:

Quoting José Valim:

The issue is that once the test terminates, the channel process will terminate, the presence process will notice the channel termination, and then invoke the callbacks without the database.

All of this happens async, so it is hard to make it sync. I am not sure at the moment how to fix those.
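To make that chain concrete, here is a toy model in plain Elixir, with no Phoenix or Ecto involved. `PresenceRaceModel` and `start_tracker/3` are hypothetical names. An Agent stands in for the sandboxed Repo, a bare process stands in for the channel, and a "tracker" process monitors the channel and, on `:DOWN`, spawns a separate fetcher that reads from the Agent. It is only loosely modeled on what Phoenix.Presence does (the stack trace above shows the real fetch running in a Task started from the Presence module), but it shows why nothing in the test process naturally waits for the fetcher.

```elixir
defmodule PresenceRaceModel do
  # Toy model, NOT Phoenix code:
  #   channel exits -> tracker sees :DOWN -> tracker spawns a fetcher
  #   -> fetcher reads from `repo`, a resource owned by the test process.
  # `reporter` receives the fetcher pid (a stand-in for fetchers_pids())
  # and the fetched value.
  def start_tracker(channel, repo, reporter) do
    spawn(fn ->
      ref = Process.monitor(channel)

      receive do
        {:DOWN, ^ref, :process, ^channel, _reason} ->
          # The fetch runs in yet another process, asynchronously.
          fetcher =
            spawn(fn ->
              users = Agent.get(repo, & &1)
              send(reporter, {:fetched, users})
            end)

          send(reporter, {:fetcher_started, fetcher})
      end
    end)
  end
end

# Usage: the test process owns `repo` and must learn about the fetcher
# before it can wait for it.
{:ok, repo} = Agent.start_link(fn -> [:alice] end)

channel =
  spawn(fn ->
    receive do
      :leave -> :ok
    end
  end)

PresenceRaceModel.start_tracker(channel, repo, self())
send(channel, :leave)

receive do
  {:fetcher_started, fetcher} -> IO.inspect(fetcher, label: "fetcher started")
after
  1_000 -> raise "tracker never spawned a fetcher"
end
```

If the test stopped `repo` before the fetcher ran (as happens to the sandbox connection when a test ends), `Agent.get/2` would exit and the fetcher would crash, much like the checkout error at the top of the thread. The test can only wait once it knows the fetcher pid, which is the gap `fetchers_pids()` is meant to fill.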

The conversation goes on, and a new API, Presence.fetchers_pids(), was added along with some docs on using it in tests.

In 2021, @ruslandoga notices something I’ve also run into in practice: the new Presence.fetchers_pids() may return an empty list, so they add a short sleep before calling it:

on_exit(fn ->
  :timer.sleep(10)  #### WAIT FOR FETCHER PROCESSES TO BE STARTED ####
  for pid <- RumblWeb.Presence.fetchers_pids() do
    ref = Process.monitor(pid)
    assert_receive {:DOWN, ^ref, _, _, _}, 1000
  end
end)
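The snippet above can be wrapped in a small helper so the delay is a parameter and the test can see how many fetchers were actually awaited. This is a sketch under assumptions: `FetcherWait.settle_and_await/3` is a hypothetical name, and it takes the pid source as a plain function (in a real test, `&MyAppWeb.Presence.fetchers_pids/0`, where `MyAppWeb.Presence` is a placeholder for your own Presence module) so that it runs without Phoenix. It uses `receive ... after` instead of `assert_receive` for the same reason.

```elixir
defmodule FetcherWait do
  # Hypothetical helper generalizing the on_exit snippet above.
  #   pids_fun  - returns the fetcher pids, e.g. &MyAppWeb.Presence.fetchers_pids/0
  #   settle_ms - guessed delay for fetchers to be spawned (the thread uses 10 or 100)
  #   timeout   - how long to wait for each fetcher to exit
  # Returns the number of fetchers awaited, which can be logged to see
  # whether the settle delay ever changes what gets awaited.
  def settle_and_await(pids_fun, settle_ms \\ 10, timeout \\ 1_000) do
    Process.sleep(settle_ms)
    pids = pids_fun.()

    Enum.each(pids, fn pid ->
      # Monitoring an already-dead pid delivers :DOWN (reason :noproc) at once.
      ref = Process.monitor(pid)

      receive do
        {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
      after
        timeout -> raise "fetcher #{inspect(pid)} still running after #{timeout}ms"
      end
    end)

    length(pids)
  end
end
```

In a test this would be called as `on_exit(fn -> FetcherWait.settle_and_await(&MyAppWeb.Presence.fetchers_pids/0, 100) end)`. The sleep is still a guess; the helper only makes it explicit and observable.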

I used GitHub code search to see what people are doing in public repositories.

The first hit for me is LiveBeats, which uses a 100ms sleep.

The second hit is NervesHub, which just follows the documentation and has no sleep.

Going down the list, I find cases both with and without the sleep, and with different sleep durations. I also found a commit message from @gpreston which reinforces that people don’t know what to do. I don’t know what to do either: the sleep time still feels non-deterministic and racy.


TODO:

  • I’m tempted to suggest updating the docs to add the sleep before calling fetchers_pids
  • Discuss what else we can do. Could tests reasonably synchronize with Presence fetchers? Can we write tests knowing that a certain fetcher will be called exactly N times? Is that a bad idea to begin with?
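On the second TODO item, one possible direction (not something proposed in the thread) is to have the fetch callback itself signal the test, so the test waits for a message instead of sleeping. The sketch below uses a hypothetical `FetchProbe` module: the test registers itself under a fixed name, and `fetch/2` calls `FetchProbe.notify/1` as its last step, after any Repo or HTTP calls. Because the registered name is global, it assumes non-async tests, which sandbox shared mode already requires.

```elixir
defmodule FetchProbe do
  # Hypothetical test hook; nothing like this ships with Phoenix.
  # The listener name is global, so only one (non-async) test can listen.
  @listener :presence_fetch_listener

  # Call from the test process, e.g. in a setup block.
  def listen do
    Process.register(self(), @listener)
    :ok
  end

  # Call at the end of fetch/2, after the Repo or HTTP work has finished.
  # It is a no-op when no test is listening (e.g. in dev or prod).
  def notify(topic) do
    case Process.whereis(@listener) do
      nil -> :ok
      pid -> send(pid, {:presence_fetched, topic})
    end

    :ok
  end
end
```

A test would then call `FetchProbe.listen()`, leave the channel, and `assert_receive {:presence_fetched, "control"}` once per expected fetch. Receiving N messages proves at least N calls and that their queries finished while the test still owned its connection; proving there is no (N+1)th call still needs a `refute_receive` window, so a timeout remains, but it fails loudly instead of racing. The cost outside tests is one `Process.whereis/1` per fetch.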

I would love to learn more about this corner of Elixir. So many pieces are involved in making Presence work that understanding how everything fits together (and where the points of synchronization are) is no easy feat.

After applying what LiveBeats does (sleep, then fetchers_pids(), then wait for termination) and running the tests with --repeat-until-failure, I get the feeling that the number of fetcher calls I’m observing is consistent.

(I also started the BEAM multiple times with a shell loop, in case that would make a difference.)

for i in {1..1000}; do
  echo "Run $i"
  mix test --repeat-until-failure 1000 --max-failures 1 || break
done

If I comment out the Process.sleep(100) and run the same loop as above, I get the same behavior.

So I don’t know whether the sleep actually helps. I had an old comment in my codebase documenting that previous attempts to wait on fetchers_pids() without the sleep didn’t work consistently.

FYI I’m on Elixir 1.18.4, Phoenix 1.7.21.