Confused: Why does FunWithFlags not have timing problems with `async: true` tests?

I am using FunWithFlags (fun_with_flags) in an application, and have created a scenario that I think should cause tests to interfere with one another, but they don’t. I am very confused as to why, and hoping that someone can shed some light on it.

The app uses the Ecto persistence adapter, with a Postgres database.

I’ve created a public repo example app demonstrating this for anyone interested in looking at it.

Here is the test:

defmodule FwfAsyncTest do
  defmodule FWFAsync_1 do
    use FwfAsync.TestCase, async: true

    test "enables flag" do
      name = "enables flag"
      :timer.sleep(1000)
      IO.puts("(#{name} - #{now()}) enabling foo")
      {:ok, true} = FunWithFlags.enable(:foo)
      IO.puts("(#{name} - #{now()}) enabling foo - done")

      :timer.sleep(3000)
      IO.puts("(#{name} - #{now()}) assert FunWithFlags.enabled?(:foo)")
      assert FunWithFlags.enabled?(:foo)
    end
  end

  defmodule FWFAsync_2 do
    use FwfAsync.TestCase, async: true

    test "disables flag" do
      name = "disables flag"
      IO.puts("(#{name} - #{now()}) disabling foo")
      {:ok, false} = FunWithFlags.disable(:foo)
      IO.puts("(#{name} - #{now()}) disabling foo - done")
      :timer.sleep(2000)
      IO.puts("(#{name} - #{now()}) refute FunWithFlags.enabled?(:foo)")
      refute FunWithFlags.enabled?(:foo)

      :timer.sleep(3000)
    end
  end
end

The tests pass, and the output is:

(disables flag - 1634762854) disabling foo
(enables flag - 1634762855) enabling foo
(disables flag - 1634762856) refute FunWithFlags.enabled?(:foo)
.(enables flag - 1634762862) assert FunWithFlags.enabled?(:foo)
.

Finished in 8.1 seconds
2 tests, 0 failures

Two things are curious about this to me:

  1. The tests pass. I would expect the “refute” test to fail, because it occurs after the flag has been enabled.
  2. The time between the refute and assert is 6 seconds. I would expect it to only be 2 seconds.

I have looked through the FunWithFlags code, and done lots of inspecting in my code, and have found that the FunWithFlags PID, Cache PID, and underlying ETS table reference are the same in both tests.

Clearly there is something I’m misunderstanding about ExUnit and/or OTP. Or perhaps I just set up my async code incorrectly :)

Can anyone help me understand:

  1. Why do the tests pass?
  2. Why is the time between the final two outputs 6 seconds rather than 2 seconds?

Hi! Without having looked into the details of FunWithFlags in this configuration: as a sanity check, I would add some more timing logs right after the FunWithFlags calls, just to make sure there is no blocking/locking going on.

I think you’re on to something… I added IO.puts statements immediately after the enable/disable calls, and this is what happens:

(disables flag - 1634771363) disabling foo
(disables flag - 1634771363) disabling foo - done
(enables flag - 1634771364) enabling foo
(disables flag - 1634771365) refute FunWithFlags.enabled?(:foo)
.(enables flag - 1634771368) enabling foo - done
(enables flag - 1634771371) assert FunWithFlags.enabled?(:foo)

Though I have no idea what would be causing it.


With SQL logs, this is what we see:

% mix test test/fwf_async_test.exs
Compiling 4 files (.ex)
Generated fwf_async app
(disables flag - 1634773517) disabling foo

16:45:17.306 [debug] QUERY OK db=0.4ms decode=0.6ms queue=1.0ms
INSERT INTO "fun_with_flags_toggles" AS f0 ("enabled","flag_name","gate_type","target") VALUES ($1,$2,$3,$4) ON CONFLICT ("flag_name","gate_type","target") DO UPDATE SET "enabled" = $5 RETURNING "id" [false, "foo", "boolean", "_fwf_none", false]

16:45:17.311 [debug] QUERY OK source="fun_with_flags_toggles" db=0.2ms queue=0.3ms
SELECT f0."id", f0."flag_name", f0."gate_type", f0."target", f0."enabled" FROM "fun_with_flags_toggles" AS f0 WHERE (f0."flag_name" = $1) ["foo"]
(disables flag - 1634773517) disabling foo - done
(enables flag - 1634773518) enabling foo
(disables flag - 1634773519) refute FunWithFlags.enabled?(:foo)
.
16:45:22.317 [debug] QUERY OK db=4084.2ms queue=1.2ms
INSERT INTO "fun_with_flags_toggles" AS f0 ("enabled","flag_name","gate_type","target") VALUES ($1,$2,$3,$4) ON CONFLICT ("flag_name","gate_type","target") DO UPDATE SET "enabled" = $5 RETURNING "id" [true, "foo", "boolean", "_fwf_none", true]

16:45:22.319 [debug] QUERY OK source="fun_with_flags_toggles" db=1.2ms
SELECT f0."id", f0."flag_name", f0."gate_type", f0."target", f0."enabled" FROM "fun_with_flags_toggles" AS f0 WHERE (f0."flag_name" = $1) ["foo"]
(enables flag - 1634773522) enabling foo - done
(enables flag - 1634773525) assert FunWithFlags.enabled?(:foo)
.

Finished in 8.1 seconds
2 tests, 0 failures

Oh, it’s because you have `Ecto.Adapters.SQL.Sandbox.mode(FwfAsync.Repo, :manual)` in your test_helper.exs, and this block in your ConnCase:

  setup tags do
    :ok = Ecto.Adapters.SQL.Sandbox.checkout(FwfAsync.Repo)

    unless tags[:async] do
      Ecto.Adapters.SQL.Sandbox.mode(FwfAsync.Repo, {:shared, self()})
    end

    :ok
  end

This is the magic of Elixir’s SQL sandbox: each test gets its own DB transaction, so it has a “sandboxed view of the database”. All DB calls within the test run inside that transaction. At the end of the test the transaction is rolled back and no changes are committed; that’s why your test database remains empty.

This is all done using a key-value lookup tied to the lifetime of the process (each test is one process in ExUnit). So the Sandbox.checkout function checks out a DB connection, opens a transaction, and registers it under the self() pid as the key; whenever you then call a database function on the sandbox-enabled Repo, it can map self() to the correct connection and transaction. I haven’t looked at the Ecto sandbox code, but there is probably also a process monitor that waits for the test process to exit and then rolls back and frees the transaction.
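Stripped of monitors and timeouts, that pid-keyed lookup can be sketched like this. This is a toy illustration of the idea, not the actual Ecto.Adapters.SQL.Sandbox internals; the module and all names are made up:

```elixir
# Toy sketch of pid-keyed ownership (NOT the real Sandbox code): a lookup
# table keyed by the calling process's pid, so each test process "owns"
# its own connection.
defmodule OwnershipSketch do
  use Agent

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # "checkout": register a (fake) connection under the caller's pid.
  def checkout(conn) do
    Agent.update(__MODULE__, &Map.put(&1, self(), conn))
  end

  # Any later "query" maps self() back to the owning connection.
  # Returns {:ok, conn} for an owner, :error for any other process.
  def current_conn do
    Agent.get(__MODULE__, &Map.fetch(&1, self()))
  end
end
```

A real implementation would additionally monitor the owner pid, so the connection can be released (and the transaction rolled back) when the test process exits.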

There’s also magic that lets you use Task successfully without having to keep track of these pids, but that’s another subject for another time.


This would answer the question if only Ecto were involved, since then the two tests simply run in separate DB transactions.

My confusion stems from the fact that there’s also an ETS table used for caching, and that enabling/disabling a flag goes through a write-through cache.

I pushed a change that interacts with the cache, to demonstrate that reads come from the ETS table rather than the database. So somehow it’s not just the database transaction that is isolated, but also the process / ETS table / something beyond my understanding.

I have one other theory here: it’s a result of the unique index on the FWF table. What would happen is:

  1. Test 1 begins a transaction and writes to the table, upserting the row for the feature flag name.
  2. Test 2 begins a transaction and tries to write a row for the same feature flag name, and blocks because the first transaction hasn’t committed or rolled back yet.

For this write-through cache:

  def put(flag_name, gate) do
    flag_name
    |> persistence_adapter().put(gate)
    |> publish_change()
    |> cache_persistence_result()
  end

the cache_persistence_result step doesn’t execute until persistence_adapter().put(gate) completes, and test 2’s put blocks while it waits for test 1’s transaction to commit or roll back.
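To see the shape of that blocking without a database, here is a toy in-memory “row lock”. This is an assumption-labeled sketch (not FunWithFlags, Ecto, or Postgres code; all names are invented): the second process writing the same key sits blocked in a call until the first releases, just like the second INSERT waiting on the first open transaction.

```elixir
# Toy stand-in for the row-level lock Postgres takes on the unique index:
# the second writer to the same key blocks until the first releases.
defmodule RowLockSketch do
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  # Acquire the lock on `key`; blocks while another process holds it
  # (like test 2's INSERT waiting on test 1's open transaction).
  def lock(key), do: GenServer.call(__MODULE__, {:lock, key}, :infinity)

  # Release the lock (like test 1's transaction committing or rolling back).
  def release(key), do: GenServer.cast(__MODULE__, {:release, key})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:lock, key}, {pid, _tag} = from, state) do
    case Map.get(state, key) do
      nil ->
        # Free: grant the lock immediately.
        {:reply, :ok, Map.put(state, key, {pid, []})}

      {owner, waiters} ->
        # Held: queue the caller, which stays blocked in GenServer.call/3.
        {:noreply, Map.put(state, key, {owner, waiters ++ [from]})}
    end
  end

  @impl true
  def handle_cast({:release, key}, state) do
    case Map.get(state, key) do
      {_owner, [{next_pid, _tag} = next | rest]} ->
        # Hand the lock to the first waiter and unblock it.
        GenServer.reply(next, :ok)
        {:noreply, Map.put(state, key, {next_pid, rest})}

      {_owner, []} ->
        {:noreply, Map.delete(state, key)}

      nil ->
        {:noreply, state}
    end
  end
end
```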

Perhaps @tompave is around somewhere :slight_smile:

Looks like that’s exactly what happens. Super interesting!

(attached screenshot: fwf-psql)


Correct, that’s what is happening, with the aid of the sandbox. That’s also why it is important to use random names, emails, etc. for all uniquely indexed columns in your tests. If all of your tests use the same username, for example, then they will all block each other and you won’t actually run them concurrently.
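As a small sketch of that advice (the module and function names here are made up), per-test unique values can come from `System.unique_integer/1`, which returns a new integer on every call within a VM run:

```elixir
# Hypothetical helper module: generate values that never collide on a
# unique index, so concurrent tests don't block each other on row locks.
defmodule UniqueFixtures do
  # System.unique_integer/1 returns a fresh integer per call.
  def unique_email, do: "user#{System.unique_integer([:positive])}@example.com"
  def unique_username, do: "user_#{System.unique_integer([:positive])}"
end
```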


I’m glad that you figured it out and that everything seems to work as intended!

On the second transaction blocking on INSERT: that’s by design. It would be incorrect to update the cache before it’s certain that the new gate value has been properly persisted. It looks like a huge wait in your tests because of the sleeps, but in production code it shouldn’t be as much of an issue, and that type of race condition should be fairly rare. I added that safeguard mainly to safely handle network retries and people using the web UI at the same time.


Yeah that makes perfect sense.

This whole line of questioning began with me wondering what happens when you have concurrent tests interacting with processes. I just happened to experiment with FWF since we’re using it, and it seemed like an obvious place where I could demonstrate a concurrency issue. Turns out it wasn’t obvious, because Postgres is operating as a shared lock in this case. In other words, if FWF didn’t use Postgres and was just an in-memory cache, then it absolutely would demonstrate the timing issues I anticipated… I think.

Interestingly, it’s almost more of a problem in tests than it would be in production! We’re super unlikely to get concurrent write access to the feature flags in production, whereas it happens all the time in tests.

Consider the case where you have setup :configure_feature_flags available to your tests - all of a sudden your async test suite becomes implicitly synchronous because of the Postgres lock!

This isn’t a criticism of FWF in any way. This is me, a relative newbie to concurrent testing, coming to grips with the design and testing implications of concurrent systems.

Cool. So now the question is: how do you handle situations where you can’t use random values to avoid blocking? This feature flag scenario is a perfect example.

Some ideas that come to mind:

  • mock the call to FunWithFlags.enabled?
  • have a purely functional layer, and pass boolean flags / gate lambdas into it from a layer that knows about FWF
  • designate a small number of tests as interacting with FWF and living with the fact that they’re not concurrent
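A minimal sketch of the second idea (all names here are invented for illustration): keep the business logic pure and pass the flag in as a plain boolean, so tests of the core never touch FunWithFlags, the database, or the ETS cache.

```elixir
defmodule Pricing do
  # Pure core: the feature flag arrives as a plain boolean argument,
  # so tests need no DB, no ETS cache, and no mocking.
  def discount(amount, promo_enabled?) do
    if promo_enabled?, do: amount * 0.9, else: amount
  end
end

# The thin outer layer (the only place that knows about FWF) would call:
#   Pricing.discount(amount, FunWithFlags.enabled?(:promo))
```

Only that thin outer layer needs tests that touch FunWithFlags, and those can be the small non-concurrent group from the third option.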

I would go with the last option: a small number of non-concurrent tests that interact with FWF. Especially because tests in the same module do not run concurrently anyway, so the module serves as a natural grouping. And you likely have dozens of other test files that can still run concurrently.


Do you know of an easy way to detect whether this kind of blocking is happening across a test suite?

I don’t know (i.e. I never looked for it).
