Strange DBConnection.ConnectionError with Ecto 3.0.6

@dimitarvp Yes, the Calculations.maybe_publish_a_group_change/2 and Calculations.maybe_publish_b_group_change/2 functions are determining whether a relevant change has occurred, and, if so, they are doing a db query to get a new calculation total and then publishing a graphql subscription update.

Let’s try just waiting for the tasks to finish as a start.

Don’t ignore their pids, collect them in e.g. the pid1 and pid2 variables and then do Task.await_many([pid1, pid2]) after the second task start, and let us know if that fixed it. Also remove the Process.sleep; don’t ever fix timing errors like that.

@dimitarvp Thanks for the suggestions! I had initially been waiting for the tasks to finish, using await_many as you suggested. But I don’t actually need the results, and I didn’t want to get an error in the case of a DB timeout, so I thought I could just fire off the tasks and forget about them. Doing it the initial way with Task.await_many didn’t have the same issues in the tests.

I’m actually working on a more robust fix/refactor now that will probably involve going back to awaiting the tasks as you’ve suggested.

Good to have the confirmation about Process.sleep being bad practice. Thanks!

And thanks again for taking the time to offer advice. I appreciate it!

The error you’re seeing is an integral part of the db_connection package, which several DB packages step on (Postgres for Ecto included). I am not aware of a way to disable it though I haven’t looked at the source in detail either, I admit.

But, why being conservative about waiting for the tasks to finish? If Process.sleep(50) fixed the problem then I’d imagine that Task.await_many will finish in 50ms or less, making it an imperceptible lag.

To expand on this: this is a bandaid, and is done when you just give up and are like “no clue what needs this little extra timie to finish but I am not going to bother looking for it”.

Don’t do that. Be a good engineer, your future self and colleagues will thank you for it.

And yes I know it’s tests and people somehow drop (part of) their coding standards there, but IMO tests need love too!

2 Likes

Maybe it’s just that I’ve been reading too many Jepsen reports this week, but I wonder about the “we don’t care about the results” part of this use of Task. Imagine this scenario:

  • one request kicks off an update Task
  • second request kicks of an update Task
  • first task does its DB query
  • second task does its DB query
  • second task sends its notifications
  • first task sends its notifications <==== OH NO

The core of any solution: identify the tasks based on group, and ensure that only one runs at a time for a particular group.

That might mean a GenServer juggling a fleet of tasks, or it might mean some careful Oban config. Your tests can then check in on a_group and b_group if needed, or just issue a blanket “nevermind about those, kthxbai” command.

1 Like

Yeah, the tasks are very fast in the vast majority of cases. This was a temporary stop-gap for figuring out a better way to handle the rare instance when it takes a long time.

I hear your points on not using bandaids in tests! It’s never my intention to drop my coding standards in my tests.

Thanks again for the feedback :slight_smile:

Thanks for the feedback! In this particular case, that last bullet point scenario wouldn’t be too detrimental, but of course not ideal. I appreciate you pointing out the race condition, which I hadn’t considered before, and the advice about how to address it properly!

And thanks for the link to the Jepsen reports! I hadn’t heard of them before and am looking forward to taking a look.

I’ve read the comments and learnt a few things as well, especially on Process.sleep/1 and I assume :time.sleep/1 being a bandaid.
I learnt a few solutions, thanks to @almirsarajcic and @dimitarvp

  1. You could send a message to the parent (test) process from the async process. In my case I needed to mock the async code. This is what I had initially:
test "blah blah blah" do
    test_pid = self()
    ref = make_ref()

    expect(MyMock, :some_fn, fn -> 
        send(test_pid, {:done, ref})  
    end)

    # Just before `end` of the `test` block, I asserted that I received the message from the async process:
    assert_receive {:done, ^ref}
     
    # Here, you can assert on any side-effects of the async code, like DB changes, etc.
end

This way, we are guaranteed that the async process returns before the test exits.

  1. You can (and should always) use supervised tasks, which gives you the flexibility to start your Task Supervisor in the tests that need it. Here was my final setup:
    In MyApp.Application:
def start(type, args) do
    children = [MyAppWeb.Endpoint, # other children] ++ more_children()
      end

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

  defp more_children(env \\ Application.get_env(:my_app, :env))
  defp more_children(:test), do: []
  defp more_children(_env), do: [{Task.Supervisor, name: MyApp.TaskSupervisor}]

Then, in the function I am testing, I use a supervised Task:

def some_function do
  Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn -> 
    # do some work
  end)
  
  :ok
end

Finally, in my test, I start the task supervisor as part of my test setup:

describe "blah" do
  setup do
      start_supervised!({Task.Supervisor, name: MyApp.TaskSupervisor, strategy: :one_for_one})

      :ok
  end
  test "blah blah" do
      expect(MyMock, :some_function, fn -> 
         :ok
      end)
  end
end

I loved the second approach and it’s what I use, but I’m curious though. If I want to assert on DB change made by the async process, how else would I do that without adding a delay? I would still get the DB connection errors with this approach.

2 Likes

I don’t know if the people above are getting this error for the same reason I got it - my issue was that I was using Process.sleep(5000) inside a module and while running mix test I was getting this error Client #PID<0.1389.0> is still using a connection from owner at location. Removing the Process.sleep(5000) resolved my issue.

You should never use sleep anyway, except when you actually need it – and for synchronization it’s not needed 99.9% of the time.

I only used it for testing purpose an not in production. You should never use sleep anyway, except when you actually need it - yeah I agree with you

Who knows how I can walk-around this issue? @dimitarvp? anyone?

Hm, you have a good post above, thought you solved your issue? Also asserting on a DB change should be the same: send a message back (if the DB change is happening in a separate process that is) – or just wait for the async Yask to finish.

1 Like

Found this helpful article from Dockyard. So, I just call flush_task_supervisor() before my DB assertions, and it works!

1 Like