Mongo connection times out, but only when it is used by a dependency

I’m experiencing some odd behavior. I’m using mongodb_driver (v0.9.0) in an app (app1), and it’s been working great. But suddenly, when I use that app as a dependency (by app2), suddenly all my mongo queries time out.

My guess is that I’m stepping into something tricky with supervisors or something. (Why am I doing this? My thought was that app1 defines all the schemas and contexts for how to access resources, and I didn’t want to duplicate any of that in app2 – I want app2 to be ignorant about the “implementation details”).

app1 wraps the Mongo start like this:

def MyMongoWrapper do
  def start_link(opts \\ []) do
    # Read some config here... user, password, etc.
    Mongo.start_link(opts)
  end
end

which is ref’d in app1’s application.ex:

def start(_type, _args) do
    children = [
      {MyMongoWrapper, []}
    ]
    opts = [strategy: :one_for_one, name: App1.Supervisor]
    Supervisor.start_link(children, opts)
  end

So all is well with the world in app1: I can start it up and make queries as expected.

But when I start working with app2 and I use app1 as a dependency, nothing seems to work, even though app1 DOES get started and the mongo process DOES get started.

Instead of relying on “implicit starting” (?) that happens when an app lists another app as a dependency, I tried manually listing App1 in app2’s supervision tree, but calls to Mongo still time out.

Likewise, if I remove all the children from app1’s application.ex and list the MyMongoWrapper in app2’s application.ex, I get the same result: any Mongo queries cause an exit due to timeout.

It’s really making me scratch my head and wonder why things are working in app1 at all. The config and everything seems to be exactly the same.

Does anyone have any ideas of what might be causing this type of behavior or other things I could check?

1 Like

I assume, your configuration is not working. Please turn on logging: GitHub - zookzook/elixir-mongodb-driver: MongoDB driver for Elixir, then you see more information about.

1 Like

Thanks for the prompt reply @zookzook! This is interesting. I have verified that BOTH app1 and app2 have the IDENTICAL configuration options, something like this:

[
  name: :my_mongo,
  pool_size: 20,
  url: "mongodb://:@localhost:/app1_dev",
  timeout: 15000,
  ssl: false
]

Note that the URL looks funny in dev because we don’t specify a user, password, port, or parameters (which do get set when connecting to a production mongo instance).

What is really curious is that BOTH app1 AND app2 show errors on startup when I set the config to include log: true:

10:13:33.630 [error] an exception was raised logging %DBConnection.LogEntry{call: :execute, connection_time: 5298000, decode_time: 6000, idle_time: 548338000, params: [], pool_time: 147000, query: %Mongo.Query{action: {:exec_hello, []}}, result: {:ok, %Mongo.Query{action: {:exec_hello, []}}, {%{"code" => 59, "codeName" => "CommandNotFound", "errmsg" => "no such command: 'helloOk'", "ok" => 0.0}, %Mongo.Events.CommandStartedEvent{command: :hello, command_name: :hello, connection_id: #PID<0.815.0>, database_name: "admin", operation_id: nil, request_id: 1}, 0, 5240}}}: ** (BadFunctionError) expected a function, got: true
    (db_connection 2.4.2) lib/db_connection.ex:1532: DBConnection.log/2
    (db_connection 2.4.2) lib/db_connection.ex:1510: DBConnection.log/5
    (mongodb_driver 0.9.2) lib/mongo.ex:1526: Mongo.exec_hello/2
    (stdlib 3.16.1) timer.erl:166: :timer.tc/1
    (mongodb_driver 0.9.2) lib/mongo/monitor.ex:239: Mongo.Monitor.get_server_description/1
    (mongodb_driver 0.9.2) lib/mongo/monitor.ex:164: Mongo.Monitor.update_server_description/1
    (mongodb_driver 0.9.2) lib/mongo/monitor.ex:151: Mongo.Monitor.handle_info/2
    (stdlib 3.16.1) gen_server.erl:695: :gen_server.try_dispatch/4
    (stdlib 3.16.1) gen_server.erl:771: :gen_server.handle_msg/6
    (stdlib 3.16.1) proc_lib.erl:226: :proc_lib.init_p_do_apply/3

app1 throws 3 of these errors, and then they pop up intermittently, but app1 can go on to successfully issue commands, e.g.

# app1
iex> Mongo.insert_one(:my_mongo, "foo", %{what: "up"})
{:ok,
 %Mongo.InsertOneResult{
   acknowledged: true,
   inserted_id: #BSON.ObjectId<633d93aac6e47b35cb0d1fc5>
 }}

whereas app2 just throws an unending stream of these errors into the log. The same command on app2 yields a timeout:

# app2
iex> Mongo.insert_one(:my_mongo, "foo", %{what: "up"})
** (exit) exited in: GenServer.call(:my_mongo, {:checkout_session, :write, [write_counter: 1]}, 60000)
    ** (EXIT) time out
    (elixir 1.13.4) lib/gen_server.ex:1030: GenServer.call/3
    (mongodb_driver 0.9.2) lib/mongo/session.ex:142: Mongo.Session.start_session/3
    (mongodb_driver 0.9.2) lib/mongo/session.ex:694: Mongo.Session.in_session/5
    (mongodb_driver 0.9.2) lib/mongo.ex:1482: Mongo.issue_command/4
    (mongodb_driver 0.9.2) lib/mongo.ex:849: Mongo.insert_one/4

It’s like Mongo “knows” how its server is getting started or something. Very curious. Any other ideas for what could explain this behavior?

I just noticed app1 shows "mongodb_driver": {:hex, :mongodb_driver, "0.9.1" in its mix.lock whereas app2 specifies "mongodb_driver": {:hex, :mongodb_driver, "0.9.2" … so I’m gonna suss that out

1 Like

If both apps have identical configurations then they started the supervisor related stuff twice. Use only one configuration. A topology should only started as a singleton.

1 Like

To clarify: the supervisor is only started once, I’m just comparing behaviors between the 2 apps. When app1 is run as a stand-alone app, it starts up the supervisor and Mongo. When app2 starts, it doesn’t explicitly start anything, it defers to app1 to start up its stuff. (I only explored variants to see if I could figure out what was going on here).

Aha, the problem seems to be with version 0.9.2. As soon as I force app2 to use 0.9.1, it behaves the same way as app1: the log periodically shows the error (see above), but it is able to issue commands.

So really this has nothing to do with supervisors, it’s just something to do with differences between v0.9.1 and v0.9.2. The fact that errors are being logged may indicate something problematic, however.

I should add that I’m connecting to an instance of mongo v4.0.0 – this is because in production we rely on AWS DocumentDB, and v4.0.0 seems to be the most similar.

The driver 0.9.2 does not support 4.0. Sorry, you need to downgrade the driver. It does not support AWS DocumentDB.

1 Like

Ok, we’ll pin it to v0.9.1. Thanks for the responses! They are much appreciated.
This really is a great driver – I’d love to see an Ecto adapter for it. We’ve got a handful of code that could maybe be adapted for that.

Interesting re DocumentDB: we are using DocumentDB in production with mongodb_driver v0.9.1 and it has been working fine – admittedly our queries are limited to mostly CRUD without a lot of filters or sorting, but it hasn’t had any issues.

1 Like

The handshake workflow changed. Instead of using ismaster, the new handshake workflow uses hello!

The hello command was introduced with Mongo 4.2 or 4.4, so I decided to use the new hello command.

I think, you found a bug in the hello command. I will fix it and then the driver should work with AWS DocumentDB!

That’s great! Do you want me to file a bug with steps to reproduce?

No, thanks. But you can verify the solution. I give me some days, because I’m on vacation. Next week, I will commit in the master branch some updates, which should fix the issue with 4.0 and DocumentDB. If you verify, that the commits fix your issue, I will release a new version. Thanks for reporting this issue.

Sounds like a plan! Enjoy your vacation, and thanks for the consideration!

I committed a fix on the master branch. Feel free to test it. If it is working like expected, then I will draft a new release for the fix!

1 Like

I’ve tested the master branch changes locally on Mongo 4.0 and on our production stack running DocumentDB and it works! I set log: true in my config – Mongo 4.0 complains a little with errors like:

17:18:25.177 application=db_connection module=DBConnection function=log_raised/4 [error] an exception was raised logging %DBConnection.LogEntry{call: :execute, connection_time: 5655000, decode_time: 21000, idle_time: 498146000, params: [], pool_time: 179000, query: %Mongo.Query{action: {:exec_hello, []}}, result: {:ok, %Mongo.Query{action: {:exec_hello, []}}, {%{"ismaster" => true, "localTime" => ~U[2022-10-11 21:18:25.189Z], "logicalSessionTimeoutMinutes" => 30, "maxBsonObjectSize" => 16777216, "maxMessageSizeBytes" => 48000000, "maxWireVersion" => 7, "maxWriteBatchSize" => 100000, "minWireVersion" => 0, "ok" => 1.0, "readOnly" => false}, %Mongo.Events.CommandStartedEvent{command: :hello, command_name: :hello, connection_id: #PID<0.535.0>, database_name: "admin", operation_id: nil, request_id: 2}, 0, 5565}}}: ** (BadFunctionError) expected a function, got: true
    (db_connection 2.4.2) lib/db_connection.ex:1532: DBConnection.log/2
    (db_connection 2.4.2) lib/db_connection.ex:1510: DBConnection.log/5
    (mongodb_driver 0.9.2) lib/mongo.ex:1526: Mongo.exec_hello/2
    (stdlib 3.16.1) timer.erl:166: :timer.tc/1
    (mongodb_driver 0.9.2) lib/mongo/monitor.ex:239: Mongo.Monitor.get_server_description/1
    (mongodb_driver 0.9.2) lib/mongo/monitor.ex:164: Mongo.Monitor.update_server_description/1
    (mongodb_driver 0.9.2) lib/mongo/monitor.ex:151: Mongo.Monitor.handle_info/2
    (stdlib 3.16.1) gen_server.erl:695: :gen_server.try_dispatch/4

But it works fine! Just like it did with 0.9.1.

On the DocumentDB instance, no errors are logged at all. So this is looking pretty good to me.

Thank you for the response. I will check the error, that you mentioned!

Ok, I think, you used the wrong log configuration from the DBConnection module:

   * `:log` - A function to log information about a call, either
    a 1-arity fun, `{module, function, args}` with `t:DBConnection.LogEntry.t/0`

But you should use:

config :mongodb_driver,
  log: true

in your config file. I think, the logging issue in your example is cleared. I will provide a new release in the next few days.

Hmm… the error is there when log: true is passed as an option, e.g.

iex> Mongo.start_link([url: "mongodb://localhost:27017/my_dev", name: :my_mongo, log: true])
{:ok, #PID<0.338.0>}
iex(3)> [error] an exception was raised logging %DBConnection.LogEntry{call: :execute, connection_time: 3939000, decode_time: 1434000, idle_time: 2857000, params: [], pool_time: 19000, query: %Mongo.Query{action: {:exec_hello, []}}, result: {:ok, %Mongo.Query{action: {:exec_hello, []}}, {%{"ismaster" => true, "localTime" => ~U[2022-10-14 14:13:36.151Z], "logicalSessionTimeoutMinutes" => 30, "maxBsonObjectSize" => 16777216, "maxMessageSizeBytes" => 48000000, "maxWireVersion" => 7, "maxWriteBatchSize" => 100000, "minWireVersion" => 0, "ok" => 1.0, "readOnly" => false}, %Mongo.Events.CommandStartedEvent{command: :hello, command_name: :hello, connection_id: #PID<0.339.0>, database_name: "admin", operation_id: nil, request_id: 0}, 0, 3920}}}: ** (BadFunctionError) expected a function, got: true
    (db_connection 2.4.2) lib/db_connection.ex:1532: DBConnection.log/2
    (db_connection 2.4.2) lib/db_connection.ex:1510: DBConnection.log/5
    (mongodb_driver 0.9.2) lib/mongo.ex:1526: Mongo.exec_hello/2
    (stdlib 3.16.1) timer.erl:166: :timer.tc/1
    (mongodb_driver 0.9.2) lib/mongo/monitor.ex:239: Mongo.Monitor.get_server_description/1
    (mongodb_driver 0.9.2) lib/mongo/monitor.ex:164: Mongo.Monitor.update_server_description/1
    (mongodb_driver 0.9.2) lib/mongo/monitor.ex:151: Mongo.Monitor.handle_info/2
    (stdlib 3.16.1) gen_server.erl:695: :gen_server.try_dispatch/4
 [metadata]

It’s referencing the master branch, commit d7c8652b4e66a0031e1a7fe36bc2af5c85cef408 is in the mix.lock. However, if I set the option in the config (instead of passing it as an option), I don’t get the error.

And to be thorough, here’s how I installed Mongo:

docker pull mongo:4.0.0
docker run -p 27017:27017 -d --name mongo mongo:4.0.0

However, this error has been there for a while (including in 0.9.1) and everything is working – I just never noticed the error because I had never enabled the :log option, and the message doesn’t appear when the same code connects to AWS DocumentDB. So it may not be worth trying to squash the error for older versions of Mongo. Or if the hello stuff is specific to different versions, then maybe the code could be isolated into its own module and overridden via config or something.

Thank you for making such a great package and for maintaining it so vigorously!

All good! To make it clear: the log option is not part of the driver, it belongs to DBConnection, that is another package. You configure the logging in the config file.

1 Like

Are you ready to tag/release 0.9.3?