We (@frahugo and I) have the following intermittent exception raised when running the test suite:
1) test handling a BuyableProductListed event given a standalone product, it lists the product (MyApp.StatusHandlerTest)
test/status_handler_test.exs:14
** (exit) exited in: GenServer.call(MyApp.Router, %MyApp.BuyableProductListed{payload: %{buyable_product_seller_sku: ":master-sku:", product_group_seller_sku: "3", seller_id: ":seller-slug:"}}, 5000)
** (EXIT) exited in: GenServer.call(#PID<0.740.0>, {:checkout, #Reference<0.0.1.19398>, true, 15000}, 5000)
** (EXIT) shutdown: "owner #PID<0.739.0> exited with: shutdown"
stacktrace:
(elixir) lib/gen_server.ex:737: GenServer.call/3
test/status_handler_test.exs:97: MyApp.StatusHandlerTest.notify/1
test/status_handler_test.exs:21: (test)
08:46:41.609 [error] GenServer MyApp.Router terminating
** (stop) exited in: GenServer.call(#PID<0.740.0>, {:checkout, #Reference<0.0.1.19398>, true, 15000}, 5000)
** (EXIT) shutdown: "owner #PID<0.739.0> exited with: shutdown"
(db_connection) lib/db_connection/ownership/proxy.ex:32: DBConnection.Ownership.Proxy.checkout/2
(db_connection) lib/db_connection.ex:920: DBConnection.checkout/2
(db_connection) lib/db_connection.ex:742: DBConnection.run/3
(db_connection) lib/db_connection.ex:790: DBConnection.transaction/3
(my_app) lib/my_app/router.ex:54: MyApp.Router.handle_call/3
(stdlib) gen_server.erl:615: :gen_server.try_handle_call/4
(stdlib) gen_server.erl:647: :gen_server.handle_msg/5
(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message: %MyApp.BuyableProductListed{payload: %{buyable_product_seller_sku: ":master-sku:", product_group_seller_sku: "3", seller_id: ":seller-slug:"}}
About the MyApp.Router
:
- is not a HTTP Router
- it is an application router that Routes events to functions
- it is a GenServer started by a supervisor when the application starts
- all messages are handled synchronously (i.e.
GenServer.call
)
@frahugo investigate the ownership of connections. Here is what he found:
When I set the ownership_log
to :info
in the Repo configuration, I see some interesting messages that might indicate a concurrency problem in “releasing” the proxy when the test process goesDOWN
.
As you can see below, the long-running Router PID is 714. For each test, that PID is “allowed” to use the proxy checked out by the test (#PID<0.714.0> allowed on proxy ...
). When the test process ends, db_connection
receives a DOWN
message (because it is monitoring the test process), and then logs message #PID<0.714.0> lose proxy...
. Therefore, on the next test, the Router (pid 714) will have to use the new proxy that the next test will check out.
It looks like the DOWN
message is processed asynchronously, and that at some point, a new test starts while the owner_down
function in db_connection
is busy cleaning up the checkouts/owners map. That would explain why the Router is then trying to use the proxy from the previous test.
10:05:00.093 [info] Subscribe #PID<0.714.0> / Elixir.MyApp.Router to test_channel
10:05:02.719 [info] >>>>>>>>>>>>>>>>>>>>>>>
10:05:02.719 [info] TEST pid: #PID<0.756.0>
MyApp.ProductHandlerTest
* test updating a product in catalog given a non-existing standalone product, it adds the product
10:05:02.726 [info] #PID<0.756.0> owns proxy #PID<0.757.0>
10:05:02.802 [info] #PID<0.714.0> allowed on proxy #PID<0.757.0>
* test updating a product in catalog given a non-existing standalone product, it adds the product (508.9ms)
10:05:03.228 [info] >>>>>>>>>>>>>>>>>>>>>>>
* test updating a product in catalog given new image for a regular variant, it adds images to the variant
10:05:03.229 [info] TEST pid: #PID<0.791.0>
10:05:03.229 [info] #PID<0.791.0> owns proxy #PID<0.792.0>
10:05:03.244 [info] #PID<0.756.0>, #PID<0.714.0> lose proxy #PID<0.757.0>
10:05:03.428 [info] #PID<0.714.0> allowed on proxy #PID<0.792.0>
* test updating a product in catalog given new image for a regular variant, it adds images to the variant (1059.4ms)
10:05:04.288 [info] >>>>>>>>>>>>>>>>>>>>>>>
* test adding a product to catalog adding a product with just one variant
10:05:04.288 [info] TEST pid: #PID<0.920.0>
10:05:04.288 [info] #PID<0.920.0> owns proxy #PID<0.921.0>
10:05:04.301 [info] #PID<0.791.0>, #PID<0.714.0> lose proxy #PID<0.792.0>
10:05:04.303 [info] #PID<0.714.0> allowed on proxy #PID<0.921.0>
* test adding a product to catalog adding a product with just one variant (186.2ms)
* test adding a product to catalog adding a product with many variants
10:05:04.475 [info] >>>>>>>>>>>>>>>>>>>>>>>
10:05:04.475 [info] TEST pid: #PID<0.958.0>
10:05:04.475 [info] #PID<0.958.0> owns proxy #PID<0.959.0>
10:05:04.492 [info] #PID<0.920.0>, #PID<0.714.0> lose proxy #PID<0.921.0>
10:05:04.511 [info] Router terminated: self: #PID<0.714.0> : {{:shutdown, "owner #PID<0.920.0> exited with: shutdown"}, {GenServer, :call, [#PID<0.921.0>, {:checkout, #Reference<0.0.2.22672>, true, 125000}, 125000]}} state: nil
10:05:04.511 [info] Subscribe #PID<0.960.0> / Elixir.MyApp.Router to test_channel
* test adding a product to catalog adding a product with many variants (36.7ms)
10:05:04.511 [info] >>>>>>>>>>>>>>>>>>>>>>>
10:05:04.511 [info] TEST pid: #PID<0.961.0>
10:05:04.512 [info] #PID<0.961.0> owns proxy #PID<0.962.0>
10:05:04.512 [info] #PID<0.958.0> lose proxy #PID<0.959.0>
1) test adding a product to catalog adding a product with many variants (MyApp.ProductHandlerTest)
test/product_handler_test.exs:17
** (exit) exited in: GenServer.call(MyApp.Router, %MyApp.ProductGroupAdded{payload: %MyApp.ProductGroup{__meta__: #Ecto.Schema.Metadata<:built, "product_groups">, buyable_products: [%MyApp.BuyableProduct{__meta__: #Ecto.Schema.Metadata<:built, "buyable_products">, created_at: nil, fields: [%{"name" => "price", "value" => "10.00"}, %{"name" => "msrp", "value" => "12.00"}, %{"name" => "item_length", "value" => "5.0"}, %{"name" => "item_width", "value" => "6.0"}, %{"name" => "item_height", "value" => "4.0"}, %{"name" => "item_weight", "value" => "2.0"}, %{"name" => "shipping_cost_standard", "value" => "5.95"}, %{"name" => "shipping_cost_expedited", "value" => "9.95"}, %{"name" => "size", "value" => "Large"}, %{"name" => "style", "value" => "Chevron"}], listing_status: nil, product_group_seller_sku: nil, quantity: nil, seller_id: ":slug:", seller_sku: ":buyable-product-sku-1:", updated_at: nil}, %MyApp.BuyableProduct{__meta__: #Ecto.Schema.Metadata<:built, "buyable_products">, created_at: nil, fields: [%{"name" => "price", "value" => "12.00"}, %{"name" => "msrp", "value" => "14.00"}, %{"name" => "item_length", "value" => "7.0"}, %{"name" => "item_width", "value" => "5.0"}, %{"name" => "item_height", "value" => "6.0"}, %{"name" => "item_weight", "value" => "4.0"}, %{"name" => "shipping_cost_standard", "value" => "5.95"}, %{"name" => "shipping_cost_expedited", "value" => "9.95"}, %{"name" => "size", "value" => "Small"}, %{"name" => "style", "value" => "Chevron"}, %{"name" => "image2", "value" => "https://dummyimage.com/10/09f/fff-2.png"}, %{"name" => "image3", "value" => "https://dummyimage.com/10/09f/fff-3.png"}, %{"name" => "image4", "value" => "https://dummyimage.com/10/09f/fff-4.png"}, %{"name" => "image5", "value" => "https://dummyimage.com/10/09f/fff-5.png"}], listing_status: nil, product_group_seller_sku: nil, quantity: nil, seller_id: 1, seller_sku: ":buyable-product-sku-2:", updated_at: nil}], created_at: nil, fields: [%{"name" => "item_name", "value" => ":new-name:"}, %{"name" => "item_description", "value" => ":description:"}, %{"name" => "brand", "value" => ":brand:"}, %{"name" => "gender", "value" => "Male"}, %{"name" => "category", "value" => ":category-name:"}, %{"name" => "prop_65", "value" => "true"}, %{"name" => "image1", "value" => "https://dummyimage.com/10/09f/fff.png"}], seller_id: ":slug:", seller_sku: ":master-sku:", updated_at: nil}}, 5000)
** (EXIT) exited in: GenServer.call(#PID<0.921.0>, {:checkout, #Reference<0.0.2.22672>, true, 125000}, 125000)
** (EXIT) shutdown: "owner #PID<0.920.0> exited with: shutdown"
stacktrace:
Other possibly helpful information:
- this occurs in running a small suite of 5 tests
- we’re running MySQL
- our tests are configured with
Ecto.Adapters.SQL.Sandbox.mode(MyApp.Repo, {:shared, self()})
- tests are run synchronously
- no issues when calling functions directly, i.e. not going via the
Router
- it appears setting the repo config to
pool_size: 1
resolves the issue - interesting this setting was removed from the generators: https://github.com/elixir-ecto/ecto/issues/1279
- the repo being used in another OTP application
Any help on how to debug or resolve this would be much appreciated. Thank you in advance.