After updating to 1.6, I'm seeing a perplexing set of failures with hound/hackney/cowboy

I’ve been trying to debug this issue all day and I’ve made no progress. I’ve asked some friends to help out and they, too, are perplexed. The only thing that has changed that might have anything to do with the cause of the problem is that I updated Elixir to 1.6.0 and suddenly everything broke. But afterwards I downgraded to 1.5.3 and then 1.5.3-otp-19 (I had 20.2.2 installed) but it’s no longer back in a working condition… well it’s never been 100% broken.

Okay, so here’s the error set I’m seeing:

[error] GenServer #PID<0.584.0> terminating
** (ArgumentError) argument error
    :erlang.send_after(:infinity, #PID<0.584.0>, {:timeout, #Port<0.21733>})
    (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_pool.erl:433: :hackney_pool.store_socket/3
    (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_pool.erl:286: :hackney_pool.handle_call/3
    (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message (from Hound.SessionServer): {:checkin, #Reference<0.1051238069.2352742407.15030>, {'localhost', 4444, :hackney_tcp}, #Port<0.21733>, :hackney_tcp}
State: {:state, :default, {:metrics_ng, :metrics_dummy}, 50, :infinity, {:dict, 1, 16, 16, 8, 80, 48, {[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}, {{[], [], [], [], [[#Reference<0.1051238069.2352742407.15030> | {'localhost', 4444, :hackney_tcp}]], [], [], [], [], [], [], [], [], [], [], []}}}, {:dict, 0, 16, 16, 8, 80, 48, {[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}, {{[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}}}, {:dict, 0, 16, 16, 8, 80, 48, {[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}, {{[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}}}, {:dict, 0, 16, 16, 8, 80, 48, {[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}, {{[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}}}, 0}
Client Hound.SessionServer is alive
    (stdlib) gen.erl:169: :gen.do_call/4
    (stdlib) gen_server.erl:210: :gen_server.call/3
    (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_response.erl:292: :hackney_response.end_stream_body/2
    (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_response.erl:147: :hackney_response.stream_body1/2
    (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_response.erl:303: :hackney_response.read_body/3
    (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney.erl:952: :hackney.reply_with_body/3
    (hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4
    (hound) lib/hound/session_server.ex:99: Hound.SessionServer.create_session/2
    (hound) lib/hound/session_server.ex:78: Hound.SessionServer.handle_call/3
    (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3

[error] GenServer Hound.SessionServer terminating
** (CaseClauseError) no case clause matching: {:EXIT, {:noproc, {:gen_server, :call, [#PID<0.584.0>, {:checkout, {'localhost', 4444, :hackney_tcp}, #PID<0.378.0>, #Reference<0.1051238069.2352742407.15044>}, 8000]}}}
    (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_pool.erl:67: :hackney_pool.checkout/4
    (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_connect.erl:187: :hackney_connect.socket_from_pool/4
    (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_connect.erl:37: :hackney_connect.connect/5
    (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney.erl:315: :hackney.request/5
    (hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4
    (hound) lib/hound/session_server.ex:99: Hound.SessionServer.create_session/2
    (hound) lib/hound/session_server.ex:78: Hound.SessionServer.handle_call/3
    (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.588.0>): {:change_session, #PID<0.588.0>, "session_key", []}
State: %{#Reference<0.1051238069.2352742401.14418> => #PID<0.577.0>, #Reference<0.1051238069.2352742407.15021> => #PID<0.588.0>}
Client #PID<0.588.0> is alive
    (stdlib) gen.erl:169: :gen.do_call/4
    (elixir) lib/gen_server.ex:828: GenServer.call/3
    (hound) lib/hound/helpers/session.ex:44: Hound.Helpers.Session.in_browser_session/2
    (app_sync) lib/app_sync/webdriver.ex:29: AppSync.Webdriver.handle_call/3
    (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3

[error] GenServer #PID<0.588.0> terminating
** (stop) exited in: GenServer.call(Hound.SessionServer, {:change_session, #PID<0.588.0>, "session_key", []}, 60000)
    ** (EXIT) an exception was raised:
        ** (CaseClauseError) no case clause matching: {:EXIT, {:noproc, {:gen_server, :call, [#PID<0.584.0>, {:checkout, {'localhost', 4444, :hackney_tcp}, #PID<0.378.0>, #Reference<0.1051238069.2352742407.15044>}, 8000]}}}
            (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_pool.erl:67: :hackney_pool.checkout/4
            (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_connect.erl:187: :hackney_connect.socket_from_pool/4
            (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_connect.erl:37: :hackney_connect.connect/5
            (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney.erl:315: :hackney.request/5
            (hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4
            (hound) lib/hound/session_server.ex:99: Hound.SessionServer.create_session/2
            (hound) lib/hound/session_server.ex:78: Hound.SessionServer.handle_call/3
            (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
            (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
            (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
    (elixir) lib/gen_server.ex:831: GenServer.call/3
    (hound) lib/hound/helpers/session.ex:44: Hound.Helpers.Session.in_browser_session/2
    (app_sync) lib/app_sync/webdriver.ex:29: AppSync.Webdriver.handle_call/3
    (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.586.0>): {:visit, "http://sass-lang.com/documentation/Sass/Script/Functions.html#nth-instance_method"}
State: "session_key"
Client #PID<0.586.0> is alive
    (stdlib) gen.erl:169: :gen.do_call/4
    (elixir) lib/gen_server.ex:828: GenServer.call/3
    (app_sync) lib/app_sync_web/controllers/page_controller.ex:14: AppSyncWeb.PageController.visit_page/2
    (app_sync) lib/app_sync_web/controllers/page_controller.ex:1: AppSyncWeb.PageController.action/2
    (app_sync) lib/app_sync_web/controllers/page_controller.ex:1: AppSyncWeb.PageController.phoenix_controller_pipeline/2
    (app_sync) lib/app_sync_web/endpoint.ex:1: AppSyncWeb.Endpoint.instrument/4
    (phoenix) lib/phoenix/router.ex:278: Phoenix.Router.__call__/1
    (app_sync) lib/app_sync_web/endpoint.ex:1: AppSyncWeb.Endpoint.plug_builder_call/2
    (app_sync) lib/plug/debugger.ex:99: appSyncWeb.Endpoint."call (overridable 3)"/2
    (app_sync) lib/app_sync_web/endpoint.ex:1: AppSyncWeb.Endpoint.call/2
    (plug) lib/plug/adapters/cowboy/handler.ex:15: Plug.Adapters.Cowboy.Handler.upgrade/4
    (cowboy) /Users/brandonbuck/Dev/work/app_sync/deps/cowboy/src/cowboy_protocol.erl:442: :cowboy_protocol.execute/4

[error] Ranch protocol #PID<0.586.0> (:cowboy_protocol) of listener appSyncWeb.Endpoint.HTTP terminated
** (exit) exited in: GenServer.call(Hound.SessionServer, {:change_session, #PID<0.588.0>, "session_key", []}, 60000)
    ** (EXIT) an exception was raised:
        ** (CaseClauseError) no case clause matching: {:EXIT, {:noproc, {:gen_server, :call, [#PID<0.584.0>, {:checkout, {'localhost', 4444, :hackney_tcp}, #PID<0.378.0>, #Reference<0.1051238069.2352742407.15044>}, 8000]}}}
            (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_pool.erl:67: :hackney_pool.checkout/4
            (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_connect.erl:187: :hackney_connect.socket_from_pool/4
            (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_connect.erl:37: :hackney_connect.connect/5
            (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney.erl:315: :hackney.request/5
            (hound) lib/hound/request_utils.ex:43: Hound.RequestUtils.send_req/4
            (hound) lib/hound/session_server.ex:99: Hound.SessionServer.create_session/2
            (hound) lib/hound/session_server.ex:78: Hound.SessionServer.handle_call/3
            (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
            (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
            (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3

What is happening is a request is being to a Phoenix server, the controller is spinning up a wrapper I made around a Hound session, navigating to a page and then dumping the source back to the caller. The relevant part of the wrapper looks like this:

defmodule Webdriver do
  use GenServer
  use Hound.Helpers

  def start_link(session_key) do
    GenServer.start_link(__MODULE__, session_key)
  end

  def init(session_key) do
    Hound.start_session()
    {:ok, session_key}
  end

  def visit(pid, path), do: GenServer.call(pid, {:visit, path})

  def handle_call({:visit, path}, _from, key) do
    in_browser_session key, fn ->
      navigate_to(path)
    end
    {:reply, :ok, key}
  end
end

When I make the request, I get the huge error dump above, if I run IEx and start up the Webdriver by itself I see this:

[error] GenServer #PID<0.435.0> terminating
** (ArgumentError) argument error
    :erlang.send_after(:infinity, #PID<0.435.0>, {:timeout, #Port<0.10546>})
    (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_pool.erl:433: :hackney_pool.store_socket/3
    (hackney) /Users/brandonbuck/Dev/work/app_sync/deps/hackney/src/hackney_pool.erl:286: :hackney_pool.handle_call/3
    (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Last message (from Hound.SessionServer): {:checkin, #Reference<0.3197467851.1281884165.56861>, {'localhost', 4444, :hackney_tcp}, #Port<0.10546>, :hackney_tcp}
State: {:state, :default, {:metrics_ng, :metrics_dummy}, 50, :infinity, {:dict, 1, 16, 16, 8, 80, 48, {[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}, {{[], [], [], [], [], [], [], [], [], [], [], [[#Reference<0.3197467851.1281884165.56861> | {'localhost', 4444, :hackney_tcp}]], [], [], [], []}}}, {:dict, 0, 16, 16, 8, 80, 48, {[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}, {{[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}}}, {:dict, 0, 16, 16, 8, 80, 48, {[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}, {{[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}}}, {:dict, 0, 16, 16, 8, 80, 48, {[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}, {{[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []}}}, 0}
Client Hound.SessionServer is alive
    (kernel) code_server.erl:140: :code_server.call/1
    (kernel) error_handler.erl:41: :error_handler.undefined_function/3
    (hound) lib/hound/request_utils.ex:48: Hound.RequestUtils.handle_response/3
    (hound) lib/hound/session_server.ex:99: Hound.SessionServer.create_session/2
    (hound) lib/hound/session_server.ex:78: Hound.SessionServer.handle_call/3
    (stdlib) gen_server.erl:636: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:665: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3

Odd thing to me is that the Webdriver process never dies (verified with :observer), and functions normally. But whatever is causing these failures seems to crash Cowboy and so 99% of the requests return to the client with ERR_EMPTY_RESPONSE.

I’m fairly new to Elixir myself, this being my first “project,” so I’ve pretty much exhausted everything I can think to do and my friends are equally perplexed (some who use Elixir professionally) so forgive me if we’ve missed some trivial issue but any insight is greatly appreciated.

I’ve also cleaned and recompiled all dependencies every time I swapped versions of Elixir to test, unless I missed something… mix do clean --all, deps.clean --all, deps.get, deps.compile

Finally, here is my output from mix deps:

* mimerl 1.0.2 (Hex package) (rebar3)
  locked at 1.0.2 (mimerl) 993f9b0e
  ok
* file_system 0.2.4 (Hex package) (mix)
  locked at 0.2.4 (file_system) f0bdda19
  ok
* connection 1.0.4 (Hex package) (mix)
  locked at 1.0.4 (connection) a1cae722
  ok
* metrics 1.0.1 (Hex package) (rebar3)
  locked at 1.0.1 (metrics) 25f094de
  ok
* unicode_util_compat 0.3.1 (Hex package) (rebar3)
  locked at 0.3.1 (unicode_util_compat) a1f612a7
  ok
* idna 5.1.0 (Hex package) (rebar3)
  locked at 5.1.0 (idna) d72b4eff
  ok
* gettext 0.14.0 (Hex package) (mix)
  locked at 0.14.0 (gettext) 1a019a2e
  ok
* ranch 1.3.2 (Hex package) (rebar3)
  locked at 1.3.2 (ranch) e4965a14
  ok
* poolboy 1.5.1 (Hex package) (rebar3)
  locked at 1.5.1 (poolboy) 6b461639
  ok
* decimal 1.4.1 (Hex package) (mix)
  locked at 1.4.1 (decimal) ad9e501e
  ok
* poison 3.1.0 (Hex package) (mix)
  locked at 3.1.0 (poison) d9eb6366
  ok
* ssl_verify_fun 1.1.1 (Hex package) (rebar3)
  locked at 1.1.1 (ssl_verify_fun) 28a4d65b
  ok
* html_entities 0.4.0 (Hex package) (mix)
  locked at 0.4.0 (html_entities) f2fee876
  ok
* certifi 2.0.0 (Hex package) (rebar3)
  locked at 2.0.0 (certifi) a0c0e475
  ok
* hackney 1.10.1 (Hex package) (rebar3)
  locked at 1.10.1 (hackney) c38d0ca5
  ok
* hound 1.0.4 (Hex package) (mix)
  locked at 1.0.4 (hound) 31db3c01
  ok
* db_connection 1.1.2 (Hex package) (mix)
  locked at 1.1.2 (db_connection) 2865c2a4
  ok
* httpoison 1.0.0 (Hex package) (mix)
  locked at 1.0.0 (httpoison) 1f02f827
  ok
* phoenix_pubsub 1.0.2 (Hex package) (mix)
  locked at 1.0.2 (phoenix_pubsub) bfa7fd52
  ok
* mochiweb 2.15.0 (Hex package) (rebar3)
  locked at 2.15.0 (mochiweb) e1daac47
  ok
* floki 0.19.2 (Hex package) (mix)
  locked at 0.19.2 (floki) 907ffa47
  ok
* cowlib 1.0.2 (Hex package) (rebar3)
  locked at 1.0.2 (cowlib) 9d769a1d
  ok
* cowboy 1.1.2 (Hex package) (rebar3)
  locked at 1.1.2 (cowboy) 61ac29ea
  ok
* phoenix_cowboy_logging 1.0.0 (Hex package) (mix)
  locked at 1.0.0 (phoenix_cowboy_logging) c23182e5
  ok
* clipboard 0.2.1 (Hex package) (mix)
  locked at 0.2.1 (clipboard) d1409d6c
  ok
* mime 1.2.0 (Hex package) (mix)
  locked at 1.2.0 (mime) 78adaa84
  ok
* plug 1.4.3 (Hex package) (mix)
  locked at 1.4.3 (plug) 236d77ce
  ok
* phoenix_html 2.10.5 (Hex package) (mix)
  locked at 2.10.5 (phoenix_html) 4f9df6b0
  ok
* phoenix 1.3.0 (Hex package) (mix)
  locked at 1.3.0 (phoenix) 1c01124c
  ok
* phoenix_live_reload 1.1.3 (Hex package) (mix)
  locked at 1.1.3 (phoenix_live_reload) 1d178429
  ok
* postgrex 0.13.3 (Hex package) (mix)
  locked at 0.13.3 (postgrex) c277cfb2
  ok
* ecto 2.2.7 (Hex package) (mix)
  locked at 2.2.7 (ecto) 2074106f
  ok
* phoenix_ecto 3.3.0 (Hex package) (mix)
  locked at 3.3.0 (phoenix_ecto) 702f6e16
  ok

If I’ve missed anything, sorry just let me know and I can get it for you. If you want me to run something let me know and I can do that as well. I would offer to try and make an SSCCE for you guys but when I tested the Webdriver in isolation (same code as above) there are no errors. Same code for the webdriver minus Phoenix and everything it brings in.

Okay, so I have some thoughts after more debugging.

The issue seemed to be focused on erlang:send_after from hackney_pool.erl (line 433 in the 1.10.1 version. Now, I have no idea why this was problematic… but I made some edits directly to the Erlang file and everything is working as is now.

The edits I made are replacing line 433 with:

Timer = timer:send_after(Timeout, self(), {timeout, Socket}),

And then replaced 451-452 with:

  case timer:cancel(Timer) of
    {error, _Reason} -> 

My knowledge of Erlang is rudimentary, so I can’t really speak to whether or not this works the same way, I’m going to leave the server running to see if it errors. If all goes well I’ll open an issue on hackney (if one doesn’t already exist) and try and get some feedback from the maintainers.

If anyone has any insight, commentary or words of warning for my random Erlang tinkering please let me know! I want to learn and more importantly I want to know why this broke and how this fixed it. (I did look up the different between erlang:send_after and timer:send_after so I know the former was probably chosen for performance reasons… but still – if it’s not going to work then it’s not going to work.

I don’t know what is the exact cause of your problems, but current Hackney’s pool implementation is a bit buggy and stopped our production a couple of times.

I’ve had this error with 1.5.2. In my case I was using a custom pool created as:

:hackney_pool.child_spec(MyPool, [timeout: :infinity, max_connections: System.schedulers_online])

This caused the error:

** (ArgumentError) argument error
  :erlang.send_after(:infinity, #PID<0.3051.0>, {:timeout, #Port<0.7455>})

Which according to the docs (http://erlang.org/doc/man/erlang.html#send_after-3) is expected, since the first argument must be an integer.

Changing the custom pool definition to use :timer.seconds(60) instead of :infinity fixed this. Not sure if you are also using a custom pool, but maybe the default pool also tries to use :infinity. I haven’t explored this further

1 Like

I would never, ever use a timeout of infinity, particularly with hackney pool. It’s a great way to end up in completely deadlock.

1 Like

Thanks everyone, that gives me some really good starting points to dive in and work on getting to a real solution. Funnily enough I never thought to example the types of that function as I set the timeout to :infinity in config values (assuming the config values are the true culprit) about a day before any of this started to fail, so it was never something I thought could potentially be the issue. Thanks for pointing me there!

I agree, the only reason I was using such absurd timeouts is for iteration speed (and only in development) – I never had any intention to deploy the application with this level of timeout. I had encountered some timeouts and I wanted to remove that as a blocker for the short term, looks like that was my biggest mistake.

3 Likes

And just to note, updating those timeout configs from :infinity to :timer.minutes(10) problem went away immediately. Next time I see argument error, no matter what, my first stop will be the types in the documentation.

1 Like