Heavy async HTTP outbound and %Mint.TransportError{}

mhs · November 15, 2023, 4:06am

Howdy! I’m trying to make a ton of concurrent outbound HTTP requests efficiently. I’m doing this:

    task = fn url ->
      Req.new(
        url: url,
        finch: MyFinch,
        max_redirects: 2,
        retry: false
      )
      |> Req.Request.put_header("connection", "close")
      |> Req.head()
      |> case do
        {:ok, response} ->
          response.status

        {:error, _exception} ->
          :error
      end
    end

    Task.Supervisor.async_stream_nolink(
      {:via, PartitionSupervisor, {MyApp.TaskSupervisors, self()}},
      urls,
      task,
      max_concurrency: 1000,
      on_timeout: :kill_task,
      ordered: false,
      timeout: 60000
    )
    |> Enum.to_list()

At first, processing output looks good, but after ~5 seconds, I start ending up with a ton of:

    %Mint.TransportError{reason: :nxdomain}
    %Mint.TransportError{reason: :nxdomain}
    %Mint.TransportError{reason: :timeout}
    %Mint.TransportError{reason: :nxdomain}
    %Mint.TransportError{reason: :nxdomain}
    %Mint.TransportError{reason: :nxdomain}
    %Mint.TransportError{reason: :timeout}
    %Mint.TransportError{reason: :timeout}
    %Mint.TransportError{reason: :timeout}

Anything bad leap out? The errors seem to indicate Mint failing during connect, but why? At lower volume, or at the start of a big batch, everything works fine. It only goes sideways once the load cranks up.
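To see how the failures split between `:nxdomain`, `:timeout`, and killed tasks, the stream results can be tallied. This is a minimal sketch, assuming the task function is changed to return the `{:ok, status}` or `{:error, exception}` tuple instead of collapsing every failure to `:error`. The `ResultTally` module is hypothetical, and the `%{reason: reason}` pattern relies only on the exception having a `:reason` field, as `%Mint.TransportError{}` does in the output above.

```elixir
defmodule ResultTally do
  # Each element from Task.Supervisor.async_stream_nolink/4 is either
  # {:ok, task_return_value} or {:exit, reason} (for example {:exit, :timeout}
  # when on_timeout: :kill_task fires).
  def classify({:ok, {:ok, status}}) when is_integer(status), do: {:status, status}
  # Matches any exception struct or map that carries a :reason field.
  def classify({:ok, {:error, %{reason: reason}}}), do: {:transport_error, reason}
  def classify({:ok, {:error, other}}), do: {:error, other}
  def classify({:exit, reason}), do: {:task_exit, reason}

  # Counts how many results fall into each bucket.
  def tally(results) do
    results
    |> Enum.map(&classify/1)
    |> Enum.frequencies()
  end
end
```

Piping the stream into `ResultTally.tally/1` instead of `Enum.to_list/1` gives one map of counts per batch, which makes it easier to see whether the `:nxdomain` share grows as the batch progresses.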

mhs · November 15, 2023, 4:10am

My Finch instance looks like this:

    {Finch, name: MyFinch, pools: %{:default => [size: 400, count: 4, protocol: :http1]}}

dimitarvp · November 15, 2023, 1:24pm

Is the app containerized? The nxdomain error often manifests due to problems with the container’s network configuration.

mhs · November 15, 2023, 2:25pm

Not containerized.

It’s an Elixir 1.14.0-rc.0 / OTP 25.0-rc3 mix release built on hexpm/elixir:1.14.0-rc.0-erlang-25.0-rc3-debian-stretch-20210902-slim locally and deployed on a modest Linux VM (4 GB memory, 2.40 GHz CPU, Debian Stretch).

I have to cross-compile as I’m on an M2 MBP. That was the last hexpm/elixir Docker image for Stretch and, since this is proof-of-concept stuff at the moment, I reached for something off the shelf rather than fiddling with making my own new Stretch image.

I’ve run the code natively too on my MBP via iex and observed the same behavior.

dimitarvp · November 15, 2023, 3:52pm

Not sure what to recommend, maybe your firewall / router / PiHole? I assume you have tried curl successfully?

mhs · November 15, 2023, 5:36pm

I mean, the work VM def isn’t piholed. And both machines start fine but buckle once the request rate stays high.

I’m starting to mull DNS lookup throttling.

Even Cloudflare might consider thousands of lookups bunched up from a single IP as malicious.
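A rough way to gauge how many distinct lookups a batch can trigger is to group the URLs by host before dispatching them. This is only a sketch under the assumption that each new connection performs its own lookup; the `HostGrouping` module is hypothetical and does no resolution itself.

```elixir
defmodule HostGrouping do
  # Groups URLs by their host component, so the number of keys is the
  # number of distinct hostnames in the batch.
  # URLs without a scheme parse to a nil host and end up under the nil key.
  def by_host(urls) do
    Enum.group_by(urls, fn url -> URI.parse(url).host end)
  end
end
```

Comparing `map_size(HostGrouping.by_host(urls))` with `length(urls)` shows whether the batch is mostly unique hostnames (many distinct lookups) or repeats of a few hosts.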

mhs · November 16, 2023, 4:59am

Yeah, this seems to be outside the BEAM.

Thread starvation + DNS maybe, like this thread talked about:

I’m seeing improvement with an inet configuration file tweaked to {lookup, [dns, native]}.
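For anyone wanting to try the same tweak, here is a minimal sketch of creating such a file and checking it from Elixir. The file path is hypothetical (the thread does not say where the file lives). With `{lookup, [dns, native]}` the VM tries Erlang’s built-in DNS client first and falls back to the OS resolver.

```elixir
# Hypothetical location for the inet configuration file.
inetrc_path = Path.join(System.tmp_dir!(), "erl_inetrc")

# The file holds Erlang terms, each terminated by a period.
File.write!(inetrc_path, "{lookup, [dns, native]}.\n")

# Sanity check: the file parses as Erlang terms.
{:ok, terms} = :file.consult(inetrc_path)
IO.inspect(terms, label: "inetrc terms")

# The VM only reads the file at boot, so it has to be started with it, e.g.
#   ERL_INETRC=/path/to/erl_inetrc iex -S mix
# or with this line in vm.args:
#   -kernel inetrc '"/path/to/erl_inetrc"'

# In a VM booted that way, the non-default settings show up here:
IO.inspect(List.keyfind(:inet.get_rc(), :lookup, 0), label: "active lookup order")
```

If the last line prints `nil`, the running VM is still on its default lookup order, which usually means the file was not picked up at boot.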

mhs · November 18, 2023, 8:54pm

Hmm, maybe it’s more on the BEAM’s internal scheduling side rather than outside in OS or DNS processing. Like, the VM doesn’t process the burst quickly enough, so lots of DNS resolutions get mangled.

I can run the same chunk of data through some comparable Rust code on the same host and it resolves everything fine.