Phoenix slow performance and high error rate with API using MongoDB

I’m using Phoenix to build an API that takes an HTTP request (JSON), modifies some fields in it (after connecting to MongoDB), sends it to another server, receives the response, modifies some fields in the response, and sends it back to the client. I implemented the same application with NodeJS as well.
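Roughly, the flow looks like this sketch (the module, collection, field names, and upstream URL are placeholders for illustration, not my actual code; Mongo access via the mongodb driver's Mongo module, HTTPoison for the outbound call):

```elixir
# Illustrative sketch of the proxy flow; names and URL are placeholders.
defmodule MyAppWeb.ProxyController do
  use MyAppWeb, :controller

  @upstream "http://other-server.example/api"  # placeholder URL

  def handle(conn, params) do
    # 1. enrich the incoming JSON with data from MongoDB
    doc = Mongo.find_one(:mongo, "settings", %{"key" => params["key"]})
    enriched = Map.put(params, "extra", doc["value"])

    # 2. forward the request to the other server and wait for its response
    {:ok, resp} =
      HTTPoison.post(@upstream, Jason.encode!(enriched), [
        {"content-type", "application/json"}
      ])

    # 3. modify the response and send it back to the client
    body = resp.body |> Jason.decode!() |> Map.put("proxied", true)
    json(conn, body)
  end
end
```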

I run both on an Ubuntu server (and both in production mode, using PORT=4000 MIX_ENV=prod mix phx.server for Phoenix).

The internet connection is slow: 200 Kbps (but the same for both implementations).

I ran tests using Apache JMeter for both of them:

  • Threads: 500
  • Ramp-up: 1
  • Loops: 10

and got the following results.

NodeJS:
[screenshot: 4 - Node 500_10]
Phoenix:
[screenshot: 4 - Phoenix 500_10]

What really surprised me is that Phoenix is no faster than NodeJS, and Phoenix’s error rate is higher.

I tried many of the solutions suggested online, like:

  • setting the Mongo pool size to 100
  • config :logger, level: :warn
  • config :bench_phoenix, BenchPhoenix.Endpoint, http: [port: 4000, protocol_options: [max_keepalive: 5_000_000]]

using this guide.

what should I do?


Hi, first of all, a 39.4% vs 44.6% error rate isn’t much of a difference…

  1. Figure out what you are actually testing. You are obviously saturating the servers, so find the bottleneck: try lowering the test load and see what is going on. You mention limited bandwidth; is that the bottleneck? And what is causing the errors: external API timeouts, DB timeouts, request timeouts?

  2. Timeouts. This is obviously about timeouts. What do the logs say? Try increasing the various timeouts (in both Node and Elixir) so that you can run the load test with no errors.

  3. Code? What does the code do, and how have you programmed it? Is it a simple controller? Do you use GenServers, etc.? Which HTTP client is used for calling the external API, and is that connection kept alive?

Phoenix ships with a 60-second idle_timeout; start by increasing it:

config :hello, HelloWeb.Endpoint,
  http: [
    port: 4000,
    protocol_options: [idle_timeout: 5_000_000]
  ],
  ...

While debugging, I found that HTTPoison returns the following errors:

** (CaseClauseError) no case clause matching: {:error, %HTTPoison.Error{id: nil, reason: :timeout}}

** (CaseClauseError) no case clause matching: {:error, %HTTPoison.Error{id: nil, reason: :checkout_timeout}}

Does that mean the cause is the other server?
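(The CaseClauseError itself just means my calling code only matches the success tuple, so the timeout comes back as {:error, %HTTPoison.Error{}} and crashes the request. A sketch of matching both shapes, with illustrative names:)

```elixir
# Illustrative: handle both success and error tuples from HTTPoison
# instead of letting {:error, ...} fall through and raise CaseClauseError.
case HTTPoison.post(url, body, headers) do
  {:ok, %HTTPoison.Response{status_code: 200, body: resp_body}} ->
    {:ok, resp_body}

  {:ok, %HTTPoison.Response{status_code: status}} ->
    {:error, {:unexpected_status, status}}

  {:error, %HTTPoison.Error{reason: reason}} ->
    # :timeout, :checkout_timeout and friends end up here
    {:error, reason}
end
```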


Most likely. It could also be why you see a similar/same error rate on Node…

So what you are testing isn’t really the Node/Phoenix servers, but the external API servers ;-) ?

But there are various HTTP clients for Elixir, some better than others for particular scenarios. How does the external API behave? Is it rate-limiting you? Maybe you need to rate-limit the requests you send to it. Can you keep the connection alive and reuse it? Etc.

maybe try something like https://github.com/appcues/mojito for making the http request…
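A minimal Mojito call looks roughly like this (the URL and timeout value here are illustrative; Mojito keeps a connection pool per destination, so repeated calls reuse connections):

```elixir
# Illustrative Mojito request; Mojito pools and reuses connections
# to the same host, unlike one-off hackney requests.
case Mojito.request(
       method: :post,
       url: "http://other-server.example/api",
       headers: [{"content-type", "application/json"}],
       body: ~s({"hello":"world"}),
       opts: [timeout: 30_000]
     ) do
  {:ok, %Mojito.Response{status_code: status, body: body}} -> {status, body}
  {:error, error} -> {:error, error}
end
```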


But start by increasing the HTTPoison timeout (and the Phoenix idle_timeout, as above), plus the HTTP client timeout on Node (and any idle timeout). Try to get to zero errors first, then optimize…

I did the following:

  • changed the HTTPoison timeout to 50_000_000
  • changed the Phoenix timeout:
config :hello_phoenix, HelloPhoenixWeb.Endpoint,
  http: [
    port: 4000,
    protocol_options: [idle_timeout: 5_000_000, max_keepalive: 5_000_000]
  ],
  url: [host: "example.com", port: 80]
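For the HTTPoison part, I mean the per-request options, roughly like this (url, body, and headers are placeholders):

```elixir
# HTTPoison timeouts are per-request options (milliseconds):
# :timeout is the connect timeout, :recv_timeout covers receiving the reply.
HTTPoison.post(url, body, headers,
  timeout: 50_000_000,
  recv_timeout: 50_000_000
)
```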

and the error rate got worse: 50.69%.

I tried to test a simple function like this:

def index(conn, _params) do
  json(conn, %{result: "hello"})
end

and got the following result:
[screenshot: Capture]

Assuming you have idle_timeout configured, and all you are doing is JSON encoding, the errors are at the testing/OS level…

I know macOS out of the box would most likely NOT be happy opening 5000 connections in 53 seconds. What OS, what ulimits, etc.?
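On Linux you can check the file-descriptor limits like this (each open TCP connection consumes one descriptor, so a low soft limit will cause connection errors under load):

```shell
# Soft limit on open file descriptors for the current shell
ulimit -n
# Hard limit (the ceiling the soft limit can be raised to)
ulimit -Hn
```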

Also, I have no experience with JMeter; does it have timeouts configured? I would probably use https://gatling.io to do any serious load testing…
