Struggling to make this implementation faster

Edy · February 17, 2024, 5:41pm

I am writing a simple Elixir application for a hackathon. The idea is to have two instances of an API behind a load balancer.

This API has two endpoints: one for adding transactions (credit and debit), and another for getting a summary of transactions.

A debit transaction cannot be accepted if it would take the customer past their limit. Each customer has a maximum amount they are allowed to owe.

I made a simple implementation, without even a database, but I can’t beat some other implementations I’ve seen. I didn’t expect to beat the performance of Java or Rust, but I did expect to beat others like PHP and NodeJS.

My p99 is around 90ms, but I have seen some NodeJS implementations reaching 40ms, and even less in some cases. I have also found faster Elixir implementations that use far more libraries, which tells me I am the problem.

So far I have an Erlang cluster of two nodes. Once they are up, five global GenServers are started. The stress test simulates five customers. In my application, each customer is a GenServer whose state holds the limit, the balance and the latest events.

  def start_link({client_id, limit}) do
    case GenServer.start_link(__MODULE__, {client_id, limit}, name: {:global, process_name(client_id)}) do
      {:ok, pid} -> {:ok, pid}
      {:error, {:already_started, pid}} -> {:ok, pid}
      error -> error
    end
  end

  @spec init({client_id :: integer(), limit :: integer()}) :: {:ok, {balance :: integer(), limit :: integer(), latest_txns :: list()}}
  def init({client_id, limit}) do
    Logger.info("start client #{inspect(process_identifier(client_id))} | #{inspect(node())} - #{inspect(Node.list())}")
    {:ok, {0, limit, []}}
  end

Besides some validations, whenever a request comes in I just perform the GenServer.call for that specific customer, and therefore for that customer's process.

  def handle_transaction("c", payload, req) do
    # The credit is handled by the customer's process, which returns balance and limit.
    {:ok, balance, limit} = Rinha2.Client.credit(payload["client_id"], payload)

    :cowboy_req.reply(
      200,
      %{"content-type" => "application/json"},
      ~s({"limite":#{-limit},"saldo":#{balance}}),
      req
    )
  end
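
For context, the process side of such a call could look roughly like the sketch below. It is not the code from the repository: the module name `ClientSketch`, the message shapes, the cap of 10 stored transactions, and the choice to store the limit as a positive integer are all assumptions made for illustration.

```elixir
defmodule ClientSketch do
  # Hypothetical per-customer process; state is {balance, limit, latest_txns}.
  use GenServer

  def start_link({client_id, limit}) do
    GenServer.start_link(__MODULE__, limit, name: via(client_id))
  end

  def credit(client_id, amount), do: GenServer.call(via(client_id), {:credit, amount})
  def debit(client_id, amount), do: GenServer.call(via(client_id), {:debit, amount})

  @impl true
  def init(limit), do: {:ok, {0, limit, []}}

  @impl true
  def handle_call({:credit, amount}, _from, {balance, limit, txns}) do
    new_balance = balance + amount
    new_state = {new_balance, limit, latest([{:c, amount} | txns])}
    {:reply, {:ok, new_balance, limit}, new_state}
  end

  def handle_call({:debit, amount}, _from, {balance, limit, txns} = state) do
    new_balance = balance - amount

    if new_balance < -limit do
      # Rejected: the customer would owe more than the limit. State is unchanged.
      {:reply, {:error, :limit_exceeded}, state}
    else
      new_state = {new_balance, limit, latest([{:d, amount} | txns])}
      {:reply, {:ok, new_balance, limit}, new_state}
    end
  end

  # Globally registered name, so either node in the cluster can reach the process.
  defp via(client_id), do: {:global, {:client_sketch, client_id}}

  # Assumption: only the 10 most recent transactions are kept.
  defp latest(txns), do: Enum.take(txns, 10)
end
```

Because the name is registered with `:global`, the process lives on exactly one node, so any request that arrives at the other node pays for a round trip over Erlang distribution on every call.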

I am pretty sure it can perform better, but I can’t see where the bottleneck is. Does anybody have a clue? Ideas on how to profile this would be useful as well.
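
One possible starting point, as a minimal sketch: time a single operation many times from inside the VM and look at the percentiles, so the result can be compared with what Gatling reports from outside. The module name `LatencySketch` and its function are hypothetical helpers, not part of the project; only `:timer.tc/1` and standard `Enum` functions are used.

```elixir
defmodule LatencySketch do
  # Runs `fun` `n` times and returns the p50 and p99 durations in microseconds.
  def percentiles(fun, n \\ 10_000) when is_function(fun, 0) and n > 0 do
    sorted =
      1..n
      |> Enum.map(fn _ ->
        # :timer.tc/1 returns {elapsed_microseconds, result}.
        {micros, _result} = :timer.tc(fun)
        micros
      end)
      |> Enum.sort()

    %{p50: at(sorted, n, 0.50), p99: at(sorted, n, 0.99)}
  end

  # Nearest-rank percentile on an already sorted list.
  defp at(sorted, n, q) do
    index = max(ceil(q * n) - 1, 0)
    Enum.at(sorted, index)
  end
end
```

From an IEx session attached to a running node, something like `LatencySketch.percentiles(fn -> Rinha2.Client.credit(client_id, payload) end)` would time just the GenServer call, with `client_id` and `payload` bound to whatever the real endpoint passes. If that p99 is far below the one measured over HTTP, the time is probably being spent outside the process call, for example in the HTTP layer, the load balancer or the container runtime.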

Here is the source code: GitHub - geeksilva97/rinha-de-novo: tentando mais um pouco... com cowboy puro com limao
Stress test using Gatling: rinha-de-novo/load-test/user-files/simulations/rinhabackend/RinhaBackendCrebitosSimulation.scala and rinha-de-novo/executar-teste-local.sh (both on master in geeksilva97/rinha-de-novo on GitHub)

Edy · February 17, 2024, 6:14pm

Quick update with more information.

I was running this in Docker on an Apple M3 (12 cores, 18 GB RAM), where I got a p99 of around 90ms.

I then ran it on Manjaro, also in Docker, on an Intel i5 (8 cores, 16 GB RAM), and got a p99 of 4ms.

Is it all Docker overhead?

tj0 · February 17, 2024, 6:54pm

Someone else also ran into this. Likely a Mac issue.

dimitarvp · February 17, 2024, 8:28pm

TBF everything runs faster in Manjaro.

But I’ve heard about spiking latencies in Docker on ARM Macs as well.

Edy · February 18, 2024, 1:28am

Got it, guys. Thanks a lot for your help!!

lubien · February 18, 2024, 6:35pm

Yeah, my Mac was slower overall too (I’m the guy from the thread on the other forum). I’ve been running on my Ubuntu machine since.