TechEmpower benchmarks

I actually looked at this when we fixed up things for the other benchmarks, but gave up on it. Honestly, the DB config is insane.
In general the Phoenix code is not optimized, and in my eyes it doesn't look very functional or Elixir-ish.

Three significant things slow Phoenix down:

  1. JSON encoding - obviously slower in Elixir than in C.
  2. DB test - pool size 20, and a benchmark setup that is unrealistic (PG has max_connections 2000 and the test is done at low (keepalive) concurrency of 256 - thus not requiring a DB pool at all - and indeed some of the frameworks are not using one). The pool size is set in the Repo config, as sketched below.
  3. Minor optimizations, making it faster and more Elixir-ish (pattern match params, batch update etc.).
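
For reference, the pool size lives in the Repo config; a minimal sketch (app/repo names assumed here, not the benchmark's actual config):

# config/prod.exs
config :hello_phoenix, HelloPhoenix.Repo,
  adapter: Ecto.Adapters.Postgres,
  pool_size: 20  # what the benchmark currently runs with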

notes on the different functions from back then:

1. json
In def _json (the json bench) and def db (the db+json bench),
use jiffy to encode.

Benchee.run(
  %{
    "poison"   => fn -> Poison.encode!(%{message: "Hello, world!"}) end,
    "jiffy"    => fn -> :jiffy.encode(%{message: "Hello, world!"}) end,
    "jiffyerl" => fn -> :jiffy.encode({[{<<"message">>, <<"Hello, world!">>}]}) end
  },
  time: 50,
  parallel: 8
)

Comparison: 
jiffyerl      343.87 K
jiffy         303.72 K - 1.13x slower
poison        188.26 K - 1.83x slower
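
For illustration, a minimal sketch of what the controller action could look like with jiffy (names and plumbing assumed, not the actual benchmark code):

def _json(conn, _params) do
  conn
  |> put_resp_content_type("application/json")
  |> send_resp(200, :jiffy.encode(%{message: "Hello, world!"}))
end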

2. def queries and def updates params to int
Both def queries and def updates currently do less-than-optimal param parsing. By doing the classic (conn, %{"queries" => queries_param}) head, the matching all the way to integer is ~50% faster (but it's really fast anyhow).

benchee:
    new        4.09 M
    old        2.74 M - 1.49x slower

This does require some rework to handle the missing-params case. I propose this, which is hopefully also much more idiomatic:

  # pattern match the "queries" key and bind its value to queries_param
  def queries(conn, %{"queries" => queries_param}) do
    q = try do
      String.to_integer(queries_param)
    rescue
      ArgumentError -> :not_integer
    end
    queries_rules(conn, q)
  end

  # "queries" didn't pattern match above, i.e. the param is missing
  def queries(conn, _unused_params), do: queries_rules(conn, :missing)

  defp queries_rules(conn, queries_param) do
    case queries_param do
      :missing       -> queries_response(conn, 1,   :missing)       # if the parameter is missing,
      :not_integer   -> queries_response(conn, 1,   :not_integer)   # is not an integer,
      x when x < 1   -> queries_response(conn, 1,   :less_than_one) # or is an integer less than 1, the value should be interpreted as 1;
      x when x > 500 -> queries_response(conn, 500, :more_than_500) # if greater than 500, the value should be interpreted as 500.
      x              -> queries_response(conn, x,   :ok)            # the queries parameter must be bounded to between 1 and 500.
    end
  end

  defp queries_response(conn, parsed_param, _status) do
    conn
    |> put_resp_content_type("application/json")
    |> send_resp(200, Poison.encode!(Enum.map(1..parsed_param, fn _ -> Repo.get(World, :rand.uniform(10_000)) end)))
  end

I would have liked to use Integer.parse/1, but unfortunately it is much slower than the try/rescue. (Elixir might be in need of a String.to_integer/1 equivalent that returns an integer or :error rather than raising ArgumentError - or an :ok/:error tuple.)

# slower than try/rescue :/
case Integer.parse(queries_param) do
  # {int, remainder} - int is only good if remainder is the empty string ""
  {queries_int, ""} -> queries_rules(conn, queries_int)
  _                 -> queries_rules(conn, :not_integer)
end
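
For completeness, a minimal Benchee sketch of that comparison (the input "5" is assumed here as a representative param value):

Benchee.run(%{
  "try/rescue" => fn ->
    try do
      String.to_integer("5")
    rescue
      ArgumentError -> :not_integer
    end
  end,
  "Integer.parse" => fn ->
    case Integer.parse("5") do
      {queries_int, ""} -> queries_int
      _ -> :not_integer
    end
  end
})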

The same params-to-int pattern-matching refactor applies to def updates, as sketched below.
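
A minimal sketch of what that could look like (updates_rules is a hypothetical analogue of queries_rules above):

def updates(conn, %{"queries" => queries_param}) do
  q = try do
    String.to_integer(queries_param)
  rescue
    ArgumentError -> :not_integer
  end
  updates_rules(conn, q)
end

def updates(conn, _unused_params), do: updates_rules(conn, :missing)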

3. def updates
This is around where I gave up, as I realized the realities (or lack thereof!) of the DB benchmarks.
Use the same param matching as above.
The rules do NOT allow batch querying the records (sic!). In my limited test, asyncing the queries was not fruitful. (Yes, I did test different DB pool sizes and asyncing levels; YMMV.)
The rules DO allow batch updating.
This is what I ended up with, which I make no claims about being pretty nor fully optimized:

ids = 1..q
|> Stream.map(fn _ -> :rand.uniform(10_000) end)

# fetch each row individually (batch querying is not allowed by the rules)
# and give each a new random number - requires import Ecto.Query
ws = ids
|> Enum.map(fn id ->
  Repo.one(
    from p in HelloPhoenix.Post,
    where: p.id == ^id,
    select: map(p, [:id, :randomnumber])
  )
end)
|> Enum.map(&Map.put(&1, :randomnumber, to_string(:rand.uniform(10_000))))

# batch update via upsert; dedupe by id first (see below)
Repo.insert_all(
  HelloPhoenix.Post,
  Enum.uniq_by(ws, fn x -> x.id end),
  on_conflict: :replace_all,
  conflict_target: :id
)

ws is then returned JSON-encoded. This utilizes upsert and does the update in one batch. Enum.uniq_by(ws, fn x -> x.id end) is there to handle the edge case where :rand.uniform(10_000) has returned the same number twice and there are duplicates in the ids; obviously Postgres barks at a batch update holding opposing truths, and I hope other DBs do the same. I'm at a loss how this is the spec for the benchmark, and it was the tipping point for me.
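
A minimal sketch of that last step, reusing Poison as in queries_response above (jiffy from point 1 would be the faster option):

conn
|> put_resp_content_type("application/json")
|> send_resp(200, Poison.encode!(ws))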

In general I would say we should go for pretty, nice code that showcases the readability/productivity of Elixir/Phoenix; I'm sure Phoenix will perform fine (as it already does).

Changing the DB pool size to 2000 is just too much, and I doubt it’s even faster - especially in the real world.

I too would like to see multi-hour benchmarks, and to get away from keepalive, no-GC, do-nothing-really-fast-for-15-seconds tests.
I would also add (hot/rollover) code deploys, various peak times, slow clients, endpoints with errors etc. to the multi-hour tests.
