Elixir-dbvisor/sql Needs a SOTA pool and I want to hear your ideas

So the day has come: due to unforeseen technical issues, SQL ("Brings an extensible SQL parser and sigil to Elixir, confidently write SQL with automatic parameterized queries") will require its own pool. We want to aim high, so we're talking SOTA. I want to hear your ideas, but let's talk requirements first so we know what we're working with (a rough sketch of one possible shape follows the list):

  • Minimal
  • Ideally no message passing; one or two messages is not a deal breaker
  • Affinity for prepared queries and scheduler id; all queries have a globally unique integer assigned at compile time
  • Linear scaling
  • Dynamic sizing
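To make the requirements concrete, here is a rough sketch of one shape this could take. It is purely illustrative and assumes a design with one connection slot per online scheduler, :persistent_term for lookups and :atomics for lock-free checkout; it glosses over dynamic sizing and connection supervision, and none of the names are final.

defmodule PoolSketch do
  # Illustrative only: one connection slot per online scheduler, stored in
  # :persistent_term and claimed through :atomics, so checkout and checkin
  # involve no message passing at all.

  def start(connect_fun) do
    n = :erlang.system_info(:schedulers_online)
    locks = :atomics.new(n, signed: false)
    conns = for i <- 1..n, into: %{}, do: {i, connect_fun.()}
    :persistent_term.put(__MODULE__, {locks, conns})
    :ok
  end

  # Prefer the slot bound to the caller's scheduler; walk the other slots
  # if it is already claimed.
  def checkout do
    {locks, conns} = :persistent_term.get(__MODULE__)
    n = map_size(conns)
    id = rem(:erlang.system_info(:scheduler_id) - 1, n) + 1
    claim(locks, conns, id, n, 0)
  end

  def checkin(id) do
    {locks, _conns} = :persistent_term.get(__MODULE__)
    :atomics.put(locks, id, 0)
  end

  defp claim(_locks, _conns, _id, n, tried) when tried >= n, do: {:error, :busy}

  defp claim(locks, conns, id, n, tried) do
    case :atomics.compare_exchange(locks, id, 0, 1) do
      :ok -> {:ok, id, Map.fetch!(conns, id)}
      _taken -> claim(locks, conns, rem(id, n) + 1, n, tried + 1)
    end
  end
end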

When you say SOTA, do you mean state of the art? And when you say pool, do you mean a db connection pool?

(genuine questions, not trying to be a dick)

Yes, I've seen "SOTA" a lot in the last few years, along with other acronyms like "AI" (Artificial Intelligence) and "LLMs" (Large Language Models). It means state-of-the-art, which in turn is defined as:

adjective
belonging or relating to the most recent stage of technological development; having or using the latest techniques or equipment.

I don’t know, and would guess it is for something else. From the little I understand, the idea of the SQL library is to convert/compile Elixir code into SQL:

SQL
Brings an extensible SQL parser and sigil to Elixir, confidently write SQL with automatic parameterized queries.

I would think it doesn’t deal with connecting with any particular RDBMS.

All DB-related dependencies are dev/test-only.
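For reference, that means the mix.exs entries look something like this (illustrative versions and names, not the library's exact dependency list):

# Illustrative only: DB-related deps restricted to the :dev and :test envs
defp deps do
  [
    {:ecto_sql, "~> 3.12", only: [:dev, :test]},
    {:postgrex, "~> 0.19", only: [:dev, :test]}
  ]
end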

I'm also curious to learn more about the unforeseen technical issues and how they led to pooling :slight_smile:

Yes, we're building a state-of-the-art database connection pool. An initial implementation in under 100 lines is done and is already outperforming db_connection in every metric. Ecto uses db_connection in a way that made it impossible to introduce features like a 100% cache-hit rate on prepared queries.

It would actually be underselling the library to only call the pool SOTA; we're closing in on being the SOTA SQL library across all languages, as I've never seen anything like this before.
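To give an idea of why the compile-time query id matters here: once every ~SQL query carries a globally unique integer, a prepared statement can be named after that id and cached per connection, so after the first execution on a connection the lookup never misses. A hypothetical illustration, not the actual module (the names are made up):

defmodule PreparedCacheSketch do
  # Hypothetical: cache prepared-statement names keyed by {connection, query_id}.
  # Because the query id is fixed at compile time, the key is known before
  # runtime and the cache hits on every execution after the first one.
  @table __MODULE__

  def init do
    :ets.new(@table, [:named_table, :public, read_concurrency: true])
  end

  def prepared_name(conn_id, query_id, prepare_fun) do
    key = {conn_id, query_id}

    case :ets.lookup(@table, key) do
      [{^key, name}] ->
        name

      [] ->
        # First execution on this connection: PREPARE under a stable name
        # derived from the compile-time id, then remember it.
        name = "q#{query_id}"
        :ok = prepare_fun.(conn_id, name)
        :ets.insert(@table, {key, name})
        name
    end
  end
end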


So we now have a SOTA pool, not only on the BEAM but (AFAIK) across all languages. Some of the things I listed didn't make it, mostly because they ended up being completely unnecessary when you lean into the BEAM.

import Ecto.Query
defmodule SQL.Repo do
  use Ecto.Repo, otp_app: :sql, adapter: Ecto.Adapters.Postgres
  use SQL, adapter: SQL.Adapters.Postgres
  import Ecto.Query
  # Variants exercised through the SQL library's own pool
  def sql(type \\ :transaction)
  def sql(:statement) do
    Enum.to_list(~SQL"SELECT 1")
  end
  def sql(:empty) do
    SQL.transaction do
      :ok
    end
  end
  def sql(:transaction) do
    SQL.transaction do
      Enum.to_list(~SQL"SELECT 1")
    end
  end
  def sql(:savepoint) do
    SQL.transaction do
      SQL.transaction do
        Enum.to_list(~SQL"SELECT 1")
      end
    end
  end
  def sql(:cursor) do
    SQL.transaction do
      Stream.run(~SQL"SELECT g, repeat(md5(g::text), 4) FROM generate_series(1, 5000000) AS g")
    end
  end

  # Equivalent variants exercised through Ecto/db_connection
  def ecto(type \\ :transaction)
  def ecto(:statement) do
    SQL.Repo.all(select(from("users"), [1]))
  end
  def ecto(:empty) do
    SQL.Repo.transaction(fn ->
      :ok
    end)
  end
  def ecto(:transaction) do
    SQL.Repo.transaction(fn ->
      SQL.Repo.all(select(from("users"), [1]))
    end)
  end
  def ecto(:savepoint) do
    SQL.Repo.transaction(fn ->
      SQL.Repo.transaction(fn ->
        SQL.Repo.all(select(from("users"), [1]))
      end)
    end)
  end
  def ecto(:cursor) do
    SQL.Repo.transaction(fn ->
      from(row in fragment("SELECT g, repeat(md5(g::text), 4) FROM generate_series(1, ?) AS g", 5000000),
        select: {fragment("?::int", row.g), fragment("?::text", row.repeat)}
      )
      |> SQL.Repo.stream()
      |> Stream.run()
    end)
  end
end
Application.put_env(:sql, :ecto_repos, [SQL.Repo])
Application.put_env(:sql, SQL.Repo,
  log: false,
  username: "postgres",
  password: "postgres",
  hostname: "localhost",
  database: "sql_test#{System.get_env("MIX_TEST_PARTITION")}",
  pool_size: :erlang.system_info(:schedulers_online),
  ssl: false
)
SQL.Repo.__adapter__().storage_up(SQL.Repo.config())
SQL.Repo.start_link()

Benchee.run(
  %{
  "sql" => fn -> SQL.Repo.sql(:transaction) end,
  "ecto" => fn -> SQL.Repo.ecto(:transaction) end,
  },
  parallel: 500,
  warmup: 10,
  memory_time: 2,
  reduction_time: 2,
  unit_scaling: :smallest,
  measure_function_call_overhead: true
)

➜ sql git:(main) ✗ mix sql.bench
Operating System: macOS
CPU Information: Apple M1 Max
Number of Available Cores: 10
Available memory: 64 GB
Elixir 1.20.0-dev
Erlang 28.1
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 10 s
time: 5 s
memory time: 2 s
reduction time: 2 s
parallel: 500
inputs: none specified
Estimated total run time: 38 s

Measured function call overhead as: 0 ns
Benchmarking ecto …
Benchmarking sql …
Calculating statistics…
Formatting results…

Name           ips        average  deviation         median         99th %
sql        5507.15      181.58 μs  ±1576.97%      177.13 μs      324.92 μs
ecto         56.22    17788.30 μs     ±8.30%    17331.33 μs    22480.54 μs

Comparison:
sql        5507.15
ecto         56.22 - 97.96x slower +17606.72 μs

Memory usage statistics:

Name          average  deviation    median    99th %
sql          984.00 B     ±0.02%     984 B     984 B
ecto       17952.31 B     ±0.03%   17952 B   17952 B

Comparison:
sql          984 B
ecto       17952.31 B - 18.24x memory usage +16968.31 B

Reduction count statistics:

Name          average  deviation    median    99th %
sql          0.0920 K     ±0.03%  0.0920 K  0.0920 K
ecto           1.56 K     ±0.30%    1.56 K    1.56 K

Comparison:
sql          0.0920 K
ecto           1.56 K - 16.91x reduction count +1.46 K


Is this really an apples-to-apples comparison?

Enum.to_list(~SQL"SELECT 1")

SQL.Repo.all(select(from("users"), [1]))

The stream statements are also quite different, such as the variable interpolation of 5_000_000 in the Ecto query but not in the SQL query.

I’m sure SQL is faster, but it’s hard to tell from these benchmarks.

It is; the benchmark is showcasing performance, expressiveness, and ergonomics.

And it's actually only fair to Ecto to have some database work in the benchmark. If you run the empty transaction, it becomes more obvious how big a limitation Ecto's pool and driver design is. Ecto also cannot scale linearly, and absolutely not diagonally; that will be clear to most people who have worked at scale with Ecto and Postgres, where pgbouncer becomes essential. But in reality you will never need it if you lean into the BEAM.
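Concretely, swapping the benchmarked case in the script above from :transaction to :empty isolates the pool and driver overhead, since no query ever reaches the database:

Benchee.run(
  %{
    "sql" => fn -> SQL.Repo.sql(:empty) end,
    "ecto" => fn -> SQL.Repo.ecto(:empty) end
  },
  parallel: 500,
  warmup: 10
)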

People forget what an incredible platform we're working on, one that lets you write Elixir/Erlang at bare-metal (C) performance with diagonal scaling. No other platform can do this, and as a community I do not think we communicate this well.

If the two libraries are being exercised in different ways and then the numbers compared, it isn’t a meaningful benchmark. Pears to pears.

In particular, having Ecto do string interpolation and SQL do none in the cursor benchmark means they are measuring different things that aren’t really related to the performance of these libraries. In fact, SQL is being asked to do less work. How much less? Can’t tell from the benchmarks.
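For example, to make the cursor cases comparable, the Ecto version could hard-code the constant the same way the SQL version does. Something like this (my adjustment, using only standard Ecto, assuming the same imports as the benchmark module):

SQL.Repo.transaction(fn ->
  from(row in fragment("SELECT g, repeat(md5(g::text), 4) FROM generate_series(1, 5000000) AS g"),
    select: {fragment("?::int", row.g), fragment("?::text", row.repeat)}
  )
  |> SQL.Repo.stream()
  |> Stream.run()
end)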

It also doesn't help that the benchmarks are trivial, e.g. "fetch the literal value 1". It's similar to the HTTP benchmarks that measure how long it takes to fetch a static file whose contents fit in a single network packet. :confused:

All I can take away from the benchmarks is that SQL is a lot better at queries that my applications will never produce. It might be faster/lighter at all queries, but these benchmarks don't help one gauge that.

big of a limitation Ecto’s pool and driver design is

I still don't understand how much of a limitation either of these two parts is. Is one more of a culprit than the other? If so, by how much? Is Ecto's front-end query building implicated in this?

In a more perfect world, Ecto's pooling and/or use of drivers would improve, and separately from that, sql (hopefully with a better name) could provide a different access point via its "literal SQL" text→query approach.

And if that isn't a real possibility, if it really is Ecto's front-end, it would be great to understand why. I keep looking at these benchmarks and your posts about sql for this information so I can make an informed decision. I can't be the only one. :slight_smile:


If you want to showcase client performance, it's essential to limit noise from the database, so simple queries that are constant time in the database are ideal: you then have minimal database execution time and latency, which increases the number of queries executed and reveals the limitations of the client's design, regardless of any performance optimizations made by the clients. Yes, the query planner is expensive, but it is less of a problem than the actual pool design and driver, which limit linear and diagonal scaling. A big hint is the extensive use of GenServer.

I plan to talk more about this in talks and eventually a blog post, but there is still work in front of me I want to get done. The code and benchmarks speak clearly; most of the modules are 500 LOC or less.

… and obscure the actual performance characteristics of it in use. Sure, if all one is interested in is how the application-side code behaves, fine … but users of database libraries care about how it performs with the database.

I know database benchmarking is not trivial (I've done it in the past myself), but it's kind of important to test with actual workloads. A fast client library that is a dog when actually retrieving data isn't a great story.

(I haven’t mentioned in my comments how the types returned by SQL from the database are often nonsensical, such as 4-byte Erlang binaries (<<>>) for integers. It’s easy to be damn fast if one doesn’t care about correctness. But currently I assume those kinds of odd things are just because the library is still in development.)
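For reference, a Postgres int4 arrives over the wire in binary format as four big-endian bytes; turning it into an Elixir integer is a one-line bit match, so I assume this is simply not wired up yet:

# The four raw bytes Postgres sends for the int4 value 42, and the decode
raw = <<0, 0, 0, 42>>
<<value::32-signed-big>> = raw
value
#=> 42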

That aside, would you acknowledge that the benchmark should not have Ecto doing string interpolation in queries when the same test written for sql does not? If you are trying to measure client-side code, making Ecto do more things client-side can't be helping.

That’s interesting, and not overly surprising. It does hint that we ought to be able to see significant gains without losing the features of Ecto’s DSL.

Can you share which aspects of the pool design are problematic? Or where the driver issues are, e.g. is it the implementations of the drivers (e.g. postgrex) or the use of them by Ecto?

I assume someone has measured these things?


It sounds like you are one of the lucky few who never had to deal with db_connection errors, whether in test or in production, and then had to battle with pool_size or queue times. I'm glad you're pointing out incomplete code paths, but keep in mind that the benchmarks and tests follow complete code paths.

The client is the single most important piece of code, as it is the deciding factor in how you can scale. The BEAM scales diagonally, so why shouldn't your Phoenix application?