A Schrödinger’s database adapter: Ecto & PostgreSQL perform faster when observed with Observer

Objectives

I am trying to parallelize a bit of database workload with Elixir and Ecto. No data inconsistency is expected, as each SQL query reads the database and then writes its own unique result. In other words, the queries do not depend on each other.

It looks like running multiple independent connections through a properly configured connection pool might help; in fact, it might be more efficient than using PostgreSQL’s parallel query. Therefore, I tried to implement the following approach: “it works better if you queue work”.

My objective is to use multiple connections to PostgreSQL so that it can run several queries simultaneously.

Background

I have implemented a very simple script that, for N independent queries, runs an Enum.each loop that makes a query against the database on each iteration.

Then I upgraded it so that, for N independent queries, it splits them into batches of size M (@split below) and spawns a Task.async per query within each batch:

# @split is the batch size (M); run_postgresql_query/0 issues a single query.
def parallel_split(times) do
  if times > @split do
    Logger.info("Splitting")

    # Spawn one task per query in this batch and wait for all of them,
    # then recurse on the remaining queries.
    1..@split
    |> Enum.map(fn _x -> Task.async(fn -> run_postgresql_query() end) end)
    |> Enum.map(&Task.await/1)

    parallel_split(times - @split)
  else
    # Final (possibly smaller) batch.
    1..times
    |> Enum.map(fn _x -> Task.async(fn -> run_postgresql_query() end) end)
    |> Enum.map(&Task.await/1)
  end
end

The Task.async version seemed to run faster, apparently because many threads are used to issue and queue queries (Elixir/Ecto) while PostgreSQL uses multiple backend processes to run them. Here is how it looks when running:

htop screenshot

Let’s say it takes around 84 seconds to process 50_000 queries.

Problem

However, when running my script with mix run --no-halt, I noticed that htop showed 8 busy threads for some time, but then only 1 thread stayed busy (around 90% load on one CPU core) for some time.

htop screenshot

In fact, while running, the script uses all 8 cores for about 25 seconds, and then only 1 core for the rest of the time.

Let us call this “Issue #02”.

I was curious to investigate, because it appeared that, after correctly firing async tasks into Ecto’s queueing mechanism and getting the results back, my script got stuck, apparently waiting for some response from the database.

I enabled telemetry for Ecto, but looking at the query/queue times of individual queries did not help much.
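For reference, a minimal handler of this kind looks like the following (a sketch: it assumes the app is :my_app with a MyApp.Repo, so Ecto emits [:my_app, :repo, :query] telemetry events):

:telemetry.attach(
  "log-query-timings",
  [:my_app, :repo, :query],
  fn _event, measurements, metadata, _config ->
    # queue_time / query_time arrive in native units; convert to ms
    to_ms = fn t -> System.convert_time_unit(t || 0, :native, :millisecond) end

    IO.puts(
      "#{inspect(metadata[:source])}: " <>
        "queue=#{to_ms.(measurements[:queue_time])}ms " <>
        "query=#{to_ms.(measurements[:query_time])}ms"
    )
  end,
  nil
)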

Observer

Then I figured that I could watch the connection pool’s queue table with Observer.

I run :observer.start() in the same iex session before starting my main script.

Before the script starts, the Elixir.DBConnection.ConnectionPool.Queue table has 16 records, which matches the pool_size in my app’s Repo config.
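(That is where the number comes from; a minimal sketch of the relevant config, assuming the app is called :my_app:)

# config/config.exs
config :my_app, MyApp.Repo,
  pool_size: 16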

After the script starts, the Elixir.DBConnection.ConnectionPool.Queue table grows to around M records (the batch size), which eventually drain away (I guess this is the actual queueing process in action).

htop screenshot

All is as expected, but the running time for the same 50_000 queries drops to just 24 seconds!

I double-checked and ran the script again with mix and without Observer - same slow 84 seconds again!

And I double-checked by running it with iex and Observer: that “only one CPU thread working” issue (Issue #02) does not happen when using iex and Observer.

Question

Why does a function that runs many database queries, with a separate Task.async call for each query, run faster when launched from iex and observed with Observer?

4 Likes

Can you tell us:

  1. The CPU architecture
  2. Erlang/OTP version

There is another thread where another performance issue was hunted down, and those parameters mattered. From what I recall, it had to do with an Apple M2 putting processes on slower cores. Having the GUI monitor open could prevent such a thing from happening (???)

Observing this topic will resolve the issue faster.

2 Likes

BartOtten, thanks for your response.

CPU: 11th Gen Intel Core i7-1165G7 2.80 GHz, 4 Cores, 8 Logical Processors
Erlang: Erlang/OTP 25 [erts-13.0.4] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit:ns]
Elixir: IEx 1.13.4 (compiled with Erlang/OTP 25)
OS: Windows 11 Pro 22H2 OS Build 22621.1702, WSL2 Ubuntu 20.04

Having the GUI monitor open could prevent such thing to happen

Which GUI monitor do you mean?

Observing this topic will resolve the issue faster.

I am not sure I understand this, could you please elaborate?

1 Like

The GUI monitor = Observer. As Observer is always busy observing, it might keep a process in “performance mode”.

The other was a joke about your database being faster when observed (Schrödinger). I applied this effect to this topic :slight_smile: Rest assured: observing this topic 24/7 won’t make any difference in solving speed. Feel free to take a nap.

3 Likes

This might be related to

  1. PostgreSQL planner
  2. Virtual machine (WSL)
  3. Kernel scheduler
  4. Kernel network stack dispatch
  5. General memory layout

Anyway, I’d suggest using Task.await_many and publishing your code, since the code will speak more precisely than your description.
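For example (a sketch reusing @split and run_postgresql_query/0 from your original code), you can await a whole batch at once instead of one task at a time:

1..@split
|> Enum.map(fn _ -> Task.async(&run_postgresql_query/0) end)
|> Task.await_many(:timer.minutes(1))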

2 Likes

I will look into Task.await_many; meanwhile, I am simplifying the question:

Why does a function that runs many database queries, with a separate Task.async call for each query, run faster when launched from iex?

I just ran it from iex without Observer, it still runs fast (faster than mix run --no-halt).

You can share the code and I will test it in my local setup; otherwise, this can be related to anything.

1 Like

That only depends on how many tasks you run vs. the size of your DB pool.

Example: if your DB pool size is 10 but you spawn 30 tasks, then 20 tasks will wait until the first 10 finish. That can give you the impression that it’s “slower” when in fact it isn’t.
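For example (a sketch reusing run_postgresql_query/0 from your first post), capping in-flight tasks at the pool size avoids that waiting entirely:

1..50_000
|> Task.async_stream(fn _ -> run_postgresql_query() end,
  max_concurrency: 10, # 👈 match your pool_size
  timeout: :infinity
)
|> Stream.run()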

@dimitarvp thank you for your response.

I think I understand how a DB pool works, but I guess this issue is not about the pool.

I run exactly the same code with exactly the same pool configuration, and I get the following results:

Run with mix run --no-halt: 84 seconds.
Run from iex -S mix: 25 seconds.

In both cases exactly the same pool is used.

I see. Well, as a start, using Enum.map + Task.async + Task.await is a tad inefficient – though I wouldn’t think that amounts to more than a rounding error for the performance difference between executing the code in mix run --no-halt and iex -S mix, but… let’s address one problem at a time.

I also don’t see a reason to use recursion in your scenario, by the way. That’s probably the main culprit, because you end up processing batches serially: each batch must completely finish before the next one starts.

Still, try this:

@maximum_parallel_db_connections 15

@doc "Process one batch serially within a single DB transaction"
def process_batch(items) do
  Repo.transaction(fn ->
    Enum.each(items, fn item ->
      do_stuff_with(item) # 👈 your processing code here
    end)
  end)
end

@doc "Process all items in parallel"
def parallel_process_items(items) do
  items
  |> Stream.chunk_every(100) # 👈 your desired batch size here
  |> Task.async_stream(&process_batch/1,
    max_concurrency: @maximum_parallel_db_connections,
    timeout: :infinity # 👈 a batch can take longer than the 5s default
  )
  |> Stream.run()
end
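(Task.async_stream is lazy, so the final Stream.run() is what actually forces the work; and unlike the recursive version, max_concurrency starts a new batch as soon as any worker frees up, instead of waiting for a whole round to finish.)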

So, IMO, first change your code to something close to the above, post it, and also post whether there are still differences between the two ways you are executing your code. That will put everyone in a better position to try and help you further.

1 Like

How do you measure time?
What’s in your code?
How do you call these functions exactly in both variants?
How can I test the same?

Without answers to these questions, this discussion won’t produce an answer; it will just be dice rolling, with an extremely small chance of landing on something useful.

1 Like

Sorry for a delayed response, I have been busy with some other tasks.

@hst337 is absolutely right: measuring time correctly, and scrutinizing the exact same code, is the way to test this. I hope to format my code and post it here soon.
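For timing, a minimal sketch of what I plan to use (:timer.tc around the top-level call, here parallel_split/1 from my first post):

{micros, _result} = :timer.tc(fn -> parallel_split(50_000) end)
IO.puts("Took #{micros / 1_000_000}s")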

Meanwhile, I implemented the code suggested by @dimitarvp; it is about 25% faster.

Also, I have discovered where Issue #02 happens: after Enum.map(&Task.await/1) finishes, there is a noticeable delay (about 20 seconds). It only happens when the code is called from within a barebones GenServer.

@BartOtten @hst337 @dimitarvp thank you for your feedback.

It seems that, after all, the main issue was probably that I was not saturating the database correctly.

In the initial version of the code, I tried to launch N “parallel” queries against the database, where N could be as large as 50_000.

In the version suggested by @dimitarvp, and slightly modified by me, I split the whole array of tasks into chunks, but the splitting factor is very low: 8 or 16.

That way I get roughly 8–16 “workers” that run thousands of queries each. The performance does not seem to degrade, probably because each “worker” manages to reuse the same database connection from the connection pool.
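The shape I ended up with looks roughly like this (a sketch, reusing run_postgresql_query/0 from my first post; 16 matches my pool_size):

workers = 16

1..50_000
|> Enum.chunk_every(div(50_000, workers))
|> Task.async_stream(
  fn chunk -> Enum.each(chunk, fn _ -> run_postgresql_query() end) end,
  max_concurrency: workers,
  timeout: :infinity
)
|> Stream.run()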

Thx!

1 Like