I am trying to parallelize a little bit of database workload with Elixir and Ecto. There is no data inconsistency expected, as each SQL query is expected to read the database and then write its own unique result. In other words, the queries do not depend on each other.
It looks like trying to run multiple independent connections with a properly configured connection pool might help. In fact, it might be more efficient than using
parallel query. Therefore, I tried to implement the following approach: it works better if you queue work
My objective is to use multiple connections to PostgreSQL so that it can run several queries simultaneously.
I have implemented a very simple script that for N amount of independent queries runs an
Enum.each that makes a query against the database.
Then I upgraded it so that for N amount of independent queries it splits them in batches of size M and runs Task.async for each batch.
def parallel_split(times) do if times > @split do Logger.info "Splitting" 1..@split |> Enum.map(fn _x -> Task.async(fn -> run_postgresql_query() end) end) |> Enum.map(&Task.await/1) parallel_split(times - @split) else 1..times |> Enum.map(fn _x -> Task.async(fn -> run_postgresql_query() end) end) |> Enum.map(&Task.await/1) end end
Task.async version seemed to run faster, because apparently many threads are used to issue and queue queries (Elixir/Ecto) and many threads are used by
PostgreSQL to run the queries. Here is how it looks when running:
Let’s say it takes around 84 seconds to process 50_000 queries.
However, when I running my script with
mix run --no-halt I noticed that
htop showed 8 busy threads for some time, but then only 1 thread stayed busy (around 90% of one CPU core load) for some time.
In fact, while running, the script uses all 8 cores for about 25 seconds and then the rest of the time it uses only 1.
Let us call this “Issue #02”.
I was curious to investigate, because it appeared to me that for some reason after correctly firing
async tasks against the Ecto’s queueing mechanism and getting the results back my script got stuck apparently waiting for some response from the database.
I enabled telemetry for Ecto, but looking at query/queue time for each individual query did not help much.
Then I figured that I could see the queue table with
:observer.start in the same
iex session before starting my main script.
Before the script starts, the
Elixir.DBConnection.ConnectionPool.Queue table has 16 records, which matches the
pool_size in the
Repo config of my app.
After the script starts, the
Elixir.DBConnection.ConnectionPool.Queue gets around M records (the batch size) which eventually go away (I guess this is the actual queuing process in action).
All is as expected, but the running time for the same 50_000 queries gets down to just 24 seconds!
I double-checked and ran the script again with
mix and without
Observer - same slow 84 seconds again!
And, I double-checked running it with
Observer - that “only one CPU thread working” issue (Issue #02) does not happen when using
Why a function that runs many database queries with a separate
Task.async call for each query, runs faster when launched from
iex and observed with