I have been debugging an Ecto performance issue with a slow Ecto Query against a PostgreSQL database, and I managed to narrow it down to an unnecessary pin (^) next to a constant value (module attribute). When the pin is present, the Ecto Query can often hang for many seconds or even time out. Without the pin, the query usually takes just a few milliseconds.
The query in question looks like this:
from(t in Table,
where:
t.tenant_id == ^user_tenant_id and
t.context_id == ^context_id and
t.status in ^@active_statuses,
order_by: [desc: t.id],
limit: 400,
select: t.id
)
where id
s are bigint IDs, status
is varchar(255)
, @active_statuses
is a list of ["open", "pending"]
, and there is an index on [:tenant_id, :context_id, :status]
The table is pretty big (4_000_000
+ rows). A single tenant may have hundreds of thousands of rows but the vast majority of rows (probably ~95%) have status that is not one of the active
ones.
To debug I ran explain
on the query in production with various parameters interchangeably (but keeping all three pins, including the statuses
one):
Ecto.Adapters.SQL.explain(DB.Repo, :all, query, analyze: true, verbose: true, buffers: true) |> IO.puts()
But I couldn’t reproduce the slowdowns this way, queries were using the correct index and had always the same plan. Executing the very same query immediately after with Repo.all(query)
would hang for seconds.
I rewrote the Ecto query to make it a raw SQL query, copying what Ecto was generating for it, deployed to production, and it made a huge difference - all the timeouts are just gone!
query =
"SELECT t.id FROM table as t WHERE t.tenant_id = $1 AND t.context_id = $2 AND (t.status)::text = ANY ('{open,pending}'::text[]) ORDER BY t.id DESC LIMIT 400;"
ids_result = Ecto.Adapters.SQL.query!(Repo, query, [user_tenant_id, context_id])
List.flatten(ids_result.rows)
I further investigated and managed to go back to an Ecto Query - the fix to make it fast turned out to be just changing ^@active_statuses
to @active_statuses
! But explain
still shows the same execution plan, no matter if we use ^
or not.
It seemed natural for us that queries against that table could be slow, but having them so slow that they were often timing out was problematic. We are aware of other optimization options like reorganizing indexes, partial indexes and partitioning, but we’d like to understand why the pin makes such a huge difference?
From Ecto documentation:
The pin operator instructs the query builder to use parameterized SQL queries protecting against SQL injection.
So the Ecto Query interface is flexible, and allows to mark constants as parameters as well - not just variables.
Another angle we looked at were prepared statements. Above raw SQL query will not use a prepared statement, so PostgreSQL will plan each execution separately. But the performance issues are gone after going back to an Ecto Query as well, just after removing the pin.
Does anyone know what might be happening here? I would appreciate any hints to further narrow down and understand this behavior.