fastindian84
Ecto doesn't use index when performing a query but should
Hello Community, I have a question that I can’t figure out why Ecto query work this way.
So I have a next table:
Table "public.task_instances"
Column | Type | Collation | Nullable | Default
------------------+--------------------------------+-----------+----------+-----------------------------------
id | uuid | | not null |
owner_id | uuid | | not null |
Indexes:
"task_instances_pkey" PRIMARY KEY, btree (id)
"task_instances_owner_id_index" btree (owner_id)
And this is what shows postresql explain:
panda_dev=# panda_dev=# explain (COSTS TRUE, FORMAT YAML, ANALYZE TRUE) select * from task_instances where owner_id = '4ae562de-f353-43ba-a090-31d4787f1fbe'::uuid;
QUERY PLAN
---------------------------------------------------------------------------------
- Plan: +
Node Type: "Bitmap Heap Scan" +
Parallel Aware: false +
Async Capable: false +
Relation Name: "task_instances" +
Alias: "task_instances" +
Startup Cost: 18.67 +
Total Cost: 69.98 +
Plan Rows: 825 +
Plan Width: 225 +
Actual Startup Time: 0.246 +
Actual Total Time: 1.175 +
Actual Rows: 825 +
Actual Loops: 1 +
Recheck Cond: "(owner_id = '4ae562de-f353-43ba-a090-31d4787f1fbe'::uuid)" +
Rows Removed by Index Recheck: 0 +
Exact Heap Blocks: 41 +
Lossy Heap Blocks: 0 +
Plans: +
- Node Type: "Bitmap Index Scan" +
Parent Relationship: "Outer" +
Parallel Aware: false +
Async Capable: false +
Index Name: "task_instances_owner_id_index" +
Startup Cost: 0.00 +
Total Cost: 18.46 +
Plan Rows: 825 +
Plan Width: 0 +
Actual Startup Time: 0.210 +
Actual Total Time: 0.210 +
Actual Rows: 825 +
Actual Loops: 1 +
Index Cond: "(owner_id = '4ae562de-f353-43ba-a090-31d4787f1fbe'::uuid)"+
Planning Time: 0.988 +
Triggers: +
Execution Time: 1.266
(1 row)
As you may see it plans to use task_instances_owner_id_index .
But when I do the same with Ecto:
Repo.explain(:all, Queries.Instances.by_owner(i), format: :yaml, analyze: true, summary: true, costs: true)
EXPLAIN ( ANALYZE TRUE, FORMAT YAML, SUMMARY TRUE, COSTS TRUE ) SELECT t0."id", t0."owner_id" FROM "task_instances" AS t0 WHERE (t0."owner_id" = $1) [<<74, 229, 98, 222, 243, 83, 67, 186, 160, 144, 49, 212, 120, 127, 31, 190>>]
- Plan:
Node Type: "Seq Scan"
Parallel Aware: false
Async Capable: false
Relation Name: "task_instances"
Alias: "t0"
Startup Cost: 0.00
Total Cost: 58.88
Plan Rows: 825
Plan Width: 225
Actual Startup Time: 0.011
Actual Total Time: 0.446
Actual Rows: 825
Actual Loops: 1
Filter: "(owner_id = '4ae562de-f353-43ba-a090-31d4787f1fbe'::uuid)"
Rows Removed by Filter: 605
Planning Time: 1.093
Triggers:
Execution Time: 0.554
It doesn’t plan to use index at all. I believe it is becauase of different format, you may notice that Ecto uses binary ID when query though I am not sure.
So my question if anybody encounter such issue and if yes how did you fix it?
Thx
Marked As Solved
kip
This isn’t a consequence of Ecto, generally. And I hope you noticed that the performance (planning+execution) of the Ecto version is actually faster than the psql version.
The differences are that by default, Postgrex (the Postgres driver) uses prepared queries. psql does not. I recommend reading the Postgres docs that explain the difference between generic plans and custom plans for prepared queries which may explain why you see a different plan to the psql version.
Since the sequential scan plan was in fact faster than the indexed scan, this suggests the amount of data in the table is quite small and may well fit into a single database page. Therefore no need to read both index and table pages and hence largely similar performance. I would retest again on a much larger data set and only worry about the plan if you see anomalous performance.
Also Liked
kip
You can execute:
set enable_seqscan = off;
To discourage the planner from doing sequential scans. But I think in the case of the original post this is very premature optimisation.








