In the context guide on Phoenix the following migration file is created for PostgresSQL:
defmodule Hello.Repo.Migrations.CreateProductCategories do
use Ecto.Migration
def change do
create table(:product_categories, primary_key: false) do
add :product_id, references(:products, on_delete: :delete_all)
add :category_id, references(:categories, on_delete: :delete_all)
end
create index(:product_categories, [:product_id])
create index(:product_categories, [:category_id])
create unique_index(:product_categories, [:product_id, :category_id])
end
end
Multicolumn indexes can also perform queries that involve only some of the columns, and thus the create index(...)
for :product_id and :category_id isn’t needed since it’s included in the create unique_index(:product_categories, [:product_id, :category_id])
.
What do you think?
This does not seem to be the case for MySQL, see the notes towards the end of these docs: https://dev.mysql.com/doc/refman/8.0/en/multiple-column-indexes.html
Are there any instances in the generators where they generate different code depending on what database driver you’re using? If not, this doesn’t seem like a big enough win to add it, IMO.
1 Like
Yep, I was considering the generator only for PostgresSQL. Thx!
I agree that the overhead of the two individuals create index(...)
is pre-optimization. I would remove it from the guide since it adds complexity and can confuse beginners like me.
The index for category_id
is still needed. Using [:product_id, :category_id]
to search for category_id
when product_id
is unrestricted is not efficient.
edit: to clarify, I’m talking about regular btree indices. the traversal is not efficient in that case.
4 Likes
What does unrestricted mean?
The indexes on the individual columns are not redundant. Here’s an excerpt from the page you linked that explains why they are not equivalent:
A multicolumn B-tree index can be used with query conditions that involve any subset of the index’s columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c)
and a query condition WHERE a = 5 AND b >= 42 AND c < 77
, the index would have to be scanned from the first entry with a
= 5 and b
= 42 up through the last entry with a
= 5. Index entries with c
>= 77 would be skipped, but they’d still have to be scanned through. This index could in principle be used for queries that have constraints on b
and/or c
with no constraint on a
— but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.
Creating the individual column indexes is the right thing to do so that the query planner behaves in the “least surprising” way
2 Likes
If you want to find specific category ids without limiting the values of product id.