Redundant code in Ecto.Migration file - Phoenix Context Guide

miguelszerman · December 31, 2022, 4:13pm

In the context guide on Phoenix the following migration file is created for PostgresSQL:

defmodule Hello.Repo.Migrations.CreateProductCategories do
  use Ecto.Migration

  def change do
    create table(:product_categories, primary_key: false) do
      add :product_id, references(:products, on_delete: :delete_all)
      add :category_id, references(:categories, on_delete: :delete_all)
    end

    create index(:product_categories, [:product_id])
    create index(:product_categories, [:category_id])
    create unique_index(:product_categories, [:product_id, :category_id])
  end
end

Multicolumn indexes can also perform queries that involve only some of the columns, and thus the create index(...) for :product_id and :category_id isn’t needed since it’s included in the create unique_index(:product_categories, [:product_id, :category_id]).

What do you think?

zachallaun · December 31, 2022, 4:46pm

This does not seem to be the case for MySQL, see the notes towards the end of these docs: https://dev.mysql.com/doc/refman/8.0/en/multiple-column-indexes.html

Are there any instances in the generators where they generate different code depending on what database driver you’re using? If not, this doesn’t seem like a big enough win to add it, IMO.

miguelszerman · December 31, 2022, 4:49pm

Yep, I was considering the generator only for PostgresSQL. Thx!

I agree that the overhead of the two individuals create index(...) is pre-optimization. I would remove it from the guide since it adds complexity and can confuse beginners like me.

joey_the_snake · December 31, 2022, 6:47pm

The index for category_id is still needed. Using [:product_id, :category_id] to search for category_id when product_id is unrestricted is not efficient.

edit: to clarify, I’m talking about regular btree indices. the traversal is not efficient in that case.

miguelszerman · December 31, 2022, 7:13pm

What does unrestricted mean?

ChrisYammine · December 31, 2022, 8:05pm

The indexes on the individual columns are not redundant. Here’s an excerpt from the page you linked that explains why they are not equivalent:

A multicolumn B-tree index can be used with query conditions that involve any subset of the index’s columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned. For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77 , the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5. Index entries with c >= 77 would be skipped, but they’d still have to be scanned through. This index could in principle be used for queries that have constraints on b and/or c with no constraint on a — but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.

Creating the individual column indexes is the right thing to do so that the query planner behaves in the “least surprising” way

joey_the_snake · December 31, 2022, 8:11pm

If you want to find specific category ids without limiting the values of product id.