Different ways to create indexes in Ecto

Hi, please. I have a question regarding migrations in Ecto. But maybe this more of a PostgreSQL question rather than an Elixir one.

I have a many-to-many relation between students and teachers. One student can have many teachers and the same teacher can have many students.

So, I’m basically defining my migration like this:

 def change do
    create table(:students_teachers, primary_key: false) do
      add :id, :binary_id, primary_key: true
      add :criteria, :integer, null: false

      add(:student_id, references(:students, on_delete: :delete_all, type: :binary_id),
        null: false
      )

      add(:teacher_id, references(:teachers, on_delete: :delete_all, type: :binary_id),
        null: false
      )
    end

    create index(:students_teachers, [:student_id])
    create index(:students_teachers, [:teacher_id])
  end

Ok, my doubt is about the last line of the code, the part where I define the indexes:

create index(:students_teachers, [:student_id])
create index(:students_teachers, [:teacher_id])

But if I do it this in a second way create index(:students_teachers, [:student_id, :teacher_id]) something is gonna change.

If I do a diff to the two \d students_teachers generated tables in PostgreSQL with the different types of migrations I will see that there are some changes.

This the output of my diff (which BTW only sees the difference in the index definition):

>     "students_teachers_student_id_index" btree (student_id)
>     "students_teachers_teacher_id_index" btree (teacher_id)
---
<     "students_teachers_student_id_teacher_id" btree (student_id, teacher_id)

So, my question is:
How do this two approaches change the way my data is related? What is the difference here? How do the teacher and the student are related according to the different ways I migrated the database?

Individual indexes on student_id and teacher_id fields

SELECT ..... WHERE student_id=1 => FAST SELECT
SELECT ..... WHERE teacher_id=1 => FAST SELECT

You would be telling Ecto to create a composite index

SELECT ..... WHERE student_id=1 => SLOW SELECT
SELECT ..... WHERE teacher_id=1 => SLOW SELECT
SELECT ..... WHERE student_id= AND teacher_id=1 => FAST SELECT

3 Likes

The first two queries are not that bad either, they’d still use indexes, especially the first one. Here’s some info about that:

1 Like

Yes, but than can also be solved if I pass the :id of the table (students_teachers_id). Then, by using Repo.preload I can have the student’s data and the teacher’s data, right?. With that said, I still have doubts on what on what is the best thing to do regarding on what type of index to use (:

So what you are saying is that if I use this create index(:students_teachers, [:student_id, :teacher_id]) I can have this:

SELECT ..... WHERE student_id=1 => FAST SELECT
SELECT ..... WHERE teacher_id=1 => FAST SELECT
SELECT ..... WHERE student_id= AND teacher_id=1 => FAST SELECT

?

Let’s focus on your SQL question first

Rule of thumb:

  • If you need to search and access the records individually by teacher_id OR student_id you should create two indexes individually. I assume this is 99% the use case for most tables.
  • If you ALWAYS fetch the records from that table by teacher_id AND student_id, you should create a composite index.

Nope, that was not my statement. My statement was SLOW SELECT on top 2.

For clarification, when I say SLOW it is relative to correct index scan. SQL storages are fast enough to make up for the missing time, it is not easy for us humans to grasp the gap of milliseconds, microseconds, nanoseconds.

But that nano, milliseconds add up to minutes, hours, days over millions, billions of queries, transactions.

Repo.preload is all related to your definition of relationships. As long as there is a foreign_key (relation) path for Ecto to figure out the relationship, it will load them for you, regardless of how slow or inefficient the relationship is.

1 Like

Ok, I get it. thanks

If you want to simplify the answer and make it slow / fast, then with your index (student_id, teacher_id) it would be

SELECT ..... WHERE student_id=1 => FAST SELECT
SELECT ..... WHERE teacher_id=1 => SLOW SELECT
SELECT ..... WHERE student_id= AND teacher_id=1 => FAST SELECT

See the link in my above comment for more on the topic of how slow the second query would be (not that slow).

From the Postgres docs (PostgreSQL: Documentation: 9.6: Multicolumn Indexes):

A multicolumn B-tree index can be used with query conditions that involve any subset of the index’s columns, but the index is most efficient when there are constraints on the leading (leftmost) columns.