Best practice for keys in Ecto.Migration when using many to many polymorphic

blackham · May 5, 2021, 7:10pm

What is the most efficient way to create a many to many (N:M) relation? In my example I am fighting a polymorphic issue too. But I think my question works for any N:M relation.

Should I create a unique_index()?
Should I keep the timestamps()?
Is it a bad idea to add on_delete:?

I’m using MySQL/MariaDB. I don’t fully understand exactly how references() works in memory. (I don’t know if the DB will use the b-tree from the key with both items, or the reference b-tree, or what)

This is modified from the example found in: https://hexdocs.pm/ecto/polymorphic-associations-with-many-to-many.html

Original:

create table(:todo_list_items) do
  add :todo_item_id, references(:todo_items)
  add :todo_list_id, references(:todo_lists)
  timestamps()
end

create table(:project_items) do
  add :todo_item_id, references(:todo_items)
  add :project_id, references(:projects)
  timestamps()
end

What’s wrong with:

create table(:todo_list_items) do
  add :todo_item_id, references(:todo_items, on_delete: :delete_all)
  add :todo_list_id, references(:todo_lists, on_delete: :delete_all)
  create unique_index(:todo_list_items, [:todo_item_id, :todo_list_id])
end

create table(:project_items) do
  add :todo_item_id, references(:todo_items, on_delete: :delete_all)
  add :project_id, references(:projects, on_delete: :delete_all)
  create unique_index(:todo_list_items, [:todo_item_id, :project_id])
end

1. Should I create a unique_index()?
Back in the good 'ol days I would create a unique index with both items in the join table. A) So both the items were in memory meaning the DB could find the associated object IDs without having to touch disk. B) The unique part made sure my DB enforced the sanity despite my bad programming.

If I use references, do I lose the advantage of having the key IDs next to each other in memory? Should I keep it just to prevent duplicate pairs of IDs?

2. Should I keep the timestamps()?
If I don’t care when the relation was created, can I ditch this and save on space, and increase DB speed?

3. Is it a bad idea to add on_delete:?
If the item on the left (todo_list) or the item on the right (item) get deleted, I want to ensure the n:m table cleans it self up. So wouldn’t it be good to include this?

BONUS Question: My n:m join table is to handle the nasty polymorphic situation I’m stuck in. In truth this table is just a 1:1 map between tables. If the left side is deleted, or the right side is deleted, I don’t just want to delete the join table item. I want to delete everything on the left and right. (If the user is deleted, I want everything to go. I want projects to delete all the associated items. I want todo to then delete all the associated items via todos. ) I can hack it in the code, but it would be wicked awesome if the DB would would enforce this like it does with references().

paulstatezny · May 5, 2021, 8:30pm

I don’t mean to give a “non-answer”, but I think the answer to those questions is mostly “it depends”. It depends on the nature of what your app does and the data affected.

If you hold critical financial data, for example, it’s not sensible to cascade deletes.

If you have a reason (now or in the future) to want to know whenever a record in the join table was made, having inserted_at makes sense.

There are a lot of nuances that affect which decision to make. All of that said, ignoring the context I personally don’t see anytime technically wrong with the unique index, omitting the timestamps fields, or having the cascading deletes. I’ve done each of those myself before.

al2o3cr · May 5, 2021, 9:01pm

Consider swapping the order of these two terms, since multicolumn indexes can only be used for partial matches (for instance, with only todo_list_id fixed) for columns starting from the left end of the list.

Are you sure there’s a need for the many-to-many? It sounds like TodoList should has_many :projects and has_many :todo_items…

blackham · May 5, 2021, 10:52pm

I was just using the example tables used in the existing many to many polymorphic hexdocs. My real problem deals with loading various data sources, parsing them and saving them in the DB. All of the data sources have some location information in common. That is being stored in another table. I need to link that common location information with the various data sources (tables) which it is associated. The data sources are totally different and I wouldn’t qualify for 2NF if I tried to cram them into a single table.

So, I thought I’d give this polymorphic many to many thing a try. Vs, storing the table_name and table_id in the location table.

blackham · May 5, 2021, 10:52pm

Thanks for your comment. It’s good to know what I’m doing wont blow up on me.

f0rest8 · May 6, 2021, 12:41am

If you want to see another example of when you might use a many_to_many, then you can check out this guide in the Elixir Ecto docs: self-referencing many to many.

As @al2o3cr said, it looks like the has_many approach is often enough. Another post, I’m on mobile sorry, but another post has another commenter, @benwilson512, where he explains how he rarely if ever uses a many_to_many because you often don’t need to.

I’m really appreciating, as I build out Metamorphic, how Ecto is incredibly composable, by design. So you’ll often discover that it can do exactly what you need in any situation and it just becomes a matter of what you prefer, like @paulstatezny mentions.

Anyway, I know the self-referencing guide isn’t yet in the Hex docs, so I hope it helps spark some ideas for you.