How to handle schemas polymorphism in Phoenix?

That’s great, thank you!

I am trying to implement what you describe with Ecto, but I have a hard time transferring the associations to actual code: how would you implement gorilla is an animal?

Also, using an id and a type does not seem to be recommended. From Ecto’s documentation:

The problem with this approach is that it breaks references in the database. You can’t use foreign keys and it is very inefficient, both in terms of query time and storage.

Using JSON would be great, I actually started the project using MongoDB. Thing is, I am going with Elm and GraphQL for my stack and I would loose a lot of benefits by not having typed schemas :slight_smile:

It is not a polymorphic association, it is a sti table. There is no link done like polymorphic Rails association on it. The link is on pet_owner_id, and the other is done with animal_id on each characteristics table.

Uh, does that even make sense in a SQL world? o.O
That sounds very OOP’y. You should instead favor a composition style as that is how SQL works. :slight_smile:

I understand!

I feel like what you’re suggesting is really close to the first example in the initial question:

defmodule Animal do
  use Ecto.Schema

  schema "animals" do
    belongs_to(:dog, Dog)
    belongs_to(:cat, Cat)
    belongs_to(:owner, PetOwner)
  end
end

defmodule Dog do
  use Ecto.Schema

  schema "dogs" do
  end
end

defmodule Cat do
  use Ecto.Schema

  schema "cats" do
  end
end

defmodule PetOwner do
  use Ecto.Schema

  schema "pet_owners" do
    has_one(:pet, Animal)
  end
end

What would be the difference between this and your approach?

In Your example, animals belongs to many things. So it would need to have dog_id, cat_id etc.

But in my example, each concrete characteristics table is holding an animal_id :slight_smile:

Might be why I couldn’t figure out how to implement it haha :joy:

Given the last few posts, I think the recommended way to do it in SQL would be:

create table animal(
  id serial primary key,
  dog_id integer references dog,
  cat_id integer references cat,
  check(
    (
      (dog_id is not null)::integer +
      (cat_id is not null)::integer +
    ) = 1
  )
);

create unique index on animal (dog_id) where dog_id is not null;
create unique index on animal (cat_id) where cat_id is not null;

See this post for reference: https://hashrocket.com/blog/posts/modeling-polymorphic-associations-in-a-relational-database

Does it make sense to you, or would you recommend another approach?

That is composition so that entirely makes sense. :slight_smile:

Just have foreign references on the extra data tables that point back to the main ‘animal’ table and it is easy to query/left_join across them all. I’d personally not put the foreign data tables on the animal table as that is less extensible unless the extra data table has a lot of duplication, in which case then yes it can be worth it (or a middle join table). :slight_smile:

Thank you for your input!

I’ve just read about this approach. Looks like it’s called Exclusive Arc, and it does not seem to be recommended (Database development mistakes made by application developers - Stack Overflow):

An exclusive arc is a common mistake where a table is created with two or more foreign keys where one and only one of them can be non-null. Big mistake. For one thing it becomes that much harder to maintain data integrity. After all, even with referential integrity, nothing is preventing two or more of these foreign keys from being set (complex check constraints notwithstanding).

Any opinion on this? I feel like the “rightness” of this approach depends on implementation detail, and I am not sure the implementation I gave as an example could be considered “right”.

1 Like

Just to reframe the discussion a little bit:

I am not looking for one way to implement inverse polymorphic associations, I already provided some implementations in the initial question. Instead, I’d like to get a sense of the pros and cons of each implementation, and eventually have a proposal for a recommended implementation with a list of reasons why it is recommended.

I feel like it’s a question that comes back frequently in Ecto’s community:

And so far, I haven’t been able to find a good answer to this question. I hope we can bring a definitive answer with this post :slight_smile:

EDIT: @kokolegorille suggestion to use an abstract table seemed like a good one. What would an actual implementation look like and what would be the advantages compared to other approaches?

2 Likes

Abstract tables are nice if you want to use the same schema but have it belong to different parents with an own table for each parent. There’s not really much to it in terms of implementation. Put “abstract table: element” in your schema’s source string and use the has_one :no_longer_abstract_element, {"real_table", Element}, foreign_key: :assoc_id notation to link it to the parent. So Element is the abstract schema, which is saved for this parent in the table real_table. A different parent would have a different table.

I feel like those questions often pop up because people expect ecto to solve the involved complexity for them (happened to me as well when starting with ecto), which it just doesn’t do.

1 Like

Hi @LostKobrakai,

Thank you very much for your explanation!

It’s indeed very nice and simple. The example in the documentation is straight forward.
I’m trying to do the opposite of what is described in the documentation though, that is, have the same “parent” belonging to different polymorphic tables - maybe we can call it has_many polymorphic association, as opposed to “belong_to polymorphic association”?

I am really open to a complex solution! I am just trying to:
1 - Have a description or example of this solution
2 - Know why this is a good solution

Hope it helps to understand my question better :slight_smile:

1 Like

I am going to try a different approach that I hope to be more constructive.

Here is an implementation of a has_many polymorphic association with Ecto:

Is this approach and the underlying database design recommended (there are some indications that it is not)?
And if not or if there is room for improvement, what would you change?

PS: Thanks again to everyone for the suggestion and lines of thought, it’s really appreciated!
I hope I am not to picky with the answers, maybe it’s one of these cases where there is no way better than the other and it depends on the situation. I’d be skeptical, but maybe.

1 Like

I think your new solution violates your previous this:

Stepping back, why would you want an Animal module (or struct type)? Why would you need it as a DB table?
Do you want to list all animals and create a list of all Owners? If so, that makes sense.
If you want to create owned animals like owner |> own_new_animal(attrs) no matter what type of animals, then why would you need concrete tables for each types of animals? Are there unique fields that each animals should have? Then it makes sense.
The problem is, being polymorphic and having unique things at the same time is not a easy thing to dodge. But it can be done by trading-off some aspects.

I’ll try to list pros and cons of each solutions (including some not yet appearing in this thread):

Single Table Inheritance

This is when you have a big table (Animal) and each concrete animal tables are a sub-division of that table. You won’t have multiple tables.

pros
– you will only have one table.
– fetching all animals are very easy
– it can be easily implemented in Ecto, because all concrete animal modules can have schemas different with each other but referring to the same big table.

cons
– if you want unique fields for each types of animals, you get NULL columns everywhere (but you don’t care when using the concrete modules’ schema: it doesn’t appear.)

Class Table Inheritance

This is when you have a base table (= Animal), and each concrete animal tables have a reference to the base table plus their unique fields. (Animal {id: 1, birth: "20180101", vaccinated: true} Animal {id: 2, birth: "20111225", vaccinated: false / Cat {animal_id: 1, colour: "brown"} Snake {animal_id: 2, length: 150})

pros
– listing all animals are quite easy, if you only need the basic informations
– no NULL columns

cons
– while getting the full information (of an animal) from the concrete modules are quite easy (you just JOIN :animal), getting the full info from the base side is not (you cannot know what to JOIN).
– This is unless you write some kind of logic similar to polymorphic tables in Rails (animal_type and animal_id), which are mentioned “not preferable” in the Ecto docs.

Concrete Table Inheritance

No tables for Animals will be created. Each concrete animal tables will have all basic informations + their unique informations. This will be good if you want to assure that all animals are animals, but you don’t work with Animals directly.
(I’d rather create a protocol named Animal rather that doing this)

pros
– no need to JOIN when fetching a concrete animal
– works good with Ecto’s abstract tables (Animal will be an abstract table)

cons
– you need to be careful at migrations not to break the entire inheritance
– unique assertions must be done completely on application logic
– especially ids must be unique across all animal tables, so using :binary_ids is preferred.
– any search using keys in Animal should go through all tables (ugh)

Abstract tables

This is described in the Ecto docs. (Sample repo) Using only this seems not very helpful on your usage, but if you are looking at STI or CTI, it may help the implementation.
When you create a Class Table Inheritance, each table refers to an Animal base table. Using abstract tables, you can split that into multiple base tables for each concrete animal tables, for example a cat table will refer tocat_base and snake to snake_base, where cat_base and snake_base will have same columns. Then we will create an abstract table animal, and when you cat |> Animal.add_base_animal_info() it will create a cat_base.

pros
– you can retrieve concrete animal from Animals

cons
– you need to JOIN every concrete *_base tables to do list_animals

I think this is somewhere between Class Table Inheritance and Concrete Table Inheritance.

Embedding data

Embedding can be done in Postgres and MongoDB etc. You can have a single animal table with a field that accepts a map (field :details, :map). Then you define many concrete animal modules which its schema refers to that animal table, having a embeds_one :details, CatDetails where you define a embed_schema for CatDetails. (This example is STI with Embeds)

pros
– tables will be very clean
– no NULLs
– listing all animals is easy
– storing data using the concrete animals’ schema will validate the shape of maps you pass

cons
– the shape of the maps inside :details cannot be verified in DB level
– searching for fields inside embeds may not be good (depends on what db you use)

Hope it helps;

33 Likes

The reason for this is that there’s no “the best” solution to this problem in SQL in general. There are several options, each with different trade-offs that @ndac_todoroki beautifully outlined. The best solution often depends not only on the data itself but also on the particular usage patterns.

This means there won’t be a single definitive answer to the question of “how to do this with Ecto”, since there’s no single definitive answer to the question of “how to do this with SQL”.

3 Likes

Wow, thank you so much for this detailed answer! It helped me a lot and I am sure it will help people who ask similar questions. Feel free to post this as an answer to my Stack Overflow question linked in the initial question, I’ll mark it as solved.

I am currently implementing showcases for each solution you describe, with your post and the implementations, I will be better equipped to take a decision.

I will post a link to the repository once I publish it :slight_smile:

3 Likes

FYI, I have this PR that I need to get back to and finish that relates to this topic: https://github.com/elixir-ecto/ecto/pull/2293

1 Like

I am a little late in here, but here are the principles I use for this problem set.

  • Single-table solutions are best.For your example, it’s OK to have some columns used for just cats, and others for just dogs.

  • Pattern matching makes coding “inheritance” ridiculously simple. Just match on a “kind” or “type” attribute.

  • Flat trees are your friend.

  • Pattern matching can work across multiple dimensions because you can match on multiple attributes.

When you think about problems in this way, the data structure falls out.

Think of your data in terms of structs, the tables match those exactly. Manage polymorphic behavior with pattern matching across as many dimensions as you need. Keep like things in the same struct wherever possible.

7 Likes

Sorry for bumping this thread, but this is actual to something I am struggling with atm. So your recommendation is to use one big table and manage subtypes on the application side, with different schemas using the same big table, but using different columns as needed? Apart from elegance and having clean tables, are there any drawbacks to this method? Does having many NULL columns have an impact on DB performance?

Nice bump! Three years ago!

I was responding specifically to the idea of modeling inheritance. I was not too clear.

I should have said "Don’t model inheritance. If you must, use single table inheritance. When that’s not enough, go back to rule 1.

If I am Noah and building an arc, I might should try to create a table for each major type of animal. I would be in a world of hurt. Think about weaving together the main entity in Elixir.

Instead, I should think about modeling traits of animals. Size, food, waste, etc around the parts of my system. This would put me on more familiar ground, and will split the table across more logical boundaries.

Hope this is more clear.

5 Likes

Another solution is to use embeds, if the polymorphic data may be stored in json. The PolymorphicEmbed library allows for dynamic embeds through a type field: