How to handle schemas polymorphism in Phoenix?

tmattio · March 23, 2018, 7:22pm

I am going to try a different approach that I hope to be more constructive.

Here is an implementation of a has_many polymorphic association with Ecto:

Is this approach and the underlying database design recommended (there are some indications that it is not)?
And if not or if there is room for improvement, what would you change?

PS: Thanks again to everyone for the suggestion and lines of thought, it’s really appreciated!
I hope I am not to picky with the answers, maybe it’s one of these cases where there is no way better than the other and it depends on the situation. I’d be skeptical, but maybe.

ndac_todoroki · March 24, 2018, 6:00am

I think your new solution violates your previous this:

Stepping back, why would you want an Animal module (or struct type)? Why would you need it as a DB table?
Do you want to list all animals and create a list of all Owners? If so, that makes sense.
If you want to create owned animals like owner |> own_new_animal(attrs) no matter what type of animals, then why would you need concrete tables for each types of animals? Are there unique fields that each animals should have? Then it makes sense.
The problem is, being polymorphic and having unique things at the same time is not a easy thing to dodge. But it can be done by trading-off some aspects.

I’ll try to list pros and cons of each solutions (including some not yet appearing in this thread):

Single Table Inheritance

This is when you have a big table (Animal) and each concrete animal tables are a sub-division of that table. You won’t have multiple tables.

pros
– you will only have one table.
– fetching all animals are very easy
– it can be easily implemented in Ecto, because all concrete animal modules can have schemas different with each other but referring to the same big table.

cons
– if you want unique fields for each types of animals, you get NULL columns everywhere (but you don’t care when using the concrete modules’ schema: it doesn’t appear.)

Class Table Inheritance

This is when you have a base table (= Animal), and each concrete animal tables have a reference to the base table plus their unique fields. (Animal {id: 1, birth: "20180101", vaccinated: true} Animal {id: 2, birth: "20111225", vaccinated: false / Cat {animal_id: 1, colour: "brown"} Snake {animal_id: 2, length: 150})

pros
– listing all animals are quite easy, if you only need the basic informations
– no NULL columns

cons
– while getting the full information (of an animal) from the concrete modules are quite easy (you just JOIN :animal), getting the full info from the base side is not (you cannot know what to JOIN).
– This is unless you write some kind of logic similar to polymorphic tables in Rails (animal_type and animal_id), which are mentioned “not preferable” in the Ecto docs.

Concrete Table Inheritance

No tables for Animals will be created. Each concrete animal tables will have all basic informations + their unique informations. This will be good if you want to assure that all animals are animals, but you don’t work with Animals directly.
(I’d rather create a protocol named Animal rather that doing this)

pros
– no need to JOIN when fetching a concrete animal
– works good with Ecto’s abstract tables (Animal will be an abstract table)

cons
– you need to be careful at migrations not to break the entire inheritance
– unique assertions must be done completely on application logic
– especially ids must be unique across all animal tables, so using :binary_ids is preferred.
– any search using keys in Animal should go through all tables (ugh)

Abstract tables

This is described in the Ecto docs. (Sample repo) Using only this seems not very helpful on your usage, but if you are looking at STI or CTI, it may help the implementation.
When you create a Class Table Inheritance, each table refers to an Animal base table. Using abstract tables, you can split that into multiple base tables for each concrete animal tables, for example a cat table will refer tocat_base and snake to snake_base, where cat_base and snake_base will have same columns. Then we will create an abstract table animal, and when you cat |> Animal.add_base_animal_info() it will create a cat_base.

pros
– you can retrieve concrete animal from Animals

cons
– you need to JOIN every concrete *_base tables to do list_animals

I think this is somewhere between Class Table Inheritance and Concrete Table Inheritance.

Embedding data

Embedding can be done in Postgres and MongoDB etc. You can have a single animal table with a field that accepts a map (field :details, :map). Then you define many concrete animal modules which its schema refers to that animal table, having a embeds_one :details, CatDetails where you define a embed_schema for CatDetails. (This example is STI with Embeds)

pros
– tables will be very clean
– no NULLs
– listing all animals is easy
– storing data using the concrete animals’ schema will validate the shape of maps you pass

cons
– the shape of the maps inside :details cannot be verified in DB level
– searching for fields inside embeds may not be good (depends on what db you use)

Hope it helps;

michalmuskala · March 25, 2018, 5:19am

The reason for this is that there’s no “the best” solution to this problem in SQL in general. There are several options, each with different trade-offs that @ndac_todoroki beautifully outlined. The best solution often depends not only on the data itself but also on the particular usage patterns.

This means there won’t be a single definitive answer to the question of “how to do this with Ecto”, since there’s no single definitive answer to the question of “how to do this with SQL”.

tmattio · March 25, 2018, 11:22pm

Wow, thank you so much for this detailed answer! It helped me a lot and I am sure it will help people who ask similar questions. Feel free to post this as an answer to my Stack Overflow question linked in the initial question, I’ll mark it as solved.

I am currently implementing showcases for each solution you describe, with your post and the implementations, I will be better equipped to take a decision.

I will post a link to the repository once I publish it

mgwidmann · March 26, 2018, 2:17am

FYI, I have this PR that I need to get back to and finish that relates to this topic: https://github.com/elixir-ecto/ecto/pull/2293

redrapids · March 30, 2018, 1:11am

I am a little late in here, but here are the principles I use for this problem set.

Single-table solutions are best.For your example, it’s OK to have some columns used for just cats, and others for just dogs.
Pattern matching makes coding “inheritance” ridiculously simple. Just match on a “kind” or “type” attribute.
Flat trees are your friend.
Pattern matching can work across multiple dimensions because you can match on multiple attributes.

When you think about problems in this way, the data structure falls out.

Think of your data in terms of structs, the tables match those exactly. Manage polymorphic behavior with pattern matching across as many dimensions as you need. Keep like things in the same struct wherever possible.

DragosMocrii · February 4, 2021, 3:55am

Sorry for bumping this thread, but this is actual to something I am struggling with atm. So your recommendation is to use one big table and manage subtypes on the application side, with different schemas using the same big table, but using different columns as needed? Apart from elegance and having clean tables, are there any drawbacks to this method? Does having many NULL columns have an impact on DB performance?

redrapids · February 4, 2021, 1:56pm

Nice bump! Three years ago!

I was responding specifically to the idea of modeling inheritance. I was not too clear.

I should have said "Don’t model inheritance. If you must, use single table inheritance. When that’s not enough, go back to rule 1.

If I am Noah and building an arc, I might should try to create a table for each major type of animal. I would be in a world of hurt. Think about weaving together the main entity in Elixir.

Instead, I should think about modeling traits of animals. Size, food, waste, etc around the parts of my system. This would put me on more familiar ground, and will split the table across more logical boundaries.

Hope this is more clear.

mathieuprog · January 7, 2022, 1:35am

Another solution is to use embeds, if the polymorphic data may be stored in json. The PolymorphicEmbed library allows for dynamic embeds through a type field: