SQL structure for many, different child entities

jdumont · April 24, 2019, 2:35pm

I’m back with another SQL / database structure question. I feel it’s related to polymorphism, STi and all those concepts, but seems inverted in my case (one parent type, many children; rather than many parent types to one child).

A simplified example — not the one I’m actually working with — would be the concept of a Document entity owning multiple different content types. Let’s say, Video, Image, Markdown, etc where each has a different structure and fields, but share an index field, so that all of the different content types for a given Document can be loaded and then displayed in order. Is this polymorphism?

Would something like an intermediate entity (Content for arguments sake) with an exclusive arc for the different content types and foreign key for the document be a suitable solution?

~~Given I’m using PostgreSQL and it has better support for querying embedded fields, would I be better off just dumping a JSON array into a field on the Document?~~

EDITED TO ADD: Just thinking through the problem a little more, simply embedding an array of maps/JSON won’t work, as within those maps I’ll want to reference records in another table.

jdumont · April 30, 2019, 1:22pm

I deleted this topic after deciding that my answers were already scattered around the forum in the various polymorphism and inheritance topics. Here I am a week later, still failing to find what I feel is an adequate solution to a relatively simple problem.

In a relational database (or any other table based one for that matter) how would you model the above — where a document/page/record/whatever is composed of many different types of content?

For example, the record that I’d want to pull from the DB would look something like this after whatever joins, etc were done:

%Document{
  name: "Example",
  id: 1,
  content: [
    %Video{
      index: 0,
      url: "something.com/ajsha",
      description: "all about the video",
      author_id: 3,  (%User{})
    },
   %Title{
     index: 1,
     text: "My title"
   },
   %Image{
     index: 2,
     url: "something.com/akshgdakj.jpg",
     caption: "Blah blah",
     uploader_id: 5, (%User{})
     photographer_id: 28, (%User{})
    },
    [...]
  ]
}

Thinking about it, this would have been equally relevant when I was putting together the schema for my CrossFit app recently. A solution to this could help there to.

Others have referred to this problem as a polymorphic has_many as opposed to the polymorphic belongs_to that is often talked about (and has solutions in the Ecto docs). That topic has been my main point of reference, although I’m not wedded to the idea of an Animal => Cat/Dog/Whatever relation as the original poster.

I’ve scratched my head, got my Google-fu on, read some books and even explored other means of storing the data and still can’t really work it out.

Help me!

dimitarvp · April 30, 2019, 1:37pm

Naively asking before thinking about it deeper – have you considered trying PostgreSQL’s table inheritance?

jdumont · April 30, 2019, 1:53pm

Truthfully I think I checked out Postgres’ table inheritance at some point but skipped over it after finding a reason it wouldn’t work/be ideal. I think it was something to do with Ecto not supporting it, but I’ve got no aversions to escaping into SQL if needs be.

I’ll take a proper look at it again and see whether it fits the bill or not.

A lot of the recent talk around here abut DB-less applications or making the DB an implementation detail has piqued my curiosity though. I’ve realised that when modelled without Ecto - just using Elixir structs - the problem is simpler and a lot of the code I’ve written thats only there to try and contort my data into tables just falls away.

I’m either still not “thinking in tables” or I genuinely have a use case where a different storage solution would be better (same goes for that CrossFit app). It’s kind of hard to know without more experience.

OvermindDL1 · April 30, 2019, 2:57pm

Yeah this seems like a classic use for postgresql inheritence. You can use foreign table links too. And yeah, Ecto could use some help in supporting it but it still ‘mostly’ works. ^.^;

The structs in structs method you mention would be like traditional foreign linked tables though, which is perfectly fine.

jdumont · April 30, 2019, 3:26pm

Of the two (inheritance and foreign table links) which would you opt for, and are there any “gotchas” that I should keep an eye out for?

(Trying to make up for my lack of experience here)

OvermindDL1 · April 30, 2019, 5:12pm

Honestly I’d just use foreign linked tables, easy to expand, easy to conditionally join (left_join’s), etc…

jdumont · May 2, 2019, 8:17am

So you’d have a intermediate table that would hold the document_id, index and then a column for each content type (video_id, image_id, title_id,markdown_id`, etc) where only one can be set at a time - a.k.a. an exclusive arc?

Would having an additional type column on this intermediate help with joining on the correct column, or would there be some SQL’y way to join on the non-null column that I don’t know about?

Sorry for all the questions. I just want to properly understand what I’m doing and why, so that I can apply it to other scenarios.

OvermindDL1 · May 2, 2019, 2:23pm

Well I’d do it the other way, I’d have a document_or_whatever table with rows that have columns that everything shares, then other tables like video, image, etc… of who’s primary keys is not an auto-incrementing integer but is rather a foreign table link back to the document_or_whatever. This will be highly efficient and infinitely expandable.

You can actually create a constraint that enforces that when one is created then no others exist, this constraint can be shared by them all so only one can exist at a time easily as well.

jdumont · May 2, 2019, 4:18pm

So the two rows across the two tables that make up a complete record in effect have the same primary key? In the main document_or_whatever table (probably go with content) I have a column called id that matches up to a column an id column (the primary) in the video, image, etc tables?

So when querying the data, I’d find the right row(s) in the `contents table and then left join on all the possible other tables where the primary keys match?

I think I understand…

OvermindDL1 · May 2, 2019, 4:31pm

Yep yep, the main table defines the ID to use and everything else’s ‘primary key’ just matches it with an enforced foreign link.

Yep yep! You can even extend such joins dynamically if your system becomes pluggable later or so too.

peerreynders · May 2, 2019, 5:15pm

2.3 Map Each Class To Its Own Table

OvermindDL1 · May 2, 2019, 5:28pm

Class? Like a type class or…? o.O

peerreynders · May 2, 2019, 5:42pm

As in OO class:

Media table is representative of an abstract base class and becomes the primary table holding the common fields and dictating the primary key.
Video, Image, Markdown are representative of concrete (derived) classes which map to tables which reference Media. They contain the records holding the class (type) specific fields.

jdumont · May 2, 2019, 6:14pm

I think this is part of where my confusion stems from. Modelling this (for a DB at least) becomes very OO very quickly.

@peerreynders are you suggesting that this pattern is inadvisable because of my “class” being split over two tables?

At what point is a struct or Ecto schema considered a class?

EDIT: Further question, putting aside the DB for a moment, what’s the most idiomatic way of modelling this data in Elixir?

peerreynders · May 2, 2019, 6:27pm

I was simply referencing “prior art”. The DB simply stores the relevant instance properties - so in this context there is no real difference between a class or a structure.

The challenge isn’t really modelling it in the DB - but determining the best way to exploit that physical data model within Ecto if you need to use changesets because ideally you just want a flat Video, Image, Markdown structure (if you are just querying there is no problem) rather than having to deal with a nested Video, Image, or Markdown struct inside a Media struct.

Ultimately 2.1 Map Hierarchy To A Single Table is the easiest to accomodate within Ecto. Right now I’m not familiar enough with all aspects of Ecto to find the best way to make Map Each Class To Its own Table work - if you need to use changesets.

jdumont · May 2, 2019, 7:31pm

I think this is where the problem shows itself. Regardless of how it’s best to model it within the database (when considering query speed, normalisation, etc or just plain “rightness” where it’s logical), modelling it as a single table with lots of different columns, many unused for a given media type is the easiest approach because its what Ecto will most easily allow for.

I think any other approach forces you to reconsider how Ecto fits. If you’re fighting it, or at the very least trying to work around the “one schema = one table” concept, then you are probably better off without it. That’s a big trade off to consider vs some empty columns in a table.

That brings me back to where I ended up with my CrossFit app (recursively nested Ecto structs - although that recursion brought its own complexity). Ho-hum, there’s rarely a single “right” answer is there?

OvermindDL1 · May 2, 2019, 7:33pm

Actually Ecto models it as links very well, that’s what the whole has_one/belongs_to and so forth is for.

jdumont · May 2, 2019, 7:38pm

But when loading all the media records for a document, I’d have a Media struct for each with a nested Image, Video, whatever struct.

Or have I misunderstood?

peerreynders · May 2, 2019, 7:55pm

Focusing on this statement in particular I’m not convinced you are dealing with a polymorphic entity in your domain.

This simply states that you need to show them in the same list - so it may be enough to have a Content table which identifies (document and) type (inferring a distinct table) and a key into that table.
How much information do you need to grab on this query?