Alternative to raw has_one/has_many relationships

LostKobrakai · November 20, 2016, 8:53am

I do a lot of php development in a cms/cmf called processwire for a living, while curiously looking into elixir/phoenix. ProcessWire’s entity modeling is based on a modeling strategy, which I found to be more flexible, than only the plain has_one/has_many/… relationships most frameworks seem to use/suggest.

Before explaining it in more details: My question is, if such a structure would be possible to use with ecto/phoenix and what you think might be the downsides/dealbreakers for it.

So modeling in processwire is not based on a single table per model, but models are rather a core set of infos about the model entity (id, name, template, created, modified) and a set of custom fields associated to the model (template). Each of the custom fields does have it’s own table, which in simple cases consists only of entity_id and data columns, but it certainly can hold more columns if needed by the type of field like sort for multi-value fields.

So a template for a blogpost, which does hold the custom fields of “title”, “tags” and “body”, would result in the following tables:

templates (has the info about which fields the blog post is associated with)
entities (base info like above)
field_title (one-to-one mapping of title to post)
field_tags (one-to-many mapping of tags to post)
field_body (one-to-one mapping of body to post)

Via the template information on each entity the data of the custom fields can be joined into the base information or being lazy loaded after the retrieval of just the core information about the entity.

The advantages of such a mapping to me are the following:

All entities ids are stored in a single table, which makes linking of entities of any kind super simple.
All entities can have has_one or has_many relationships to their fields, so multi-value fields are simple to do as well

Both together allow for structures, where an entity can link to a sorted list of differently typed entities without additional pivot tables or such things. For example in a portfolio website a project entity’s field aside_info could return a list like this [%Image{}, %VimeoEmbed{}, %Image{}, %Image{}].

A second feature of this modeling strategy is that data can easily be stored in a tree structure. By adding a parent_id column to the entities table everything can live in a global tree of all the entities in the system.

With every model entity being of the same base, one also does rather care about how to store data relevant to a specific type of field instead of creating a title (varchar(255)) on each model again. One does rather compose exisiting fields with fields unique to a specific model.

This might not be a perfect layout of what happing in processwire, but I tried to keep the explanation reasonably short. I curious to hear your thoughs about it.

sztosz · November 20, 2016, 10:14am

Doesn’t it make inefficient?

Can you show example database schema (dump) and models?

yurko · November 20, 2016, 10:36am

I think its doable as a generator, similar to phoenix.gen.model https://hexdocs.pm/phoenix/Mix.Tasks.Phoenix.Gen.Model.html

you could “steal” its logic and extend it to represent this structure

LostKobrakai · November 20, 2016, 10:43am

@sztosz
It surely is less efficient, but that’s the price for every abstraction one does use. On the other hand it’s a super flexible way of modeling data. But really in my line of work (low to medium traffic websites) the query speed was never an issue. On the other hand developer speed/efficiency is. I especially found, that being able to link any arbitrary data in a system without much fuss is super useful.

For a raw sql dump you can take a look at this one: https://raw.githubusercontent.com/ryancramerdesign/SkyscrapersProfile/master/site/install/install.sql
It’s a subset of the information available here (take a look in the backend, too): http://demo.processwire.com/

sztosz · November 20, 2016, 10:57am

I don’t like it. There’s too much generalization for me. But I don’t like software that let you build tailored to your needs solution by clicking things in “wizards”/“creators”, sooner or later you hit the wall and have to write custom code anyway.

But I also think it’s dooable.

Downsides? Too much abstraction and generalization. Complicated queries across many tables. Complicated mapping of business models to database entities. .

LostKobrakai · November 20, 2016, 6:11pm

Not sure where you see “much generalization” in this approach, when you can still have exactly the fields you need on a model. If you really have some special kind of data you cannot store in an existing type of those fields you can always add a custom one. For me it’s also less about the “clicking things in wizards” but rather a case of not having to do the hard stuff for the 95% that don’t need special attention, as long as I can still implement custom solutions for the 5% which really are custom.

Your point of the mapping of business model to database model on the other hand is totally valid. It does certainly require an abstraction over ecto/sql to query this kind of data structure to be useful.

sztosz · November 20, 2016, 7:04pm

I think I misunderstood whole concept at first. But now I don’t really see advantage of another abstraction layer with this

over just having simple relationships like this:
Project has_one aside_info (a foreign key in DB)
aside_info has_one or has_many as a foreign key to each of the tables images, vimeo_embeds.

And calling Project.aside_info_by_project_id(project_id) could simply return same list [%Image{}, %VimeoEmbed{}, %Image{}, %Image{}] where aside_info_by_project_id is really a SQL Query with simple joins and mapping to proper structs.

Anyway I really don’t understand why keep all entities basic info in entity table (and custom fields each in other table).

I don’t understand why would you wan’t to link random entities of totally not related types. Or I simply don’t understand whole sentence

In traditional 3NF having multi-value fields is super easy too. But we don’t have to sacrifice efficiency for single-value fields.

All in all I either still don’t understand the advantages of that alternative way or simply gives a layer on abstraction that obscures things instead of simplifying it. And I’m not shy to admit that i fear it’s the first one, lack of understanding