Ecto vs polymorphism: to embed or not to embed

i-n-g-m-a-r · July 29, 2018, 6:56pm

Hi everyone,

I am wondering if I’m on the right track here.

My Phoenix project doesn’t serve any HTML, so I only use pipe_through :api and channels.
There’s a context module (Accounts) in between the database layer and calling code (mostly channels).
Accounts consist of users (email, password), roles (name) and profiles (user, role, metadata).
Here’s the thing: I’d like to use a jsonb field to store profile metadata, like “first_name” for persons.
Also I’d like profiles to reference users as well as roles, so that a profile is a user/role combination with metadata.

I ended up implementing the following (abbreviated).

defmodule MyApp.Repo.Migrations.CreateProfiles do
  use Ecto.Migration
  def change do
    create table(:profiles) do
      add :user_id, references(:users, on_delete: :nothing), null: false
      add :role_id, references(:roles, on_delete: :nothing), null: false
      add :metadata, :jsonb, null: false, default: "{}"
      timestamps()
    end
    create unique_index(:profiles, [:user_id, :role_id], name: :user_profile)
  end
end

defmodule MyApp.Accounts.Profile do
  # ...
  schema "profiles" do
    field :user_id, :id
    field :role_id, :id
    field :metadata, :map
    timestamps()
  end
  # ...
end

defmodule MyApp.Accounts.PersonProfile do
  # ...
  @primary_key false
  embedded_schema do
    field :first_name, :string
    field :last_name, :string
  end
  # ...
end

defmodule MyApp.Accounts.PaymentProfile do
  # ...
  @primary_key false
  embedded_schema do
    field :payment_method, :string
  end
  # ...
end

defmodule MyApp.Accounts do
  # ...
  def create_person_profile(%User{id: user_id}, %Role{id: role_id, name: "person"}, %{} = params) do
    %PersonProfile{}
    |> PersonProfile.changeset(params)
    |> create_profile(user_id, role_id)
  end

  def create_payment_profile(%User{id: user_id}, %Role{id: role_id, name: "payment"}, %{} = params) do
    %PaymentProfile{}
    |> PaymentProfile.changeset(params)
    |> create_profile(user_id, role_id)
  end

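  # Turns a valid embedded-schema changeset into a plain map for the jsonb column.
  defp get_map(%Ecto.Changeset{valid?: true} = profile) do
    profile
    |> Ecto.Changeset.apply_changes()
    |> Map.from_struct()
  end

  # Persists the profile row; the metadata has already been validated at this point.
  defp create_profile(%Ecto.Changeset{valid?: true} = profile, user_id, role_id) do
    %Profile{user_id: user_id, role_id: role_id}
    |> Profile.changeset(%{"metadata" => get_map(profile)})
    |> Repo.insert()
  end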

  defp create_profile(%Ecto.Changeset{} = profile, _, _), do: {:error, profile}
  # ...
end

So my calling code calls Accounts.create_person_profile/3 in order to create a person profile.
The metadata is validated by Accounts.PersonProfile.changeset/2 using the embedded_schema.
The private function Accounts.create_profile/3 then persists the actual profile, where metadata is just a map.
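
The validate-then-flatten step described above can be sketched as a standalone script. This is only an illustration under stated assumptions: the post elides the body of PersonProfile.changeset/2, so the cast and validate_required calls here are hypothetical, as are the Sketch.* module names and the to_map/1 helper. It also assumes Elixir 1.12+ (for Mix.install) and Ecto 3.

```elixir
Mix.install([{:ecto, "~> 3.0"}])

defmodule Sketch.PersonProfile do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key false
  embedded_schema do
    field :first_name, :string
    field :last_name, :string
  end

  # Hypothetical changeset/2: the original post does not show its body.
  def changeset(%__MODULE__{} = profile, params) do
    profile
    |> cast(params, [:first_name, :last_name])
    |> validate_required([:first_name])
  end
end

defmodule Sketch.Metadata do
  # Like get_map/1 in the post, but returns a tagged tuple for both outcomes.
  def to_map(%Ecto.Changeset{valid?: true} = changeset) do
    {:ok, changeset |> Ecto.Changeset.apply_changes() |> Map.from_struct()}
  end

  def to_map(%Ecto.Changeset{} = changeset), do: {:error, changeset}
end

# Keys that are not cast (here "unknown") never reach the metadata map.
{:ok, metadata} =
  %Sketch.PersonProfile{}
  |> Sketch.PersonProfile.changeset(%{"first_name" => "Ingmar", "unknown" => "dropped"})
  |> Sketch.Metadata.to_map()

IO.inspect(metadata)
```

Because the embedded schema sets @primary_key false, the resulting map holds only the declared fields, which is what ends up in the jsonb column.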

I am avoiding Ecto.Schema.embeds_one/3 because it would break polymorphism.

Should this implementation be considered bad practice, or is it fine to use Ecto like this?

Cheers,

Ingmar

blatyo · July 29, 2018, 7:10pm

Polymorphism is useful when one thing can be substituted for another. In your example, it doesn’t look to me like that is the case with a payment profile and a person profile. It appears as if you’re putting them in the same table because they happen to share some of the same fields. If that is the case, I would probably model this as two separate tables with no shared code. As they grow, they’re likely to have divergent behaviors.

i-n-g-m-a-r · July 29, 2018, 9:17pm

Thanks for your response.
In terms of behaviour you are right:
the various profiles don’t substitute for one another.
From a data persistence perspective they do;
I can treat each of them the same way.
Also I make sure that a user doesn’t have more than one of each profile type.
And it’s very convenient to retrieve all of them in one go.

My main concern is how to deal with models that are in part well structured with foreign key constraints and so on but also contain unstructured data (nosql, jsonb).
The example I posted shows my attempt to have the best of both worlds using Ecto.

dimitarvp · August 5, 2018, 10:46am

You should not use anything because you want it – you should use it because it fits the job. I cannot find a reason to use polymorphism in your case.

In your example it looks like you need one of the following:

1. Separate related records, like PersonalProfile and PaymentProfile, and others which you would add over time.
2. Separate top-level keys in a JSONB column, like so:

%{
  "personal_profile": ...,
  "payment_profile": ...
}

…and again, whatever other objects you might need in the future.
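
A minimal sketch of the keyed layout, using only plain maps. The Sketch.KeyedMetadata module, its function names, and the sample values are hypothetical and only illustrate the idea of one top-level key per profile kind.

```elixir
defmodule Sketch.KeyedMetadata do
  # Stores one profile's fields under its own top-level key.
  def put_profile(metadata, key, %{} = fields) when is_binary(key) do
    Map.put(metadata, key, fields)
  end

  # Returns {:ok, fields} or :error when that profile kind is absent.
  def fetch_profile(metadata, key) when is_binary(key) do
    Map.fetch(metadata, key)
  end
end

metadata =
  %{}
  |> Sketch.KeyedMetadata.put_profile("personal_profile", %{"first_name" => "Ingmar"})
  |> Sketch.KeyedMetadata.put_profile("payment_profile", %{"payment_method" => "card"})

IO.inspect(Sketch.KeyedMetadata.fetch_profile(metadata, "payment_profile"))
```

String keys are used here because that is how the data comes back when it is read from a JSONB column.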

If duplicating the common fields worries you, why not just put them in the parent object?

i-n-g-m-a-r · August 5, 2018, 1:04pm

hi Dimitar,

tnx for your comments, I appreciate your feedback.
also I agree with you, any particular solution should fit the job.

at this point the question seems to be about whether to use polymorphism at all versus how to do it with Ecto.
I would like to use polymorphism in order to have a solid and coherent interface to handle the same kind of records.
as you suggested I could have just a single record (user) to store all of the metadata related to roles.
I guess maybe it’s a matter of taste, but I’d rather have many specialized records than one big record of everything.
to me it makes sense to store metadata about a relation in the relation (using jsonb, kind of independently).

- I’m planning on having lots of roles, most of them presently unknown.
- Adding roles should be fairly trivial; having to migrate the database every time would not be.
- Adding metadata fields to profiles should also be trivial and easy to validate.
- Selecting on roles is more important than selecting on users or profile metadata.
- Users and roles will belong to other things, like groups and ratings.
- Roles should be very flexible; for example, “dog owner” could be a role and so could “wikipedia contributor”.
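
One way to meet the "new roles without new migrations or modules" requirement is Ecto's schemaless changesets, where the field types are passed as a plain map instead of coming from an embedded_schema. The sketch below is an assumption-laden illustration, not what the post implements: the Sketch.RoleMetadata module, its role registry, and the "breed" field are hypothetical. It assumes Elixir 1.12+ and Ecto 3.

```elixir
Mix.install([{:ecto, "~> 3.0"}])

defmodule Sketch.RoleMetadata do
  import Ecto.Changeset

  # Hypothetical registry: role name => {field types, required fields}.
  # Adding a role means adding an entry here, not a table or a module.
  @roles %{
    "person" => {%{first_name: :string, last_name: :string}, [:first_name]},
    "dog owner" => {%{breed: :string}, [:breed]}
  }

  # Returns {:ok, map}, {:error, changeset} or {:error, :unknown_role}.
  def validate(role_name, params) do
    case Map.fetch(@roles, role_name) do
      {:ok, {types, required}} ->
        {%{}, types}
        |> cast(params, Map.keys(types))
        |> validate_required(required)
        |> apply_action(:insert)

      :error ->
        {:error, :unknown_role}
    end
  end
end

IO.inspect(Sketch.RoleMetadata.validate("dog owner", %{"breed" => "Siberian Husky"}))
```

The trade-off compared with one embedded schema per role is that the validated result is a plain map rather than a struct, so there is less compile-time checking of field names.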

so in the end I’m trying to avoid modelling everything and instead have a solid core that is very flexible.
I’m trying to use Ecto to guard the data that is used throughout the application.

dimitarvp · August 5, 2018, 1:58pm

But again, I do not see why polymorphism is better in your case – I don’t think it is. You haven’t addressed that.

Also, if you want to select on roles, then I’d say having them as a separate table with records that Users point to (via has_many) is much more practical and likely faster. But I haven’t measured PostgreSQL’s performance on selecting inside a JSONB field, mind you.

Do not overengineer. If you do not know 100% of the requirements beforehand, just go with what will get the job done now. Far too many times I’ve planned and coded for a future that never came.

i-n-g-m-a-r · August 6, 2018, 9:58am

if I stored the metadata in the users table as you suggested before,
I would not have to change the current roles table and I could use has_many to point users to roles.
so I would be “merging” users and profiles and I would not need to migrate every time.
that would work just fine.

at this point my user module only knows how to authorize users (changeset > cast > validate > hash password).
I use Guardian to rehydrate a user resource from a JWT.
if I don’t preload all of the metadata at that point (which is usually unnecessary), the %User{} would be incomplete.

so the profiles table provides “lazy loading” as well as structured metadata storage.

of course I agree with you not to overengineer, in my mind I’m not.

dimitarvp · August 6, 2018, 10:00am

What you say is fair, but have you checked the quality and speed of querying JSONB columns? If you are satisfied with it, then go ahead with your original idea; only modify it slightly to use different keys inside the JSONB as I pointed out above.

i-n-g-m-a-r · August 6, 2018, 10:58am

I have worked quite a lot with jsonb columns, though not in combination with Ecto.
provided a GIN index is used (for example with the jsonb_path_ops operator class), querying performance is pretty decent.
if I wanted to query on metadata only, I would not use jsonb though.

jsonb works very well together with “regular” selects, something like:

where role = 'dog owner' and metadata @> '{"breed":"Siberian Husky"}'::jsonb

when having multiple breeds is allowed, indexed querying becomes more complex.
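
For reference, a combined relational and jsonb filter like the one above could be written with Ecto roughly as follows. This is a hedged sketch: it assumes the role name lives in the roles table and is joined via profiles.role_id, and the Sketch.ProfileQuery module and the index name in the comment are hypothetical. The script only builds the query struct; running it would need a configured Repo.

```elixir
Mix.install([{:ecto, "~> 3.0"}])

defmodule Sketch.ProfileQuery do
  import Ecto.Query

  # A matching index could be created in PostgreSQL with, for example:
  #   CREATE INDEX profiles_metadata_idx ON profiles
  #     USING gin (metadata jsonb_path_ops);
  # (profiles_metadata_idx is a made-up name.)

  # Builds a query for profiles of a given role whose metadata
  # contains the given key/value pairs (jsonb @> containment).
  def by_role_and_metadata(role_name, %{} = contains) do
    from p in "profiles",
      join: r in "roles",
      on: r.id == p.role_id,
      where: r.name == ^role_name,
      where: fragment("? @> ?", p.metadata, ^contains),
      select: %{id: p.id, user_id: p.user_id, metadata: p.metadata}
  end
end

query = Sketch.ProfileQuery.by_role_and_metadata("dog owner", %{"breed" => "Siberian Husky"})
IO.inspect(query)
```

With a Repo available, the query would be passed to Repo.all/1; the relational filter on the role narrows the rows and the containment operator can use the GIN index.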

thx again for your time and feedback.