Dynamic Embeds in Ecto

AndrewDryga · September 8, 2018, 3:16pm

Hey guys. I know this is an old topic, but as I write more and more complex applications with Elixir and Ecto, I feel like we really need a way to let developers use dynamic embeds.

A few words about our use case: we have a Transport schema which stores various common fields and settings of a transport. But depending on the transport type (e.g. Twilio or Facebook Messenger), the settings can be very different, and there are also DB constraints that should be in place for those settings.

We work around this with application logic that takes the params for the embedded schema (which is defined as :map type on the parent), casts and validates them through the embedded changeset, and then either puts the valid settings into the parent changeset or adds the errors to the parent. Here is some code:

A function that shows how dynamic changeset works in our case:

  defp cast_provider_settings(changeset, provider_field, provider_settings_field) do
    with {:ok, provider} <- fetch_change(changeset, provider_field),
         {:ok, settings} <- fetch_change(changeset, provider_settings_field),
         provider_settings_changeset = Provider.settings_changeset(provider, settings),
         {:ok, valid_settings} <- Validator.fetch_valid_attrs(provider_settings_changeset) do
      put_change(changeset, provider_settings_field, valid_settings)
    else
      :error ->
        changeset

      {:error, :not_found} ->
        changeset

      {:error, %{valid?: false} = settings_changeset} ->
        put_embedded_error(changeset, provider_settings_field, settings_changeset)
    end
  end

Here is how you can add an error from an embedded changeset to a field that is defined as a map:

  defp put_embedded_error(changeset, embed_field, embedded_changeset) do
    embedded_type =
      {:embed,
       %Ecto.Embedded{
         cardinality: :one,
         field: embed_field,
         on_cast: nil,
         on_replace: :raise,
         owner: %{},
         related: Transport,
         unique: true
       }}

    %{
      changeset
      | changes: Map.put(changeset.changes, embed_field, embedded_changeset),
        types: Map.put(changeset.types, embed_field, embedded_type),
        valid?: false
    }
  end

(Notice that you can’t override the types and leave the embedded changeset in a field that the Ecto schema defines as :map, because you would get a cast error. Ecto.Changeset uses pre-compiled type information when the insert happens, so overriding only helps when you use functions like traverse_errors/2.)

And even if you do that, there are a lot of issues that persist here. The main one right now for us is constraints - they are lost when the embedded schema is turned into a map, and moving them manually to the parent doesn’t make sense (the error field would point in the wrong direction).

Other ways to hack around:

- Define multiple schemas per database entity (or even combine that with PostgreSQL table inheritance). This one looks weird to me because when I fetch data back from the DB I want to see only one kind of schema. The data that I put there should be exactly what I get back.
- Do not use dynamic embeds. This option looks poor because there are so many use cases where a dynamic embed makes perfect sense.

A very rough suggestion for how we could deal with that:

- We might add a :changeset type for Ecto.Schema.
- It’s the application’s responsibility to actually implement the logic for how the embedded changeset gets there, on which fields it’s resolved, etc. (I don’t think that Ecto needs to add any kind of magic here.)
- Repo operations should take care of changesets in :changeset fields in the same way as they would with a regular embedded schema.

OR

Make Ecto use the type information from the changeset (removing the calls to Schema.__*__ functions), which is not straightforward and would make changeset structs much bigger. (See this issue.)

wojtekmach · September 10, 2018, 11:52am

Can you talk a bit more about your use case; on the DB level is it e.g. transports table with a few columns, including e.g. provider_type (string) and provider_settings (json) columns?

And even if you do that, there are a lot of issues that persist here. The main one right now for us is constraints

What types of constraints, like CHECK constraints?

AndrewDryga · September 10, 2018, 3:39pm

@wojtekmach I guess the migration would answer both questions. In short - yes, it’s 2 columns. Constraints can be very different; I can’t tell which ones we will use in the future. Currently it’s a unique index and CHECKs.

create table(:transports, primary_key: false) do
  add(:id, :binary_id, primary_key: true)
  add(:title, :string, null: false)
  add(:provider, :string, null: false)
  add(:provider_settings, :map)
end

execute("""
CREATE INDEX transports_provider_settings_user_id_index ON transports
USING GIN ((provider_settings->'user_id'))
""")

execute("""
CREATE UNIQUE INDEX transports_facebook_provider_settings_page_id_index ON transports
USING btree (provider, (provider_settings->'page_id'))
WHERE provider = 'facebook_messenger'
""")

wojtekmach · September 11, 2018, 4:11pm

Hey @AndrewDryga, the migration is very helpful, thanks. The DB design looks good.

What do you think about validating provider settings with schemaless changesets and copying the errors to the parent? This would be similar to how constraint validations are handled: they’re declared on the parent changeset and end up in the parent changeset errors.
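
A minimal sketch of that idea, assuming hypothetical settings fields (page_id, user_id) and a hypothetical copy_errors/3 helper. None of these names come from the actual application, and the script pulls in Ecto with Mix.install only so that it runs on its own:

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule SettingsValidation do
  import Ecto.Changeset

  # Hypothetical field types for Facebook Messenger settings (an assumption).
  @facebook_types %{page_id: :string, user_id: :string}

  # Schemaless changeset: cast against a {data, types} tuple instead of a schema.
  def facebook_changeset(params) do
    {%{}, @facebook_types}
    |> cast(params, Map.keys(@facebook_types))
    |> validate_required([:page_id])
  end

  # Nothing to copy when the settings changeset is valid.
  def copy_errors(parent, _field, %Ecto.Changeset{valid?: true}), do: parent

  # Otherwise add every settings error to a single field of the parent,
  # prefixing the message with the name of the offending settings key.
  def copy_errors(parent, field, %Ecto.Changeset{errors: errors}) do
    Enum.reduce(errors, parent, fn {key, {message, opts}}, acc ->
      add_error(acc, field, "#{key} #{message}", opts)
    end)
  end
end
```

With this shape the client sees the error on provider_settings in the parent, and the nested key survives only as part of the message text.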

AndrewDryga · September 12, 2018, 10:33am

@wojtekmach this is definitely possible, and we do as you said: validate the dynamic embed with a changeset (it’s not schemaless, but that doesn’t matter) and put the errors on the parent, if any. But now we also need to copy the constraints and, in the view layer, add a hack that maps a constraint error so that it looks like it occurred in the structure from the provider_settings embed.

The mapping is required because we want the error to appear where it logically should be for a client and to point to the correct field, in case the front-end maps those errors back. Correct me if I’m wrong, but the changeset struct after a constraint violation would point to a field in the embed, not to a field in the parent struct.

The question is: should we do something to make Ecto support dynamic embeds without a lot of hacking and mapping everything back and forth? Because the resulting code is pretty complex, duplicated and error-prone.
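
To make the constraint part concrete, here is a rough sketch of declaring the partial unique index from the migration on a parent changeset. The Transport module body and the error message are assumptions written for illustration, not the real schema:

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule Transport do
  use Ecto.Schema
  import Ecto.Changeset

  @primary_key {:id, :binary_id, autogenerate: true}
  schema "transports" do
    field :title, :string
    field :provider, :string
    field :provider_settings, :map
  end

  def changeset(transport, attrs) do
    transport
    |> cast(attrs, [:title, :provider, :provider_settings])
    |> validate_required([:title, :provider])
    # The index covers provider_settings->'page_id', but a constraint can only
    # be attached to a field of the parent schema, so a violation is reported
    # on :provider_settings as a whole.
    |> unique_constraint(:provider_settings,
      name: :transports_facebook_provider_settings_page_id_index,
      message: "page_id has already been taken"
    )
  end
end
```

The nested key can only be mentioned in the message text here, which is why the error has to be mapped again before it can be shown next to page_id.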

blatyo · September 12, 2018, 1:07pm

The core team tends to prefer building an extendable core and allowing the community to provide extensions. Is there something here that might prevent a library and require this to be in Ecto? What would Ecto support for dynamic embeds look like?

AndrewDryga · September 13, 2018, 10:33am

Unfortunately, I don’t know of a way to write a library that would change the fact that you can’t use an Ecto.Changeset you built yourself with Ecto.Repo operations. If you have ideas - I’m all ears. Maybe a library could provide its own Repo implementation, but it would be very hard to keep it up to date.

To support dynamic embeds (as far as I know):

- We should allow pretty much any Ecto.Changeset to be used on embedded schemas (but I’m not sure what the syntax would look like; syntax may not be required if we allow a lib to override the Changeset type information and that information is actually used).
- (maybe) We should support constraints on embedded schemas, or at least on dynamic embeds.
- Ecto.Repo should treat dynamic embeds like any other embeds and, in case of errors, return them in the proper structure (errors that occurred in an embed should be in the embedded changeset).

drapermd · December 17, 2019, 4:49pm

@AndrewDryga Do you have an open source tree that you can share to solve this problem?

AndrewDryga · December 17, 2019, 5:05pm

We have code that we use internally but nothing ready for open source yet. Without Ecto support it’s just hacks.

Adzz · April 24, 2020, 9:05am

Forgive me if I’m not understanding the problem correctly, but can you solve this problem with a custom Ecto type?

Similar to the approach used here: https://medium.com/@ItizAdz/creating-a-has-one-of-association-in-ecto-with-ectomorph-3932adb996d9

Essentially the custom type decides how to build which struct based on the shape of the params it gets.
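
As a rough sketch of that approach (not the code from the linked article): the struct modules, the key names, and ProviderSettingsType below are all made up for illustration.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule FacebookSettings do
  defstruct [:page_id]
end

defmodule TwilioSettings do
  defstruct [:account_sid]
end

defmodule ProviderSettingsType do
  use Ecto.Type

  # Stored in a single map (JSON) column.
  def type, do: :map

  # Pick the struct from the keys that are present in the params.
  def cast(%{"page_id" => page_id}), do: {:ok, %FacebookSettings{page_id: page_id}}
  def cast(%{"account_sid" => sid}), do: {:ok, %TwilioSettings{account_sid: sid}}
  def cast(%FacebookSettings{} = settings), do: {:ok, settings}
  def cast(%TwilioSettings{} = settings), do: {:ok, settings}
  def cast(_other), do: :error

  # Data comes back from the database as a map with string keys,
  # so loading can reuse the same shape-based dispatch.
  def load(data) when is_map(data), do: cast(data)
  def load(_other), do: :error

  def dump(%FacebookSettings{page_id: id}), do: {:ok, %{"page_id" => id}}
  def dump(%TwilioSettings{account_sid: sid}), do: {:ok, %{"account_sid" => sid}}
  def dump(_other), do: :error
end
```

The callbacks only ever receive the value of this one field, so the dispatch has to rely on the shape of the map itself.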

AndrewDryga · April 24, 2020, 4:50pm

Currently, this is not possible because a type implementation only has access to the data inside one field, but in our use case the type is actually a separate field in the schema. If we could somehow make the type aware of other field values, it would work.

Adzz · April 29, 2020, 9:09am

Could you put that field inside the params before you cast it? Then the type would have all the info it needed.

AndrewDryga · May 19, 2020, 10:42am

This would mean that we would have two fields duplicating the type information and would have to add constraints to the DB to make sure they are equal, all just to work around Ecto limitations. I would rather move the type inside the payload (which is not ideal for our use cases).

We already have code that makes it possible to have dynamic embeds, except you need to call a special “load” function every time a schema is returned from the DB. This is not very convenient due to Ecto preloads, but it works.
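
A rough sketch of what such a load function might look like. The module names, the provider keys, and load_settings/1 are hypothetical and are not taken from the code mentioned above:

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule FacebookSettings do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field :page_id, :string
  end
end

defmodule TwilioSettings do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field :account_sid, :string
  end
end

defmodule Transports do
  import Ecto.Changeset

  # Hypothetical mapping from the provider column to a settings schema.
  @settings_modules %{
    "facebook_messenger" => FacebookSettings,
    "twilio" => TwilioSettings
  }

  # Has to be called on every record returned from the Repo.
  def load_settings(%{provider: provider, provider_settings: settings} = transport)
      when is_map(settings) and not is_struct(settings) do
    case Map.fetch(@settings_modules, provider) do
      {:ok, module} ->
        # Cast the raw string-keyed map into the provider-specific struct.
        loaded =
          module
          |> struct()
          |> cast(settings, module.__schema__(:fields))
          |> apply_changes()

        %{transport | provider_settings: loaded}

      :error ->
        transport
    end
  end

  # Settings that are missing or already loaded are left untouched.
  def load_settings(transport), do: transport
end
```

Because this runs outside of Ecto, it has to be repeated for preloaded associations as well, which is the inconvenience described above.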

mathieuprog · May 30, 2020, 4:12pm

@AndrewDryga could you check out the library I published for support for polymorphic embeds?
Would it answer your use case? If not, what would be lacking?

lukaszsamson · June 1, 2020, 6:11am

This is slightly off-topic, as in my case I needed polymorphic embeds over JSON columns: https://github.com/elixir-ecto/ecto/pull/3215#issuecomment-579217412

thojanssens1 · June 1, 2020, 6:42am

Were you working with a DB that has no JSONB support? Or why did you need to convert to JSON as seen in the code?

lukaszsamson · June 1, 2020, 7:02am

I was working with Postgres, so JSONB was there. The problem was getting Ecto to map a single JSON column to one of several structs depending on some other column. Using a simple map serialized to JSON, I would lose all the Ecto goodness like decimals, dates, times, casting, loading, validations, etc.

thojanssens1 · June 1, 2020, 9:44am

@lukaszsamson two things:

1. I think that the Ecto.Type.embedded_dump and Ecto.Type.embedded_load calls in those Enum.reduce_while are unnecessary, and they account for a lot of code in your Ecto type. You know the schema based on the ‘type’ field (def load(%{@type_field => module_string} = data)), so when loading the data from the DB you can simply cast those values against the changeset of your schema, and the cast will convert the data to the right Elixir data. Or did I miss something?
2. I don’t understand why you say off-topic, as the library above, for example, does exactly what you are doing, i.e. dynamically picking an embedded struct based on some ‘type’ field and storing it in some field.

lukaszsamson · June 1, 2020, 10:03am

If I remember correctly, the issue with that approach was that dumping my structs as an Ecto map used the default Jason.Encoder protocol implementation for decimals, dates, etc., and Ecto was then unable to load them correctly (it was losing data or simply crashing).
You’re right, it’s more on-topic than I initially thought.

thojanssens1 · June 1, 2020, 10:23am

I don’t see why Ecto would be unable to load the data.

I tried playing with datetimes, dates, times, decimals, … stored in a :map field, and all of them seem to be dumped and loaded back to the right data with Postgres, without these encodings/decodings.

Also, do you know what formats other than :json can be given to embedded_dump(type, value, format)?