Idiomatic way to re-validate has_many children on a parent state transition with Ecto?

Hey, I know this has been discussed ad nauseum, but I couldn’t find a definitive answer or at least some consensus around this, so let me add it to the pile since I couldn’t find a discussion in this depth (pun intended). Given this example:

  defmodule Forms.Form do
    use Ecto.Schema
    import Ecto.Changeset

    schema "forms" do
      field :status, Ecto.Enum, values: [:draft, :published], default: :draft
      field :title, :string
      has_many :questions, Forms.Question, on_replace: :delete
    end
  end

  defmodule Forms.Question do
    use Ecto.Schema
    import Ecto.Changeset

    schema "questions" do
      field :prompt, :string
      field :type, Ecto.Enum, values: [:short_text, :multiple_choice]
      field :choices, {:array, :string}, default: []
      belongs_to :form, Forms.Form
    end
  end

A Question only exists inside a Form. There’s no concept of a free-floating question, no other parent can claim it, and you’d never query questions independently. TLDR: assume this is the perfect shape for the schema hahah.

I want the form to be flexible while in draft, but strict on publish:

  • While :draft, questions can be half-finished. prompt may be blank, and a :multiple_choice question doesn’t need choices yet.
  • To move to :published, the form must have at least one question, every question must have a prompt, and every :multiple_choice question must have at least 2 entries in choices.

So the same has_many :questions association obeys two different validations depending on the parent’s status, and the draft to published transition has to re-validate the children against the stricter regime before flipping the state.

To make matters worse, let’s say the publish_form API supports patching the form before publishing. Sometimes the questions will contain changes, but sometimes it’s untouched. And we’re very concerned with performance, so a round trip to the db just to please the lib is not acceptable by our coding standards.

What’s the idiomatic way to write this publish_changeset? The best I got is:

def publish_changeset(%__MODULE__{} = form, attrs) do
  form
  |> cast([:title])
  |> cast_assoc(:questions, with: &Forms.Question.complete_changeset/2)
  |> validate_required([:title])
  |> validate_length(:questions, min: 1)
  |> revalidate_questions()
end

defp revalidate_questions(changeset) do
  if get_change(changeset, :questions) do
    changeset
  else
    case get_field(changeset, :questions) do 
      nil -> 
        changeset
        |> add_error(:questions, "is required")
        |> add_error(:status, "questions is required")

      questions ->
        # this part is clunky: I have to `put_assoc` so the errors are included
        # in the changeset, it's cumbersome to check 
        all_complete? = Enum.all?(questions, &Forms.Question.complete?(&1))

        if all_complete? do
          changeset
        else
          add_error(changeset, :status, "incomplete questions")
        end
    end
  end
end

But this is clunky to me. For one, Forms.Question.complete? is not a changeset, so we won’t have per-question error. If I turn this into a changeset it feels wrong, as the questions are not really changing? There’s also the weird double checking - one branch for if there are changes being passed and another one if the questions are not changing during posting - even though the rules are the same.

Is this the best I can do or is there a cleaner way?

I’d do it like this: in transaction I update questions given on incoming changes, then query questions for the form, then check if questions are valid. If so, change form and commit, otherwise rollback. That’s it

That was my first thought, but it doesn’t make it less awkward:

  1. It adds a round-trip to the db - if performance is critical, this is a deal breaker
  2. “check if questions are valid”: how would you do that? Use a new changeset with force_changes: true? Hand-crafted validation with {:error, :invalid_questions} return (instead of per-question error)? It feels like neither are clean choices

Maybe I’m being pedantic here, but hey - the “best practices” tag isn’t the place for being 100% pragmatic eh? :sweat_smile:

I don’t understand why a round trip to the database is needed at all. If it’s just to check the data integrity when publishing then the database should be your last line of defence. You can have more than one functions producing changesets. While in draft, use draft_changeset and when publishing - publish_changeset.

It is inevitable. You need to read questions in order to check if they are valid for published form. This read->check->write sequence must be atomic and isolated too, so your only option is transaction.

Ahhh, I guess you’re right. When I read your reply I thought of going from read->write(questions)->check->write to read->write(questions)->check->write, but I guess you could just do write(questions)->check->write since the write(questions) would already return the data. Got it.

Yeah, sorry. I was trying to pre-answer some of the approaches I’ve seen, but I guess this doesn’t make much sense here anyway and ended up being a distraction :slight_smile:

My question is really how to build the publish_changeset

I’d treat the draft as the input or “changes” to a new record tbh. To me that’s the more reasonable architecture over a random boolean flag.

Also keep Ecto.Changeset.change/2, Ecto.Changset.put_assoc/embed in mind for applying changes programatically as opposed to as weakly typed external inputs, which need to be casted.

Yeah, I think in this example it’s the best answer - split Form into FormDraft and Form, and have FormDraft.to_form_params that passes it to the Form.publish_changeset.

But there’s also cases where this wouldn’t be possible. Imagine if state had a higher cardinality, to the point that having different subtypes would be too much. Or just (as is my case) that it’s an existing code base and refactoring it like that would be too costly - maybe there are queries that need to return both, or there’s a lot of code that depends on this already. Is there a solution for the `publish_changeset` that’s more elegant or am I stuck with the workarounds since the modeling is not the best?

You do not need to tightly couple your changesets to forms. You can map between the changeset of your form and the changeset you use in the backend to talk to your database.

I’m not sure if I follow. Where do you see this coupling with forms? In my example I’m only considering the back-end. So I’m considering there’s just a Forms context module that exposes a create_form(draft_attrs) :: Form and publish_form(form_id, attrs) :: Form - it’s agnostic to how it’s interacting with any front-end/api

Ah. I was thinking the Form module would be the frontend not the struct of the changeset. Generally I’d argue that no matter how you scope your functions and modules you should always be able to create a changeset, where all the to be revalidated parts are on the changes side of the changeset(s) and not on the data side. If you do not start from blank data you might need to use force_change - but you can start from a blank schema struct and move all data over. There’s no object instances to loose or something. For as long as you keep ids the same they’ll be written/updated/kept as the same record.

Essentially on publish you need to split up the data existing on your records into what is considered already validated data / to be retained as is and what should be considered a change in need for validation - hence me suggesting to model that task explicitly.

The cleanest solution is probably removing this:

I used to work on an EHR rails app with this exact issue. It’s much easier to train non-technical users to click a validate button first before they could publish a form that would be used in the system. It put a fear in them that they could not change the form anymore once it was published and used at least once(which was the case).

This way, you have your publish changeset that validates the current state of the form separate from the edit form changeset and both code paths are simpler.

@caioaao could you please sum up the advice from this thread in a code snippet that shows how the code from the question could be improved? I feel like some of the discussion got over my head…

Why do I feel like I’m an LLM being prompted? lol.

The winner advice is to include everything that needs to be revalidated as changes in the changeset. Something like this (untested):

defmodule Forms.Form do
  use Ecto.Schema
  import Ecto.Changeset

  schema "forms" do
    field :status, Ecto.Enum, values: [:draft, :published], default: :draft
    field :title, :string
    has_many :questions, Forms.Question, on_replace: :delete
  end

  @doc """
  More relaxed changeset. While in draft we allow things to be missing/invalid. Validation will happen in publishing
  """
  def draft_changeset(form \\ %__MODULE__{}, attrs) do
    form
    |> cast(attrs, [:title])
    |> cast_assoc(:questions)
    |> put_change(:status, :draft)
  end

  @doc """
  Publish changeset validates everything again. Only complete forms can be published
  """
  def publish_changeset(form \\ %__MODULE__{}, attrs) do
    form
    |> cast(attrs, [:title], force_changes: true) # Even if `title` is not present in `attrs`, it will be considered a change here
    |> cast_assoc(:questions, with: &Forms.Question.publish_changeset/2)
    |> put_change(:status, :published)
    |> validate_required([:title])
    |> validate_length(:questions, min: 1)
  end
end

defmodule Forms.Question do
  use Ecto.Schema
  import Ecto.Changeset

  schema "questions" do
    field :prompt, :string
    field :type, Ecto.Enum, values: [:short_text, :multiple_choice]
    field :choices, {:array, :string}, default: []
    belongs_to :form, Forms.Form
  end

  @doc """
  More relaxed changeset. While in draft we allow things to be missing/invalid. Validation will happen in publishing
  """
  def draft_changeset(question \\ %__MODULE__{}, attrs) do
    question
    |> cast(attrs, [:prompt, :type, :choices])
  end

  def publish_changeset(question \\ %__MODULE__{}, attrs) do
    question
    |> cast(attrs, [:prompt, :type, :choices], force_changes: true) # same thing here: force changes to make sure everything is validated even if didn't change now
    |> validate_required([:prompt, :type])
    |> validate_choices()
  end
end

The other idea (a more verbose one) would be to split the schemas into a DraftForm/DraftQuestion and a PublishedForm/PublishedQuestion. They can both point to the same table, but have different requirements in their changesets. Also the Draft variants would have a to_published_attrs that returns the map to be fed into the published variant’s changeset, like so:

defmodule Forms.DraftForm do
  use Ecto.Schema
  import Ecto.Changeset

  schema "forms" do
    field :status, Ecto.Enum, values: [:draft, :published], default: :draft
    field :title, :string
    has_many :questions, Forms.DraftQuestion, foreign_key: :form_id, on_replace: :delete
  end

  @doc """
  Relaxed changeset. While in draft we allow things to be missing/invalid.
  """
  def changeset(form \\ %__MODULE__{}, attrs) do
    form
    |> cast(attrs, [:title])
    |> cast_assoc(:questions)
    |> put_change(:status, :draft)
  end

  @doc """
  Converts a draft into the attrs map expected by `Forms.PublishedForm.changeset/2`.
  Everything goes through the published changeset, so all fields get validated.
  """
  def to_published_attrs(%__MODULE__{} = form) do
    %{
      "title" => form.title,
      "questions" => Enum.map(form.questions, &Forms.DraftQuestion.to_published_attrs/1)
    }
  end
end

defmodule Forms.DraftQuestion do
  use Ecto.Schema
  import Ecto.Changeset

  schema "questions" do
    field :prompt, :string
    field :type, Ecto.Enum, values: [:short_text, :multiple_choice]
    field :choices, {:array, :string}, default: []
    belongs_to :form, Forms.DraftForm
  end

  def changeset(question \\ %__MODULE__{}, attrs) do
    question
    |> cast(attrs, [:prompt, :type, :choices])
  end

  def to_published_attrs(%__MODULE__{} = question) do
    %{
      "prompt" => question.prompt,
      "type" => question.type,
      "choices" => question.choices
    }
  end
end

defmodule Forms.PublishedForm do
  use Ecto.Schema
  import Ecto.Changeset

  schema "forms" do
    field :status, Ecto.Enum, values: [:draft, :published], default: :draft
    field :title, :string
    has_many :questions, Forms.PublishedQuestion, foreign_key: :form_id, on_replace: :delete
  end

  @doc """
  Strict changeset. Only complete forms can be published.
  Feed it `Forms.DraftForm.to_published_attrs/1` output - since every field
  arrives as an incoming change, everything is (re)validated.
  """
  def changeset(form \\ %__MODULE__{}, attrs) do
    form
    |> cast(attrs, [:title])
    |> cast_assoc(:questions)
    |> put_change(:status, :published)
    |> validate_required([:title])
    |> validate_length(:questions, min: 1)
  end
end

defmodule Forms.PublishedQuestion do
  use Ecto.Schema
  import Ecto.Changeset

  schema "questions" do
    field :prompt, :string
    field :type, Ecto.Enum, values: [:short_text, :multiple_choice]
    field :choices, {:array, :string}, default: []
    belongs_to :form, Forms.PublishedForm
  end

  def changeset(question \\ %__MODULE__{}, attrs) do
    question
    |> cast(attrs, [:prompt, :type, :choices])
    |> validate_required([:prompt, :type])
    |> validate_choices()
  end

  defp validate_choices(changeset) do
    case get_field(changeset, :type) do
      :multiple_choice ->
        validate_length(changeset, :choices, min: 2)

      _ ->
        changeset
    end
  end
end

Publishing then looks like:

draft
|> Forms.DraftForm.to_published_attrs()
|> then(&Forms.PublishedForm.changeset(%Forms.PublishedForm{id: draft.id}, &1))
|> Repo.update()

This is an interesting problem. It seems to me like the complexity is coming from the data model. The :status key is a column on the form table, but it’s actually representing the state of the questions table since the only thing that changes on a form is the associated questions, so I think adding the status column to the questions table and then just adding a validation that enforces the questions all have the same status as the form would simplify a lot of stuff cause then the validations could treat the question struct as a tagged union, and check constraints could also be added to the questions table to ensure data integrity for different statuses.

Ofc, that’s not an option for the OP since they mentioned they can’t change the data model. I would also assume the real use-case has a more complicated or messier schema, but it’s an interesting problem to think about at least. :slight_smile: