Where to put validations that require a database query?

Aetherus · December 2, 2021, 8:09am

Today I encountered a problem. I have some schema like this:

defmodule MyApp.Relationship do
  schema "relationships" do
    belongs_to :context, Context
    field :from, :string
    field :to, :string
  end
end

I need to validate that the relationships in one context form a single tree. This is not something I can rely on database constraints. I have to load all relationships in that context then do the validation.

For example, if my database already contains such records

context_id	from	to
1	“a”	“b”
1	“a”	“c”
1	“b”	“d”

that is

    a
   / \
  b   c
  |
  d

then I should be able to insert %{context_id: 1, from: "d", to: "e"},
but inserting %{context_id: 1, from: "c", to: "d"} should fail because it makes the relationships not a tree.

The validation logic itself is simple thanks to libgraph, so you don’t need to worry about that.

My questions are:

Where should I put such validation code? A separate module for validation or just a private function that validates the changeset?
Which field should I attach the error to?

LostKobrakai · December 2, 2021, 9:23am

That’s not exactly true. You cannot validate the relationship due to the amount of data you have on the table.

You could explicitly store a foreign key to a tree table on each “piece” a tree needs and validate it like described here for tenancy: Multi tenancy with foreign keys — Ecto v3.7.1

Aetherus · December 2, 2021, 9:33am

That’s not what I mean. Okay, let’s forget about the “multi-tenant” part, and suppose we should only have one small tree in the whole database. How can I validate a new insertion won’t violate the tree structure?

LostKobrakai · December 2, 2021, 9:39am

Yeah, seems I misread the question.

You don’t at the db level – unless the db understands trees. I’d prevent direct insertions/updates of parts of a tree. Let any edit go through code, which is aware of the whole tree and only if that tree is still valid persist the changes.

Aetherus · December 2, 2021, 9:43am

That’s what I’m going to do. The problem is, should I put the validation code inside the schema module, like

defmodule MyApp.Relationship do
  schema "relationships" do
    ...
  end

  def changeset(relationship, params) do
    relationship
    |> cast(params, [:from, :to])
    |> validate_tree()
  end

  defp validate_tree(changeset) do
    # Read DB then validate the changeset here
  end
end

Or should I put those code in a separate module?

The other problem is, when validation failed, I want to add_error(changeset, field, error_message), but which field should I pass to add_error/3?

dimitarvp · December 2, 2021, 10:10am

IMO in the Schema module because it’s the guard dog against invalid state in the DB – same way as people put various changeset modification functions and validators in the Schema modules.

Doesn’t really matter and it’s only a question of ideal semantics at one point. Just put it under from and have your code pattern-match on errors in it and you’re done.

Aetherus · December 2, 2021, 10:13am

I ended up attaching the error message to the pseudo field :_

al2o3cr · December 2, 2021, 3:00pm

TBH I wouldn’t recommend either place - a validation like this means that “create a relationship in this context” is a more complex operation than a simple Repo.insert.

My suggestion:

don’t do the validation in the changeset function. Keep the schema focused on the individual-record-level of detail.
write a function that bundles up all the things you’ll want to do:
- (probably) lock the corresponding context row to prevent concurrent modification
- verify the relationship data is correctly-shaped (this uses the changeset)
- verify the expected tree relationship
- finally insert the data