Where to put validations that require a database query?

Today I encountered a problem. I have some schema like this:

defmodule MyApp.Relationship do
  schema "relationships" do
    belongs_to :context, Context
    field :from, :string
    field :to, :string

I need to validate that the relationships in one context form a single tree. This is not something I can rely on database constraints. I have to load all relationships in that context then do the validation.

For example, if my database already contains such records

context_id from to
1 “a” “b”
1 “a” “c”
1 “b” “d”

that is

   / \
  b   c

then I should be able to insert %{context_id: 1, from: "d", to: "e"},
but inserting %{context_id: 1, from: "c", to: "d"} should fail because it makes the relationships not a tree.

The validation logic itself is simple thanks to libgraph, so you don’t need to worry about that.

My questions are:

  • Where should I put such validation code? A separate module for validation or just a private function that validates the changeset?
  • Which field should I attach the error to?

That’s not exactly true. You cannot validate the relationship due to the amount of data you have on the table.

You could explicitly store a foreign key to a tree table on each “piece” a tree needs and validate it like described here for tenancy: Multi tenancy with foreign keys — Ecto v3.7.1

That’s not what I mean. Okay, let’s forget about the “multi-tenant” part, and suppose we should only have one small tree in the whole database. How can I validate a new insertion won’t violate the tree structure?

Yeah, seems I misread the question.

You don’t at the db level – unless the db understands trees. I’d prevent direct insertions/updates of parts of a tree. Let any edit go through code, which is aware of the whole tree and only if that tree is still valid persist the changes.

That’s what I’m going to do. The problem is, should I put the validation code inside the schema module, like

defmodule MyApp.Relationship do
  schema "relationships" do

  def changeset(relationship, params) do
    |> cast(params, [:from, :to])
    |> validate_tree()

  defp validate_tree(changeset) do
    # Read DB then validate the changeset here

Or should I put those code in a separate module?

The other problem is, when validation failed, I want to add_error(changeset, field, error_message), but which field should I pass to add_error/3?

IMO in the Schema module because it’s the guard dog against invalid state in the DB – same way as people put various changeset modification functions and validators in the Schema modules.

Doesn’t really matter and it’s only a question of ideal semantics at one point. Just put it under from and have your code pattern-match on errors in it and you’re done.

1 Like

I ended up attaching the error message to the pseudo field :_ :sweat_smile:

TBH I wouldn’t recommend either place - a validation like this means that “create a relationship in this context” is a more complex operation than a simple Repo.insert.

My suggestion:

  • don’t do the validation in the changeset function. Keep the schema focused on the individual-record-level of detail.
  • write a function that bundles up all the things you’ll want to do:
    • (probably) lock the corresponding context row to prevent concurrent modification
    • verify the relationship data is correctly-shaped (this uses the changeset)
    • verify the expected tree relationship
    • finally insert the data