Changeset purity exceptions?

AndrewKlymchuk · July 20, 2021, 2:15pm

Hello.
It’s considered best practice to keep all Ecto changesets pure (deterministic, without side effects).
Sometimes, though, a validation requires a call to an external resource. For example, to validate that an email address not only matches a regex but actually exists, we could make an HTTP request to the email provider’s service.
Moreover, Ecto.Changeset itself contains functions such as unsafe_validate_unique/4, which performs a call to the database.
So, is it OK to use functions that perform external calls in changesets, or should we try to move all such calls outside of them?

Jcambass · July 20, 2021, 2:49pm

Personally, I think that putting external calls like HTTP requests inside the changeset logic hides too much of the complexity of that call, and of its resiliency consequences.

I believe that keeping that logic outside the changeset makes the code easier to maintain and change (for example, you might later want to fall back to another validation strategy when the network service is unavailable).

That being said, I’ve never implemented anything along those lines for Ecto changesets specifically, since I haven’t needed it so far.
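Jcambass’s point about a future fallback can be made concrete. The sketch below keeps the remote check in a plain module outside any changeset, so the fallback policy (here: accept the address if it passes a loose local format check) is an explicit, testable decision. It assumes a hypothetical `verifier` function that wraps the HTTP call and returns `{:ok, true}`, `{:ok, false}` or `{:error, reason}`; the `EmailCheck` module and the regex are illustrative only.

```elixir
defmodule EmailCheck do
  # Hypothetical sketch: the remote check lives outside any changeset, so the
  # fallback policy is an explicit decision that can be tested on its own.

  # `verifier` is a hypothetical function wrapping the HTTP call. It returns
  # {:ok, true}, {:ok, false} or {:error, reason}.
  def check(email, verifier) when is_function(verifier, 1) do
    case verifier.(email) do
      {:ok, true} -> :ok
      {:ok, false} -> {:error, "not found"}
      # Service unavailable: fall back to a local format check instead of
      # rejecting the request outright.
      {:error, _reason} -> local_check(email)
    end
  end

  # Deliberately loose: something@something, with no whitespace.
  defp local_check(email) do
    if email =~ ~r/^[^\s@]+@[^\s@]+$/, do: :ok, else: {:error, "has invalid format"}
  end
end
```

Because the verifier is passed in as an argument, tests can substitute an anonymous function instead of touching the network.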

kip · July 21, 2021, 5:40am

I think there is a benefit to a changeset being idempotent, and certain validations break that (like checking the database for duplicates): between changeset validation and the database update there is no guarantee that a duplicate won’t be inserted.
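For the duplicate-check case kip mentions, Ecto’s own answer is to back the check with a database constraint. `unique_constraint/3` does no I/O while the changeset is built; it only records that a unique-index violation raised during insert or update should become a changeset error, so the database itself closes the window between “validated” and “written”. A minimal sketch, assuming a `users` table with a unique index named `users_email_index` (Ecto’s default name for this schema and field); no repo is needed just to build the changeset.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule User do
  use Ecto.Schema
  import Ecto.Changeset

  schema "users" do
    field :email, :string
  end

  # Pure: unique_constraint/2 only registers how a unique-index violation on
  # :email should be reported. The real check happens atomically inside the
  # database when Repo.insert/1 or Repo.update/1 runs.
  def changeset(user, params) do
    user
    |> cast(params, [:email])
    |> validate_required([:email])
    |> unique_constraint(:email)
  end
end
```

`unsafe_validate_unique/4` can still be layered on top for earlier feedback in forms, but it queries the repo while the changeset is being built, which is exactly the impurity discussed in this thread; the constraint remains the actual guarantee.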

stefanchrobot · July 21, 2021, 6:32am

A fairly common approach is to keep the schema definition file side-effect free. Most changesets and queries fit that style. In that setup, any code that is not pure (hitting the DB or an external service) is pushed to the context layer. I personally like that separation and would try to keep that even if it meant moving parts of the validation to the context.
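The layering stefanchrobot describes might look roughly like this: the changeset function stays pure, and a context function performs the remote check only after the pure validations pass, attaching the result with `add_error/3`. `Accounts`, `register_user/2` and the `email_exists?` callback are hypothetical names; `apply_action/2` stands in for `Repo.insert/1` so the sketch runs without a database.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule Accounts do
  # Hypothetical context module: pure changeset building, impure checks here.
  import Ecto.Changeset

  @types %{email: :string}

  # Pure: the same params always produce the same changeset.
  def user_changeset(params) do
    {%{}, @types}
    |> cast(params, Map.keys(@types))
    |> validate_required([:email])
    |> validate_format(:email, ~r/@/)
  end

  # Impure: `email_exists?` is a hypothetical 1-arity function standing in
  # for the remote check. It is only called when the pure validations pass,
  # because `and` short-circuits on `false`.
  def register_user(params, email_exists?) do
    changeset = user_changeset(params)

    changeset =
      if changeset.valid? and not email_exists?.(get_field(changeset, :email)) do
        add_error(changeset, :email, "not found")
      else
        changeset
      end

    # In a real app this would be Repo.insert(changeset) on a schema-backed changeset.
    apply_action(changeset, :insert)
  end
end
```

The schema module (or, here, `user_changeset/1`) stays trivially testable, and every side effect is visible at the context boundary.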

dimitarvp · July 21, 2021, 7:44am

It can go both ways but one thing that’s IMO mandatory is – when you make a choice, be consistent. Either allow side effects in all changeset functions, or in none.

I personally prefer to keep changesets pure, because IMO higher-level validations and/or changes to the changeset that rely on third-party services belong in contexts, since contexts are supposed to take care of high-level concerns. But I’ve seen the opposite as well, and it’s honestly not as bad as people claim. You just have to be consistent about it, keep your code reasonably readable, and ideally leave good instructive comments in the code.

fuelen · July 21, 2021, 3:19pm

Ecto.Changeset provides nice tooling for both pure and impure validations. When you receive a response from a third-party service and want to represent it as an error on a specific field, what would you do? I’d try to produce the same error format I use when converting an invalid changeset to JSON. So, when the email is invalid, I’d have to create an error like

```json
{"error": {"email": ["not found"]}}
```

If you don’t use Ecto.Changeset, you’ll end up with a bunch of custom functions and modules that reimplement a tiny part of Ecto.
I’d say it is OK to use side effects in changesets. The question of purity versus impurity is more general: it is good practice to separate side effects no matter what tooling you use. Changesets are very composable, so it is easy to write one changeset function with regular validations and another with unsafe validations. Here is an example:

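```elixir
%User{}
|> User.create_changeset(params)
|> apply_unsafe_changeset(&MyEmailModule.unsafe_validate_email(&1, :email))

# Runs side-effecting validations only when the :unsafe_enabled flag is set
# (for example, disabled in the test environment's config).
def apply_unsafe_changeset(changeset, function) do
  if Application.fetch_env!(:myapp, :unsafe_enabled) do
    function.(changeset)
  else
    changeset
  end
end
```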

With this approach, you can disable side effects in your tests completely and test such validations explicitly and independently.
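The error shape fuelen mentions falls out of `Ecto.Changeset.traverse_errors/2`, regardless of whether an error came from a built-in validation or was added with `add_error/3` after a third-party call. A minimal sketch; the `ErrorFormat` module is hypothetical, and interpolating placeholders (for messages such as `should be at least %{count} character(s)`) from the error options is one common way to render them.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule ErrorFormat do
  # Hypothetical helper: one error shape for every changeset error, whether it
  # came from a built-in validation or from add_error/3 after a remote call.
  def to_map(%Ecto.Changeset{} = changeset) do
    Ecto.Changeset.traverse_errors(changeset, fn {msg, opts} ->
      # Fill in placeholders such as %{count} from the error's options.
      Regex.replace(~r"%{(\w+)}", msg, fn _, key ->
        opts |> Keyword.get(String.to_existing_atom(key), key) |> to_string()
      end)
    end)
  end
end
```

A controller could then render `%{error: ErrorFormat.to_map(changeset)}` as JSON, so pure and impure validation failures reach the client in the same format.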

al2o3cr · July 21, 2021, 5:35pm

For this specific example, I’d definitely stay away from doing this in a changeset. The remote service could be down, or the user could have an email address that can’t be detected via SMTP.
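If the check does stay in the request path, one way to contain the “remote service could be down” risk is to bound the call with a timeout, using the `Task.yield/2 || Task.shutdown/2` pattern described in Elixir’s `Task` documentation. `RemoteCheck` and `check_fun` are hypothetical names. Note that with `Task.async/1` an exception inside `check_fun` still propagates to the caller through the link, so this only guards against slowness, not crashes.

```elixir
defmodule RemoteCheck do
  # Hypothetical sketch: bound a remote call with a timeout so an unavailable
  # service produces an error tuple instead of a hung request.

  # `check_fun` is a hypothetical 0-arity function wrapping the real HTTP call.
  def call_with_timeout(check_fun, timeout_ms \\ 2_000) when is_function(check_fun, 0) do
    task = Task.async(check_fun)

    # Wait up to timeout_ms; if no reply arrived by then, kill the task.
    case Task.yield(task, timeout_ms) || Task.shutdown(task, :brutal_kill) do
      {:ok, result} -> {:ok, result}
      # Only reachable if the caller traps exits; otherwise a crash in
      # check_fun takes the caller down via the Task.async/1 link.
      {:exit, reason} -> {:error, {:exit, reason}}
      nil -> {:error, :timeout}
    end
  end
end
```

The `{:error, :timeout}` result can then feed whatever fallback policy the application chooses, instead of being silently folded into a changeset error.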

ecx · July 21, 2021, 10:05pm

Our application presumes changesets can be generated and then batched, composed via Ecto.Multi, etc. If the side effect fires when the changeset is generated rather than when it’s run, it opens up a whole can of worms.
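A minimal sketch of the alternative ecx’s point implies: build the pure changeset eagerly, but put the external check in an `Ecto.Multi.run/3` step, so it fires only when the Multi is executed, not when it is assembled. `Registration` and `email_exists?` are hypothetical; the insert step and the repo are left out so the sketch runs without a database.

```elixir
Mix.install([{:ecto, "~> 3.10"}])

defmodule Registration do
  alias Ecto.Multi

  # Builds the pure changeset eagerly but defers the impure email check to a
  # Multi.run/3 step. `email_exists?` is a hypothetical 1-arity function
  # wrapping the remote call; building the Multi does not call it.
  def multi(params, email_exists?) do
    changeset = Ecto.Changeset.cast({%{}, %{email: :string}}, params, [:email])

    Multi.new()
    |> Multi.run(:verify_email, fn _repo, _changes ->
      email = Ecto.Changeset.get_field(changeset, :email)

      if email_exists?.(email), do: {:ok, email}, else: {:error, "not found"}
    end)

    # A real pipeline would continue with Multi.insert(:user, changeset) on a
    # schema-backed changeset and execute it with MyApp.Repo.transaction/1
    # (hypothetical repo).
  end
end
```

If the step returns `{:error, "not found"}`, `Repo.transaction/1` rolls back and returns `{:error, :verify_email, "not found", changes_so_far}`. One trade-off: the remote call then happens while a database transaction is open, so a slow service holds that transaction for the duration of the call.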

kreiling.io · July 21, 2021, 11:30pm

I do mix pure and impure changeset validations, but I think it’s important to isolate the pure from the impure. My rule of thumb is to keep every changeset function within a schema module pure, and to put impure changeset functions in their own separate modules.

These functions would probably be composed within the Context boundary encapsulating the User entity.

```elixir
defmodule User do
  use Ecto.Schema

  embedded_schema do
    field :email, :string
  end

  # Pure: no I/O, safe to call anywhere.
  def changeset(params) do
    %__MODULE__{}
    |> Ecto.Changeset.cast(params, [:email])
    |> Ecto.Changeset.validate_length(:email, min: 1)
  end
end

defmodule UserEmailVerifier do
  # Impure: skip the remote call when the pure validations already failed.
  def verify_email(%Ecto.Changeset{valid?: false} = changeset, _field), do: changeset

  def verify_email(%Ecto.Changeset{valid?: true} = changeset, field) do
    Ecto.Changeset.validate_change(changeset, field, fn _, email ->
      if ExternalService.valid_email?(email) do
        []
      else
        [{field, "is not valid according to external service"}]
      end
    end)
  end
end

# Assuming ExternalService rejects this address:
%Ecto.Changeset{valid?: false} =
  %{email: "invalid@example.io"}
  |> User.changeset()
  |> UserEmailVerifier.verify_email(:email)
```