What is your preferred way of dealing with computed fields in Ecto?

thiagomajesk · May 10, 2022, 8:52pm

Hi, I was wondering something for the past few weeks while working on multiple projects and I’d like to collect some feedback. Here’s the use case…

Very often, we need to store computed fields after some sort of validation has happened. Those fields could be something as simple as a user’s full name or something more complex, like markdown content that needs to be parsed.

While working on different projects, I always came across the same uncertainty: where is the best place to put the computing logic? After thinking carefully about it, I had a different answer for each project I was working on. Here are two similar approaches with conceptual differences:

First example

The context module abstracts all inputs behind the attrs map and everything else is delegated to the schema:

# attrs = %{"name" => "john", "surname" => "doe"}
def create_user(attrs) do
  %User{}
  |> User.changeset(attrs)
  |> Repo.insert()
end

The schema module takes care of holding the relevant logic and computing the field when the changeset is valid. The changeset validation is coupled to the computing logic (this will be important later):

def changeset(user, attrs) do
  user
  |> cast(attrs, [:name, :surname])
  |> validate_required([:name, :surname])
  |> put_full_name()
end

def put_full_name(changeset) do
  if changeset.valid? do
    # get_field/2 falls back to the existing struct value when a field is unchanged
    name = get_field(changeset, :name)
    surname = get_field(changeset, :surname)
    put_change(changeset, :full_name, name <> " " <> surname)
  else
    changeset
  end
end

Second example

In the context, user and “application” input are split. We take care of including the computed value after we have validated user input. The changeset validation is decoupled from the computing logic (this will be important later):

# content = "User x has just logged in!"
# attrs = %{"user_id" => 1, "user_full_name" => "john doe"}
def system_message(attrs, content) do
  %Message{}
  |> Message.changeset(attrs)
  |> Message.put_html(content)
  |> Repo.insert()
end

The schema module only takes care of validating user input and it optionally exposes a way to deal with additional computed values:

def changeset(message, attrs) do
  message
  |> cast(attrs, [:user_id, :user_full_name])
  |> validate_required([:user_id, :user_full_name])
end

# Depending on how complex or specific this is,
# it could be left inside of the schema or kept in the context
def put_html(changeset, content) do
  html = Markdown.parse(content)
  put_change(changeset, :html, html)
end

The obvious distinction is that the second example heavily depends on the context function to provide a “valid” way to insert a message while the first does not.

Both approaches have obvious advantages and disadvantages. For instance, if you have to interface with a lot of other modules to have a given value computed (perhaps an external API or service), it would probably be more reasonable to place the logic in the context to not “overload” the schema with too many responsibilities.

Considering that the resulting API directly reflects the level of abstraction we choose, and given that contexts already represent some sort of public API for the application, what approach do you usually prefer in your projects?

cmo · May 10, 2022, 9:42pm

I tend to keep schema modules paper thin and purely functional. Generally only validation and query logic live there.

Storing the full name seems pointless in this example, and can your markdown parse function really never fail? Seems like there should be a case or with in the second one, which would exclude it from the schema module in my mind.
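To make the point about failure concrete, here is a minimal sketch, assuming a parser that returns {:ok, html} or {:error, reason}. FakeMarkdown and MessagesSketch are hypothetical names used only for illustration, and the database step is replaced by returning the merged attributes, so nothing here is the thread's actual code or a real library's API.

```elixir
defmodule FakeMarkdown do
  # Hypothetical parser that can fail, standing in for a real library.
  def parse(content) when is_binary(content) and content != "" do
    {:ok, "<p>" <> content <> "</p>"}
  end

  def parse(_content), do: {:error, :invalid_markdown}
end

defmodule MessagesSketch do
  # Context-level function: the fallible step is handled here with `with`,
  # so the schema module never has to call out to the parser.
  def system_message(attrs, content) do
    with {:ok, html} <- FakeMarkdown.parse(content) do
      # A real context would build a changeset and call Repo.insert/1 here;
      # this sketch returns the merged attributes instead.
      {:ok, Map.put(attrs, "html", html)}
    end
  end
end
```

When the parser returns an error tuple, `with` passes it straight back to the caller, which is the kind of branching that arguably belongs in the context rather than in the schema.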

thiagomajesk · May 10, 2022, 11:00pm

Not at all, the example is so trivial that it illustrates perfectly what I intended - which is about the different levels of abstraction. For the sake of the illustration, pretend we are using the simplest markdown parser possible and it won’t fail as these types of side effects are not exactly what the post is about, but rather the implications on how to express the API around it.

Could you elaborate on that? What do “paper thin” and “purely functional” mean in the context you are talking about? Is it exclusively just changesets and queries? How would you approach the markdown example, for instance, and why do you prefer it over the other?

cmo · May 11, 2022, 2:18am

I think it can be hard to talk about these things in the micro and there is no perfect way. I lean toward the second option as I subscribe to the cult that the schema shouldn’t be calling out to other modules unless it is a helper for some common validation logic or Ecto.Query.

I like to write the args out explicitly if there aren’t too many and usually handle the yucky user input side of things further out in the system. But if it’s a simple one-to-one relationship between user input and database representation then using the schema is less work.

defmodule Messages.Message do
  # `use Ecto.Schema` and the schema block are omitted for brevity
  import Ecto.Changeset

  def changeset(message \\ %__MODULE__{}, attrs) do
    message
    |> cast(attrs, [:user_id, :user_full_name, :html])
    |> validate_required([:user_id, :user_full_name])
  end
end

defmodule Messages do
  alias Messages.Message

  # obvious what is required to create this record
  def system_message(id, full_name, content) do
    html = Markdown.parse(content)

    # piping the map calls changeset/1, which uses the default empty struct
    %{user_id: id, user_full_name: full_name, html: html}
    |> Message.changeset()
    |> Repo.insert()
  end
end

My schema modules are often just a schema and nothing else, especially when the context(s) that they’re a part of encompass multiple schemas. I don’t have anything that is not a changeset function, validations or composable queries in any of them. In the “worker bee” OTP book they use a separate query module but I feel you need to be careful not to introduce an unnecessary layer of abstraction and/or rewrite Ecto.Query in there.

Have you read Sasa Juric’s thoughts on the topic? Interested to hear other thoughts.

thiagomajesk · May 11, 2022, 2:12pm

@cmo It’s also important to notice that there’s another layer of abstraction here that goes further than code organization. As you can see, in the second example I provided, the logic that actually generates the HTML output is not required to be in the schema. The implications of this are twofold: a) having a strictly valid API at the schema level; in contrast to b) having a strictly valid API at the context level.

Another thing I should mention that is related to your example: in this specific case, it works very well with just a few arguments that you can express as the public API. However, when that’s not the case, it might be tempting to tamper with the user input (like modifying the attrs map). If you do that, you should be extremely careful to use either strings or atoms as keys consistently (Ecto accepts either, but not a mix of both in the same map). Even then, you’d still limit the shape of the map by hardcoding the keys in a specific format.
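As a rough illustration of the key-consistency problem, the sketch below adds a computed value under whichever key style the incoming map already uses. AttrsHelper and put_computed/3 are hypothetical names made up for this example; the only Ecto behavior assumed is that cast rejects maps that mix string and atom keys.

```elixir
defmodule AttrsHelper do
  # Hypothetical helper: stores `value` under a string key when the attrs
  # map already uses string keys, and under the atom key otherwise.
  def put_computed(attrs, key, value) when is_map(attrs) and is_atom(key) do
    if string_keys?(attrs) do
      Map.put(attrs, Atom.to_string(key), value)
    else
      Map.put(attrs, key, value)
    end
  end

  # Treats the map as string-keyed if any existing key is a binary.
  defp string_keys?(attrs) do
    Enum.any?(Map.keys(attrs), &is_binary/1)
  end
end

# String-keyed input (e.g. from a form) stays string-keyed:
from_form = AttrsHelper.put_computed(%{"user_id" => 1}, :html, "<p>hi</p>")
true = from_form == %{"user_id" => 1, "html" => "<p>hi</p>"}

# Atom-keyed input (e.g. from internal code) stays atom-keyed:
from_code = AttrsHelper.put_computed(%{user_id: 1}, :html, "<p>hi</p>")
true = from_code == %{user_id: 1, html: "<p>hi</p>"}
```

This avoids mixed keys, but it also shows the cost: the context now has to know about the shape of user input, which is the coupling that passing computed values as explicit arguments sidesteps.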