Why are @valid_attrs, @update_attrs, and @invalid_attrs using atom keys and declared types, while controllers pass along stringified params?

When a controller receives incoming params, both keys and values are strings:

%{"count" => "5", "active" => "true"}

However, when we generate a context, Phoenix uses the declared types for the @valid_attrs, @update_attrs, and @invalid_attrs it pre-populates in the context tests:

@valid_attrs %{count: 5, active: true}

However, controller code typically passes the incoming params straight into the context. Here is an example from the Programming Phoenix book:

def create(conn, %{"user" => user_params}) do
  case Accounts.register_user(user_params) do
    {:ok, user} ->
      conn
      |> put_flash(:info, "#{user.name} created!")
      |> redirect(to: Routes.user_path(conn, :index))

    {:error, %Ecto.Changeset{} = changeset} ->
      render(conn, "new.html", changeset: changeset)
  end
end

As we can see, the user_params are passed straight into the context. Had we used the generators, the @valid_attrs would be typed and not stringified.

Can anyone tell me why it’s like this, and how you ensure that the context test code mimics real behavior?

Do you sanitize params? Or do you change the test code? Something else?

Thank you

If you want to understand why Phoenix uses string keys in the params map, you might want to take a look at these answers for further clarification...

This is the same reason it’s considered bad practice to dynamically create atoms in Elixir: atoms are never garbage-collected, so converting arbitrary input into atoms can exhaust the atom table. You’ll find this recommendation in many threads on the forum, btw.
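For illustration, the standard-library functions involved (nothing from the thread itself; user_supplied_key is a placeholder):

# Atoms live forever in the BEAM's atom table, so converting arbitrary
# user input with String.to_atom/1 can eventually exhaust the table.
String.to_atom(user_supplied_key)    # unsafe for untrusted input

# String.to_existing_atom/1 only succeeds for atoms the VM already knows,
# so untrusted input can't grow the atom table (it raises otherwise).
String.to_existing_atom("count")     # => :count, if :count already exists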


About the scaffolded tests though: if I remember correctly, when you cast your attrs, Ecto doesn’t care whether you have string keys or atom keys, as long as they are not mixed together.
However, this could be a problem if you are mixing string and atom keys inside your functions by updating the params map directly before it gets cast. To avoid this, just use string keys in your tests.
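A minimal schemaless sketch of that behavior (the types map is illustrative):

types = %{count: :integer, active: :boolean}

# all-string keys: fine, values are cast to the declared types
Ecto.Changeset.cast({%{}, types}, %{"count" => "5", "active" => "true"}, [:count, :active])
# => changes: %{count: 5, active: true}

# all-atom keys: also fine
Ecto.Changeset.cast({%{}, types}, %{count: 5, active: true}, [:count, :active])

# mixed keys: raises, Ecto refuses maps with both string and atom keys
Ecto.Changeset.cast({%{}, types}, %{"count" => "5", :active => true}, [:count, :active])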

Thank you for your reply. My question was, as you mentioned in the second part, related to the scaffolding of test data. As you say, Ecto.Changeset.cast doesn’t care about strings or atoms; that makes sense.

I still think it is confusing, though, because relying on cast to sort out string vs atom keys means the issue surfaces at the context layer. If someone happens to extract data from the incoming params manually, they have to check both key types in the context, convert strings to atoms in the controller, or change the scaffolded data in the tests.

Perhaps there needs to be an agreed-upon, safe pattern for converting incoming param keys to atoms?

Imo params should never be touched before they’ve gone through a changeset. It doesn’t need to be a schema’s changeset and it doesn’t need persistence, but it should go through a changeset. Personally I even feel this can belong in the controller as opposed to in a context.

But actually doing validation at the incoming edge of the system AND at the outgoing edge (for data consistency in persistence) is a lot of work, and it might seem unnecessary as long as both validations do essentially the same thing, which they do for scaffolded code. As soon as you’re no longer passing params along “as is” to your schema’s changeset, the two are no longer one and the same, so it can be a solid solution to validate params directly in the controller and let the schema’s changeset deal only with data consistency, as opposed to also dealing with casting values.

Also, I suggest treating the scaffolding of the generators much more as a stepping stone than a holy grail. It’s meant to get people up to speed with Phoenix quickly and to show them high-level concepts like not putting business logic into controllers, but it’s not an architecture you shouldn’t modify because it’s already perfect. There are tradeoffs made simply because the code needs to be generated, and tradeoffs between showing as little code as possible (to not overwhelm people) and showing the full picture. And what even is a full picture to people of widely different degrees of experience, both generally and with Phoenix specifically?


Hi @LostKobrakai! I don’t mean to hijack the thread, but now you’ve got me curious…
I have to be honest and say that, given the way the project is scaffolded, it feels quite “natural” to just patch the params, since they are the common denominator across controllers, contexts, and schemas. So I wonder what you do in cases where you want to use part of the data from the params and compute the rest of the fields in the program before they get cast.

I have another thread dedicated to this subject, would you mind sharing your experience there?


I would ask why computed fields should be cast in the first place. I’d cast the user input, compute whatever is needed additionally, and persist the results.
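A hedged sketch of that order of operations (the Article schema and the slug computation are made up for illustration):

def create_article(params) do
  %Article{}
  # 1. cast and validate only the user-supplied fields
  |> Ecto.Changeset.cast(params, [:title, :body])
  |> Ecto.Changeset.validate_required([:title])
  # 2. compute the derived field and add it as a change, not as a param
  |> put_slug()
  # 3. persist the combined result
  |> Repo.insert()
end

defp put_slug(changeset) do
  case Ecto.Changeset.get_change(changeset, :title) do
    nil -> changeset
    title -> Ecto.Changeset.put_change(changeset, :slug, String.downcase(title))
  end
end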

Personally (especially with LiveView) I have started to use changesets more and more directly at the places where params are first handled in userland (directly in liveviews/controllers). It might not matter much for plain strings or numbers, but as soon as you get to e.g. dates or other complex data structures, dealing with user input directly, or dealing with both user input and internal calls in contexts, becomes cumbersome. Casting data in Ecto is meant exactly to deal with the complexity of mapping e.g. string-formatted values to complex Elixir data structures, so why not use it as soon as possible.
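For example, a string-formatted date from a form becomes a proper Date struct through a single schemaless cast (the types map is illustrative):

types = %{published_on: :date}

{%{}, types}
|> Ecto.Changeset.cast(%{"published_on" => "2021-05-01"}, [:published_on])
|> Ecto.Changeset.get_change(:published_on)
# => ~D[2021-05-01]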

I can totally see that this might seem overly complex looking at it from a plain old CRUD example, but it becomes very powerful as soon as schema and e.g. html form (or other input) do not align 1:1 anymore. Say you have a form for creating a relationship between two existing entities.

I’m gonna give you a practical example that came to my mind while reading your answer, picture this:

You have to insert a new user in the database and you must fill a required “role” field which is an enum in the database (standard, moderator, and administrator).

Although the names of the roles can be customized by each organization, they always will be represented by one of the valid values of the underlying enum type, so: padawan (standard), knight (moderator), and master (administrator). Because of this requirement, you have to fetch the values from the database using another module.

Now you have the following business rule to apply:

Two users can’t have the same role: the first user to request a role gets it, and the next one is automatically demoted if the role is taken. This continues until every role has been distributed among the users.

I see at least three things that must be done in order for this operation to succeed:

  • Validate if the user input is valid (username, email, etc)
  • Validate if the “role” field is a valid enum type (standard, moderator, and administrator)
  • Validate if the “role” field does not break any constraints (unique role by organization)

Considering the default scaffolded code, I imagine this means that in the context you either:

1. Fetch the valid roles and patch the params map, calling the changeset function with all the required values, which will be validated against the schema:
def create_user(params, organization_id) do
  role = Roles.valid_organization_role(organization_id)

  %User{}
  |> User.changeset(Map.put(params, :role, role))
  |> Repo.insert!
end

2. Call the changeset function, which will cast only the input values. After that, you'd have to fetch and validate the computed values with the changeset in hand (see the sketch after option 4 below). This segments the validation logic between the context and the schema:
def create_user(params, organization_id) do
  %User{}
  |> User.changeset(params)
  |> fetch_and_validate_organization_role(organization_id)
  |> Repo.insert!
end

def fetch_and_validate_organization_role(changeset, organization_id) do
  role = Roles.valid_organization_role(organization_id)
  # using Ecto to validate the changeset and putting the changes in the changeset after?
end

3. Pass the params directly to the schema, cast the input values, and validate after you have cast them. This IMHO breaks what I'd consider the boundary of the schema, since you would have to call another context from the schema itself to validate the values (I'm not completely sure about this opinion):
def create_user(params, organization_id) do
  %User{}
  |> User.changeset(params, organization_id)
  |> Repo.insert!
end
defmodule User do
  def changeset(user, attrs, organization_id) do
    user
    |> cast(attrs, [:username, :email, :role])
    |> validate_required([:username, :email, :role])
    |> fetch_and_validate_organization_role(organization_id)
  end

  def fetch_and_validate_organization_role(changeset, organization_id) do
    role = Roles.valid_organization_role(organization_id)
    # using Ecto to validate the changeset and putting the changes in the changeset after?
  end
end

4. Just don't use the schema to generate the changeset and do everything in the context:
def create_user(params, organization_id) do
  %User{}
  |> user_changeset(params, organization_id)
  |> Repo.insert!
end

def user_changeset(user, attrs, organization_id) do
  user
  |> cast(attrs, [:username, :email, :role])
  |> validate_required([:username, :email, :role])
  |> fetch_and_validate_organization_role(organization_id) 
end
  
def fetch_and_validate_organization_role(changeset, organization_id) do
  role = Roles.valid_organization_role(organization_id)
  # using Ecto to validate the changeset and putting the changes in the changeset after?
end
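
To illustrate, a hedged sketch of what that repeated placeholder helper could look like; note it is not race-free on its own, as discussed further down the thread:

def fetch_and_validate_organization_role(changeset, organization_id) do
  case Roles.valid_organization_role(organization_id) do
    # no free role left for this organization
    nil -> Ecto.Changeset.add_error(changeset, :role, "no role available")
    # put the computed value into the changeset as a regular change
    role -> Ecto.Changeset.put_change(changeset, :role, role)
  end
end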

IMHO, this architectural “flexibility” is both the strong and the weak suit of Phoenix as a framework. If I’m being honest, I have questioned myself a lot about this before and I don’t think I’ve got a clear answer yet. In fact, if you take a look at the thread I linked in my previous post, you’ll see there are some suggestions about not using changesets in the schema but directly in the contexts, which would allow you to cast, fetch, compute, and sanitize incoming data without breaking the “boundaries” of each “layer”.


I haven’t read the whole thread in detail, but I wanted to comment on this one:

This one you can only do if you have some kind of transactional boundary, either at the DB level or by ensuring that only one user at a time can be created per organisation.
Another option is to assign the role after the user has been created, so you can ensure it happens within a transactional boundary (see the sketch below).
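
A hedged sketch of that second option using Ecto.Multi, so both steps share one transaction (Roles.valid_organization_role_query/1 is a hypothetical query-returning helper; you'd still want a unique constraint or row lock to fully close the race):

def create_user(params, organization_id) do
  Ecto.Multi.new()
  |> Ecto.Multi.insert(:user, User.changeset(%User{}, params))
  |> Ecto.Multi.run(:assign_role, fn repo, %{user: user} ->
    # runs inside the same transaction as the insert above
    role = repo.one(Roles.valid_organization_role_query(organization_id))

    user
    |> Ecto.Changeset.change(role: role)
    |> repo.update()
  end)
  |> Repo.transaction()
end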

To comment on the design itself: for me, role and user can live in the same context, and thus I see no issue in role being used in user.

This is an interesting example, but actually not really what I’m talking about.

I split up the responsibility of casting user input – which mostly means transforming strings into complex types, bringing the input into a known map/struct with just the (atom) keys expected (no additional ones), and validating that the values have the correct types – from what you’re talking about: storing data. Storing data has similar but different, related concerns: also correct types and such, but much more broadly ensuring consistency.

In your case a changeset in the controller/liveview would just make sure of the basics: ensure required fields are present and that the fields have the correct types. Something like:

def changeset(params \\ %{}) do
  # schemaless changeset: plain data + a types map instead of a schema
  {%{}, %{name: :string, email: :string, role: :string}} # :role could be an Ecto.Enum in ecto 3.5
  |> Ecto.Changeset.cast(params, [:name, :email, :role])
  |> Ecto.Changeset.validate_required([:name, :email, :role])
  |> Ecto.Changeset.validate_inclusion(:role, ["padawan", "knight", "master"]) # maybe; strings here, since :role is cast as :string
end

This is all the “form” can validate. With the resulting map I’d then call into context functions to do the persistence part.
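
For completeness, a hedged sketch of how a controller might get that map out of the changeset, using Ecto.Changeset.apply_action/2 (the Accounts function and route are made up):

def create(conn, %{"user" => user_params}) do
  case Ecto.Changeset.apply_action(changeset(user_params), :insert) do
    {:ok, data} ->
      # `data` is a plain map with known atom keys and already-cast values;
      # only now does it get handed to the context for persistence
      {:ok, user} = Accounts.create_user(data)
      redirect(conn, to: Routes.user_path(conn, :show, user))

    {:error, changeset} ->
      render(conn, "new.html", changeset: changeset)
  end
end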

I’m not sure why you’d need the database here if the underlying enum is fixed. Basically, all I see is that the frontend would need to query the DB to build a select with the correct translated names → enum values, but that’s presentation and unrelated to storing values.

Anyway: @tcoopman is correct in that you want a transaction around the fetching of valid roles per organisation. Changesets can do that with e.g. prepare_changes, or you can wrap the insert in a Multi/transaction on your own. All your options have a race condition.
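
A hedged sketch of the prepare_changes route (again with a hypothetical Roles.valid_organization_role_query/1, and an assumed unique index backing the constraint):

def changeset(user, attrs, organization_id) do
  user
  |> cast(attrs, [:username, :email])
  |> validate_required([:username, :email])
  |> prepare_changes(fn changeset ->
    # this function runs inside the transaction of Repo.insert/update
    role = changeset.repo.one(Roles.valid_organization_role_query(organization_id))
    put_change(changeset, :role, role)
  end)
  |> unique_constraint(:role, name: :users_organization_id_role_index)
end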

Now to clarify my previous remarks:

What I meant by params is only the string-keyed map of params you get in controllers/liveviews.

def create_user(params, organization_id) do
  role = Roles.valid_organization_role(organization_id)

  %User{}
  |> User.changeset(Map.put(params, :role, role))
  |> Repo.insert!
end

This first example of yours would fail if we’re talking about params coming from user input: changesets break when you supply a map with mixed string and atom keys. Therefore I’d suggest one of your other options at most, because you don’t really want to deal with the string-keyed params manually. It’s much simpler to cast the user input first and deal with anything else later, and that’s much more aligned with my example from the start of this post, i.e. validating data at the edges.

And then there’s the last part of your example: coupling and how you deal with it. This is a much more involved topic and actually something more fitting for your other thread about DDD.

The “roles” context was merely an example; don’t get caught up in its semantics. If you make the claim that it’s supposed to be in the same context, the problem naturally disappears. The constraint I’m putting in my example is that you must fetch data from another context, hence the boundary constraint. It might as well be another computed field that has nothing to do with the user directly.

In the example I gave, the values are computed by the business logic, not input by the user (I updated the answer before I submitted it; maybe you read a different version that does not explain why this won’t happen).

Not actually (I explained why in my answer above). The way I understand it, even though prepare_changes is a nice solution for wrapping the changes in a single transaction, if we are talking about boundaries it still falls under one of the options I listed, unless of course you are dealing with changesets in the context.

The first example has a typo; it should be User.changeset(Map.put(params, "role", role)). I explained here that Ecto does not support mixed keys. Just to be crystal clear: can you show how you would tackle this problem if not with one of the options I presented?

This is certainly a concern I had, and it impacted how the team I’m working on chose to deal with data sanitization/computation in our applications (by patching the params). Every single person touching the codebase had almost the same questions (we are all Elixir newbies). So if boundaries were not a problem, this issue would not have appeared in the first place. Even though the main question is about the test code, it actually reflects a deeper problem regarding how to organize the code and how to act on the values passed to contexts.

For me any of your options would work, as the problem is one of data consistency and persistence. It’s not one of casting and dealing with user-input params being string-keyed and having stringified values, which was my point in my initial answer in this thread. To boil it down: I don’t care where one computes the role, as long as it doesn’t touch the params as they come into the system; those params should have been put through Ecto.Changeset.cast before anything is derived from them. This cast can live in the schema, but as soon as it’s not just “persist exactly those params with a single schema”, I tend to prefer casting params directly in the controller/liveview where they enter the system and letting contexts deal only with the results of that, a.k.a. maps with a known set of atom keys and known value types, if not a struct of those.

In Ecto there’s the concept of three data representations. There’s data coming into the system (params), which needs validation, parsing, type conversion, …. There’s runtime data, which is the data in your schemas that your system works with. And then there’s the representation sent off to the db driver for persistence. Ecto.Types are the element converting between all of them: cast takes incoming data to runtime data, while dump and load translate db format ↔ runtime data. My point is just about moving the task of casting incoming data to runtime data nearer to the place it enters the system, while your problem imo fully lies in the realm of dealing with data in its runtime format.
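
A compact illustration of those three representations with a made-up custom Ecto.Type:

defmodule MyApp.Types.Point do
  use Ecto.Type

  def type, do: :map

  # cast/1: incoming data (e.g. "3,4" from a form) -> runtime data
  def cast(str) when is_binary(str) do
    with [x, y] <- String.split(str, ","),
         {x, ""} <- Integer.parse(x),
         {y, ""} <- Integer.parse(y) do
      {:ok, {x, y}}
    else
      _ -> :error
    end
  end

  def cast({x, y}) when is_integer(x) and is_integer(y), do: {:ok, {x, y}}
  def cast(_), do: :error

  # dump/1: runtime data -> what the db driver receives
  def dump({x, y}), do: {:ok, %{"x" => x, "y" => y}}
  def dump(_), do: :error

  # load/1: db representation -> runtime data
  def load(%{"x" => x, "y" => y}), do: {:ok, {x, y}}
  def load(_), do: :error
end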

Maybe I’m missing something, but if it’s computed in the business logic, you still need the transactional boundary unless the business logic guarantees computing a value only one at a time.
The whole reason for the transactional boundary is concurrency, and concurrency can happen in the computation as well.

Really great discussion. I’ll take a closer look at what Ecto gives us for sanitizing/formatting incoming params into “attrs”.

I’m guessing that’s the theme mentioned above: incoming user data is called params, but in the context it’s assumed to have a known structure, hence perhaps the name change to attrs? Maybe?

Also, I won’t be so fixated on what the generators generate :slight_smile: