Ecto.CastError Mixed Keys Issue

In a typical business development task, having a function in a context module which accepts attributes of map type and passes them to Ecto for a validated Changeset is a common case.

  def create_foo(%{} = attrs) do
    attrs
    |> Foo.create_changeset()
    |> Repo.insert()
  end

Those functions may be called from everywhere; Normally, it is quite common to express the keys by atom while calls from Phoenix Controllers and LiveViews have keys in string. So these business functions should support both types of keys and thanks to the cast method, they do by default.

But, what if by some requirement it is needed to change the passed in attributes before passing them to Ecto for changeset or validations? By which data type the attribute keys should be addressed? string or atom? Suppose the following function:

  def create_bar_foo(%{} = attrs) do
    attrs
    |> Map.put(bar_case: :default_or_computed_value)
    |> Foo.create_changeset()
    |> Repo.insert()
  end

If it called from a test with normal atom keys, every thing would be fine while if it called from a LiveView with string keys, there will be a raise:

(Ecto.CastError) expected params to be a map with atoms or string keys, got a map with mixed keys ...

So the keys should be normalized before processing and this article suggest some solution. Also this post addresses it somehow.

There can be lots of other solutions to this problem, for example one can simply cast passed attributes before manipulation them and so on. But I think this can be included in Ecto.Changeset.cast method because:

  • Ecto.Changeset.cast is already supporting both atom and binary keys and also touching the case by doing conversion from string keys to atom ones. Also it sounds feasible to support mix of both types.
  • There is no harm to existing codes as actually a new capability is being added.
  • It encourages having some business implemented in context before using Ecto directly or having lots of ###_changeset functions per specific business case.

I respectfully disagree with this whole proposal.

You semi-handwaved this solution but for me this is the right answer and one of those things I’m pretty religious about in my own code. We should be converting—ie casting—our data into a known shape before doing anything with it, and this is exactly what Changeset.cast provides us. This is especially important in a dynamic language.

For me, string keys mean “untrusted”. Using this definition, general application code should rarely ever have a need to set a string key (of course there are always exceptions).

Multiple changesets are generally encouraged by the framework. “Citation needed,” yes sorry I don’t have doc or discussion links atm, the best I can give right now is to look at the User schema generated by phx_gen_auth. I actually do prefer to keep a single changeset myself when possible (it’s not a hard rule) but it results in me writing functions like maybe_assign_slug/1 which are probably more complex than they need to be if I’d just use multiple changesets.

If you really want to do these things in the context, there is no harm in manipulating changesets in the context—changesets are kind of wild as they are having a strong presence in the web layer as well as the business layer.

Of course, you could also do stuff like this in the context:

def create_article(attrs) do
  %Article{}
  |> Article.changeset(attrs)
  |> MyApp.ChangesetHelpers.assign_slug_from(:title) # module name for illustrative purposes :D
  |> Repo.insert()
end

Finally, I think any promotion of the idea that mixing map key types is ok is a bad idea, again, especially in a dynamic language.

2 Likes

Because you trust yourself adding this key pair you might want to add it after creating the changeset as a change like this:

          change(
            changeset,
            Map.new() |> Map.put_new(:key, value)
          )
         |> Repo.insert()

No need to figure out what type of map we need atom or string.

Best regards

As @joaquinalcerro mentioned, my suggestion is to pass any additional default value explicitly instead of trying to manipulate params.

EDIT: the code below assumes that the bar_case is computed based on already valid and existing data. If you need to compute the data based on external potentially invalid parameters, then keep on reading the thread. :slight_smile:

You could do this:

  def create_foo(%{} = attrs) do
    attrs
    |> Foo.create_changeset(bar_case: :default_or_computed_value)
    |> Repo.insert()
  end

And then create_changeset does:

  def create_changeset(params, defaults \\ []) do
    %Foo{}
    |> cast(params)
    |> change(defaults)
  end

Alternatively you can do this:

  def create_foo(%{} = params) do
    %Foo{bar_case: :default_or_computed_value}
    |> Foo.create_changeset(params)
    |> Repo.insert()
  end

Etc. Feel free to change create_changeset as necessary to conform to your code.

1 Like

To summarize and add more clarification, the main concerns are as follows:

  • Changing the attributes in a middle business (e.g. context) layer, before passing them for changes and validation to Ecto.Changeset.
  • Reusing existing ###_changeset functions instead of being forced to create new ones for each of later causes.
  • And absolutely to not bypass any validations before committing to any final result (e.g. persisting).

And this code is quite fitting.

The other ones which changes attributes bypassing validations are violating consistency and even this one is doing so implicitly:

It all depends on what is the source of the data.

If the source of the data is your own application, validations are pointless, because it makes no sense to tell the user that “bar_case is invalid” when they have no power over setting :bar_case. My reply was written from that perspective (I will clarify this in my previous message to avoid future confusion).

However, if you are manipulating other external params to compute values (and now reading back on your original thread, you said that this is indeed the case), then I agree with your concerns. In such cases, I would try to cast the initial params, then compute additional changes, and then validate the additional changes:

data
|> cast(params, ~w(foo baz))
|> validate_foo_and_baz()
|> compute_bar_case_as_change()
|> validate_bar()

But even this requires care. If you add a validation error to bar, and it is a computed value from foo and baz, you need to make sure the error points to the correct place.

2 Likes

I think a more specific example use case might me useful here because I do think it is quite rare, but otherwise I strongly agree with other commenters that modifying params is an antipattern. In fact I would go as far as to say even if it was supported I wouldn’t make use of it. We as developers are naturally lazy and that is generally speaking a good thing as it motivates us to avoid producing spaghetti code. But here the extra code actually simplifies the design because it represents the innate complexity of the data insofar as it implicitly must deal with multiple input sources (as José says if the param in question is system data there is no reason to cast it). When troubleshooting/debugging or even just grokking part of a program it is very useful to be able to clearly and easily trace the path of a piece of data, and if necessary, modify it in isolation. Like many things it might seem a simple matter to recognize the “mixing” these things tend to multiply overtime and in aggregate produce more fragile, less maintainable code.

A computed/calculated value wouldn’t necessarily call for an extra changeset, but as shown in examples above it has its own logic path which starts from the changeset, not the changeset params.

3 Likes

Abstractly expressing; and as good practice, this code (as you suggested before):

  def changeset(data, any_params, custom_internal_params \\ []) do
    data
    |> cast(any_params, ~w(foo bar baz))
    |> change(custom_internal_attrs)
    |> all_validations() 
  end

satisfies the requirements.

I wonder how there exists such a trust to internal parameters, because:

And everyone could wake up the next day and change the trusty value to something out of a curated validation path and create a bug easily.

This is why I suggested more specific examples, there is no universal law here, but generally it is a matter of avoiding being overly defensive. Yes it is advisable to place safeguards at boundaries (e.g. a db constraint) but adding a changeset validation to a data set in the same function is a poor safeguard because it is as easy for a dev to remove a validation as it is to change the data path, and in such a case an application error would actually be significantly more useful than a changeset error.

Just to reiterate, the problem with overly defensive code (which otherwise sounds like an oxymoron because our user’s data is Very Important etc) is that there are an infinite number of potential bugs that could arise from developer error. Code can only reasonably guard against user error. In the case of a database constraint, we can think of our application as a “user” of our database, which has its own internal logic with different goals (persistence and enforcement of relationships vs data transformation).

2 Likes

All that said is true and sound. In a typical application, the data and their integrity are crucial. So we provides constraints and rely on them in a layered fashion and in the following order:

  • Application Layer (e.g. Controller, LiveView, etc.)
  • Business Layer (e.g. Context, Genserver, etc.)
  • Database Adapter Layer (changeset validations)
  • Database Layer (db constraints)

So that each layer above is not required to repeat any constraint already expressed below and if it does, it is just for handling better responsiveness or error prevention and not for preventing disaster and missing consistency.

All this topic is about the two middle layers and their inter-relation. Of Course the safeguard is playing their roles and being overly defensive or obsessed is not considered.

All that matters is providing discipline in code flow (especially in a team) and promoting reusability and unification over curated validations. And also letting the Business Layer to play its role; not as a rule but as a flexibility and keeping concerns.

Yup! It sounds like my view came from considering more than needed dependency for each layer (probably from other ecosystems), to such an extent that there wasn’t any differentiation between internal and external data. And now it is getting clear to me how it sounds overly defensive.

Thank you all

But I still believe that the proposal itself and it’s workaround solution (the following) is true.

p.s. by all_validations I don’t mean necessarily “all validations in one place”, but “all necessary validations”.