In a typical business development task, having a function in a context module which accepts attributes of map type and passes them to Ecto for a validated Changeset is a common case.
def create_foo(%{} = attrs) do
attrs
|> Foo.create_changeset()
|> Repo.insert()
end
Those functions may be called from everywhere; Normally, it is quite common to express the keys by atom while calls from Phoenix Controllers and LiveViews have keys in string. So these business functions should support both types of keys and thanks to the cast method, they do by default.
But, what if by some requirement it is needed to change the passed in attributes before passing them to Ecto for changeset or validations? By which data type the attribute keys should be addressed? string or atom? Suppose the following function:
def create_bar_foo(%{} = attrs) do
attrs
|> Map.put(bar_case: :default_or_computed_value)
|> Foo.create_changeset()
|> Repo.insert()
end
If it called from a test with normal atom keys, every thing would be fine while if it called from a LiveView with string keys, there will be a raise:
(Ecto.CastError) expected params to be a map with atoms or string keys, got a map with mixed keys ...
So the keys should be normalized before processing and this article suggest some solution. Also this post addresses it somehow.
There can be lots of other solutions to this problem, for example one can simply cast passed attributes before manipulation them and so on. But I think this can be included in Ecto.Changeset.cast method because:
Ecto.Changeset.cast is already supporting both atom and binary keys and also touching the case by doing conversion from string keys to atom ones. Also it sounds feasible to support mix of both types.
There is no harm to existing codes as actually a new capability is being added.
It encourages having some business implemented in context before using Ecto directly or having lots of ###_changeset functions per specific business case.
You semi-handwaved this solution but for me this is the right answer and one of those things I’m pretty religious about in my own code. We should be converting—ie casting—our data into a known shape before doing anything with it, and this is exactly what Changeset.cast provides us. This is especially important in a dynamic language.
For me, string keys mean “untrusted”. Using this definition, general application code should rarely ever have a need to set a string key (of course there are always exceptions).
Multiple changesets are generally encouraged by the framework. “Citation needed,” yes sorry I don’t have doc or discussion links atm, the best I can give right now is to look at the User schema generated by phx_gen_auth. I actually do prefer to keep a single changeset myself when possible (it’s not a hard rule) but it results in me writing functions like maybe_assign_slug/1 which are probably more complex than they need to be if I’d just use multiple changesets.
If you really want to do these things in the context, there is no harm in manipulating changesets in the context—changesets are kind of wild as they are having a strong presence in the web layer as well as the business layer.
Of course, you could also do stuff like this in the context:
def create_article(attrs) do
%Article{}
|> Article.changeset(attrs)
|> MyApp.ChangesetHelpers.assign_slug_from(:title) # module name for illustrative purposes :D
|> Repo.insert()
end
Finally, I think any promotion of the idea that mixing map key types is ok is a bad idea, again, especially in a dynamic language.
As @joaquinalcerro mentioned, my suggestion is to pass any additional default value explicitly instead of trying to manipulate params.
EDIT: the code below assumes that the bar_case is computed based on already valid and existing data. If you need to compute the data based on external potentially invalid parameters, then keep on reading the thread.
You could do this:
def create_foo(%{} = attrs) do
attrs
|> Foo.create_changeset(bar_case: :default_or_computed_value)
|> Repo.insert()
end
And then create_changeset does:
def create_changeset(params, defaults \\ []) do
%Foo{}
|> cast(params)
|> change(defaults)
end
Alternatively you can do this:
def create_foo(%{} = params) do
%Foo{bar_case: :default_or_computed_value}
|> Foo.create_changeset(params)
|> Repo.insert()
end
Etc. Feel free to change create_changeset as necessary to conform to your code.
If the source of the data is your own application, validations are pointless, because it makes no sense to tell the user that “bar_case is invalid” when they have no power over setting :bar_case. My reply was written from that perspective (I will clarify this in my previous message to avoid future confusion).
However, if you are manipulating other external params to compute values (and now reading back on your original thread, you said that this is indeed the case), then I agree with your concerns. In such cases, I would try to cast the initial params, then compute additional changes, and then validate the additional changes:
But even this requires care. If you add a validation error to bar, and it is a computed value from foo and baz, you need to make sure the error points to the correct place.
A computed/calculated value wouldn’t necessarily call for an extra changeset, but as shown in examples above it has its own logic path which starts from the changeset, not the changeset params.
Abstractly expressing; and as good practice, this code (as you suggested before):
def changeset(data, any_params, custom_internal_params \\ []) do
data
|> cast(any_params, ~w(foo bar baz))
|> change(custom_internal_attrs)
|> all_validations()
end
satisfies the requirements.
I wonder how there exists such a trust to internal parameters, because:
And everyone could wake up the next day and change the trusty value to something out of a curated validation path and create a bug easily.
This is why I suggested more specific examples, there is no universal law here, but generally it is a matter of avoiding being overly defensive. Yes it is advisable to place safeguards at boundaries (e.g. a db constraint) but adding a changeset validation to a data set in the same function is a poor safeguard because it is as easy for a dev to remove a validation as it is to change the data path, and in such a case an application error would actually be significantly more useful than a changeset error.
Just to reiterate, the problem with overly defensive code (which otherwise sounds like an oxymoron because our user’s data is Very Important etc) is that there are an infinite number of potential bugs that could arise from developer error. Code can only reasonably guard against user error. In the case of a database constraint, we can think of our application as a “user” of our database, which has its own internal logic with different goals (persistence and enforcement of relationships vs data transformation).
All that said is true and sound. In a typical application, the data and their integrity are crucial. So we provides constraints and rely on them in a layered fashion and in the following order:
So that each layer above is not required to repeat any constraint already expressed below and if it does, it is just for handling better responsiveness or error prevention and not for preventing disaster and missing consistency.
All this topic is about the two middle layers and their inter-relation. Of Course the safeguard is playing their roles and being overly defensive or obsessed is not considered.
All that matters is providing discipline in code flow (especially in a team) and promoting reusability and unification over curated validations. And also letting the Business Layer to play its role; not as a rule but as a flexibility and keeping concerns.
Yup! It sounds like my view came from considering more than needed dependency for each layer (probably from other ecosystems), to such an extent that there wasn’t any differentiation between internal and external data. And now it is getting clear to me how it sounds overly defensive.
Thank you all
But I still believe that the proposal itself and it’s workaround solution (the following) is true.
p.s. by all_validations I don’t mean necessarily “all validations in one place”, but “all necessary validations”.