nathanl
A case for *validating* uniqueness
Ecto makes a point of checking uniqueness via database constraints, not via validations. The reasoning is that if we rely on a validation, we’re open to race conditions - two simultaneous requests claim the same username, for example, because each process asks the database “is this username taken?”, each gets the answer “no”, and each then inserts it.
Ecto.Changeset has built-in validations like validate_required and validate_format which can be checked without talking to the database. If all of those pass, we attempt the insert/update. If that triggers a constraint error and we’ve told the changeset to look for it (using unique_constraint(:username) or whatever), the error gets parsed into a friendly user message.
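As a rough illustration of that flow, here is a minimal sketch. The module name, the fields, and the username format are assumptions made up for the example; `unique_constraint(:username)` only does anything if the table actually has a unique index on that column (by default Ecto looks for one named `users_username_index`).

```elixir
defmodule MyApp.User do
  # Hypothetical schema for illustration; module and field names are assumptions.
  use Ecto.Schema
  import Ecto.Changeset

  schema "users" do
    field :name, :string
    field :username, :string
  end

  def changeset(user, params) do
    user
    |> cast(params, [:name, :username])
    # Pure validations: checked in memory, no database round trip.
    |> validate_required([:name, :username])
    |> validate_format(:username, ~r/^[a-z0-9_]+$/)
    # Inert until Repo.insert/update hits the unique index; at that point
    # the database error is turned into a changeset error on :username.
    |> unique_constraint(:username)
  end
end
```

If the validations fail, the repo returns `{:error, changeset}` without touching the database; the constraint error can only appear once the validations pass and the insert is attempted.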
I get all this, and I’m glad that Ecto has great support for constraints, because it’s truly impossible for application code to guarantee uniqueness without using database constraints, table locks, serializable transactions, or some other database mechanism I don’t know about.
However, I think we should also check for uniqueness (and other ‘conflicting data in the database’ situations) in validations. My reasoning is this:
- If someone has taken the username I want, the odds are overwhelmingly large that a validation that queries for conflicting records would catch that. The other user almost certainly took that username 1 year ago, 1 day ago, or even 1 second ago. The only time the validation won’t catch it is if they took it (say) 1 millisecond ago, after my data was validated but before it was inserted. In other words, assuming they took it at some random time between when the application was created and now, the odds that that time was less than a millisecond ago are extremely small.
- If I submit form data that would violate 3 validations and 2 constraints, I may have to submit it 3 times in a row to fix everything: the first time to fix the validations (when the constraints haven’t been checked yet), the second time to fix the first constraint error, and the third to fix the last constraint error. This is because while validations can tell us every problem they detect at once, databases (or at least PostgreSQL) only show the first constraint violation they detect; you have to fix that one and try the INSERT again to see the next constraint error. Making users submit a form where they’ve fixed all the error messages, only to present them with another round of error messages, is a bad user experience.
Therefore, if you need to ensure something like “username must be unique”, I argue that you should 1) check it in the validation phase by running a query and present the error along with things like “name can’t be blank”, and 2) also have a database constraint in case of race conditions, and use unique_constraint(:username) in your changeset function to ensure that error is handled gracefully if it happens. (If you’re both validating uniqueness and checking a constraint, the constraint error could say “whoops, looks like someone just took that username”, since that’s the only situation where they’d see it.)
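One way to sketch this two-layer approach is with `unsafe_validate_unique`, which ships in `Ecto.Changeset` in recent Ecto versions and runs a query during validation. The schema, the repo name `MyApp.Repo`, and the field names below are assumptions for illustration, not something from the post above.

```elixir
defmodule MyApp.Account do
  # Hypothetical schema; MyApp.Repo is assumed to be the application's repo.
  use Ecto.Schema
  import Ecto.Changeset

  schema "accounts" do
    field :name, :string
    field :username, :string
  end

  def changeset(account, params) do
    account
    |> cast(params, [:name, :username])
    |> validate_required([:name, :username])
    # Queries the database now, so "already taken" is reported together
    # with the other validation errors. It is "unsafe" because it cannot
    # rule out a concurrent insert between this check and the write.
    |> unsafe_validate_unique(:username, MyApp.Repo)
    # Safety net for that race; only triggers on insert/update.
    |> unique_constraint(:username,
      message: "whoops, looks like someone just took that username"
    )
  end
end
```

With both in place, the constraint message should only ever be seen in the narrow race window, which is why it can afford to be that specific.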
The downsides of doing this are that 1) your validation phase now requires the database, which makes testing a little worse, 2) it’s not very DRY, and 3) it’s extra work. But 1) can be dealt with by having a separate run_database_dependent_validations function and testing it separately. 2) and 3) are developer pain that I think is justified by the better user experience.
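A minimal sketch of that split might look like the following. Only the name `run_database_dependent_validations` comes from the post; the schema, the repo argument, and the specific validations are assumptions.

```elixir
defmodule MyApp.Signup do
  # Hypothetical schema; MyApp.Repo is assumed to be the application's repo.
  use Ecto.Schema
  import Ecto.Changeset

  schema "signups" do
    field :username, :string
  end

  # Pure: can be unit tested without a database.
  def changeset(signup, params) do
    signup
    |> cast(params, [:username])
    |> validate_required([:username])
    |> validate_length(:username, min: 3)
    |> unique_constraint(:username)
  end

  # Impure: every validation that needs a query lives here.
  # The repo is an argument so tests can decide what to pass in.
  def run_database_dependent_validations(changeset, repo \\ MyApp.Repo) do
    unsafe_validate_unique(changeset, :username, repo)
  end
end

# Possible call site (not run here, it needs a configured repo):
#
#   %MyApp.Signup{}
#   |> MyApp.Signup.changeset(params)
#   |> MyApp.Signup.run_database_dependent_validations()
#   |> MyApp.Repo.insert()
```

Tests for `changeset/2` then stay database-free, and only the tests for the second function need a sandboxed repo.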
Please add your rebuttals, questions, congratulations, murmurs of assent, etc.
Most Liked Responses
michalmuskala
Another thing to consider is what happens with API-driven applications. In that case, I would say that the uniqueness validation being done on the server is actually not a good idea. Front-end apps normally do all sorts of validations on the form inputs, and uniqueness shouldn’t be different. How would that work? The front-end issues an additional request to check for uniqueness and shows the error in the UI as soon as the field is filled. This also lets you show the error right away rather than after the server round-trip of a full form submission.
In that case, the changeset uniqueness validation becomes truly a fallback mechanism, and it’s completely fine that it runs only after all the other validations, since it’s going to be triggered rarely.
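The server side of such a check could be as small as the sketch below. The module name, the "users" table, its username column, and `MyApp.Repo` are all assumptions; how the function is exposed (a JSON endpoint, a channel, etc.) is left out.

```elixir
defmodule MyApp.Usernames do
  import Ecto.Query, only: [from: 2]

  # Hypothetical helper a front-end could hit while the user is typing.
  # Advisory only: the answer can be stale by the time the form is
  # submitted, so the changeset constraint is still needed as a fallback.
  def available?(username) when is_binary(username) do
    query = from u in "users", where: u.username == ^username
    not MyApp.Repo.exists?(query)
  end
end
```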
anders
The general version of the problem you describe has been discussed at length in the DDD community under the rubric of “cross aggregate rule validation”. A number of solutions have been suggested. You might get some ideas by looking into those discussions.
Here is a good place to start that discusses aggregate design and rule validations: http://dddcommunity.flywheelsites.com/library/vernon_2011/
benwilson512
You break composability.

    # validate_unique is a hypothetical validation that queries the database.
    def changeset_name(thing, params) do
      thing
      |> cast(params, [:name])
      |> validate_unique([:name])
    end

    def changeset_email(thing, params) do
      thing
      |> cast(params, [:email])
      |> validate_unique([:email])
    end

If you do thing |> changeset_name(params) |> changeset_email(params), you’ve made 2 database requests when you could actually do it in just 1.
Ecto got rid of callbacks on the model / schema, sure, but that’s not what I’m talking about. I’m talking about using Ecto.Changeset (Ecto v3.14.0), which is already the spot for doing impure stuff with changesets.
In general though in a functional language any time you take a process that’s pure and you make it impure it’s a pretty major breaking change.








