Ecto makes a point of checking uniqueness via database constraints, not via validations. The reasoning is that if we rely on a validation, we’re open to race conditions - two simultaneous requests claim the same username, for example, because each process asks the database “is this username taken?”, each gets the answer “no”, and each then inserts it.
Ecto.Changeset has built-in validations like
validate_format which can be checked without talking to the database. If all of those pass, we attempt the insert/update. If that triggers a constraint error and we’ve told the changeset to look for it (using
unique_constraint(:username) or whatever), the error gets parsed into a friendly user message.
I get all this, and I’m glad that Ecto has great support for constraints, because it’s truly impossible for application code to guarantee uniqueness without using database constraints, table locks, serializable transactions, or some other database mechanism I don’t know about.
However, I think we should also check for uniqueness (and other ‘conflicting data in the database’ situations) in validations. My reasoning is this:
- If someone has taken the username I want, the odds are overwhelmingly large that a validation that queries for conflicting records would catch that. The other user almost certainly took that username 1 year ago, 1 day ago, or even 1 second ago. The only time the validation won’t catch it is if they took it (say) 1 millisecond ago, after my data was validated but before it was inserted. In other words, assuming they took it at some random time between when the application was created and now, the odds that that time was less than a millisecond ago are extremely small.
If I submit form data that would violate 3 validations and 2 constraints, I may have to submit it 3 times in a row to fix everything: the first time to fix the validations (when the constraints haven’t been checked yet), the second time to fix the first constraint error, and the third to fix the last constraint error. This is because while validations can tell us every problem they detect at once, databases (or at least PostgreSQL) only show the first constraint violation they detect; you have to fix that one and try the
INSERTagain to see the next constraint error. Making users submit a form where they’ve fixed all the error messages, only to present them with another round of error messages, is a bad user experience.
Therefore, if you need to ensure something like “username must be unique”, I argue that you should 1) check it in the validation phase by running a query and present the error along with things like “name can’t be blank”, and 2) also have a database constraint in case of race conditions, and use
unique_constraint(:username) in your changeset function to ensure that error is handled gracefully if it happens. (If you’re both validating uniqueness and checking a constraint, the constraint error could say “whoops, looks like someone just took that username”, since that’s the only situation where they’d see it.).
The downsides of doing this are that 1) your validation phase now requires the database, which makes testing a little worse 2) it’s not very DRY 3) it’s extra work. But 1) can be dealt with by breaking having a
run_database_dependent_validations function and testing it separately. 2) and 3) are developer pain that I think is justified by the better user experience.
Please add your rebuttals, questions, congratulations, murmurs of assent, etc.