Just wanted to post about a library I published over the weekend. It’s a very simple Ecto.Type for email addresses, with automatic validations: Ecto.Email.
After looking around at a lot of email validation code as well as the RFCs for email address formats, my business partner and I settled on the email_validator library. I’ve been seeing other people post about custom Ecto types in the past few weeks, and have had very good experiences with them in terms of automatically casting between unit types, enforcing validations, etc.
In this case, what goes into and out of the email type are strings, but the type still gives a bit of benefit by automatically applying validations during casting. I’ve found that sometimes one (i.e. me or my teammates) can forget to add validator functions when adding new changesets to existing schemas.
Hopefully this is helpful to others, even if only as a reference for creating simple Ecto types.
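For anyone who has not written a custom type before, here is a rough sketch of the idea of validating during casting. It is not the library's actual code: the module name `EmailTypeSketch` is made up, the `plausible?/1` check is a deliberately naive stand-in for a real validator such as email_validator, and `use Ecto.Type` is left out so the snippet runs without dependencies. In a real project the module would start with `use Ecto.Type`.

```elixir
defmodule EmailTypeSketch do
  # Hypothetical module for illustration. The functions mirror the
  # Ecto.Type callbacks (type/0, cast/1, load/1, dump/1).

  # The underlying database type is a plain string.
  def type, do: :string

  # Casting is where validation happens, so every changeset that casts
  # this field gets the check without an explicit validator call.
  def cast(value) when is_binary(value) do
    if plausible?(value), do: {:ok, value}, else: :error
  end

  def cast(_other), do: :error

  # Values coming back from the database are already strings.
  def load(value) when is_binary(value), do: {:ok, value}
  def load(_other), do: :error

  # Values going to the database stay strings.
  def dump(value) when is_binary(value), do: {:ok, value}
  def dump(_other), do: :error

  # Naive placeholder: exactly one "@" with non-empty text on both sides.
  # A real type would delegate to a proper validator instead.
  defp plausible?(value) do
    case String.split(value, "@") do
      [local, domain] when local != "" and domain != "" -> true
      _ -> false
    end
  end
end
```

With something like this in place, `EmailTypeSketch.cast("user@example.com")` returns an `{:ok, _}` tuple and `EmailTypeSketch.cast("not-an-email")` returns `:error`, which Ecto reports as an invalid field on the changeset.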
Looks good and useful. I, too, have started shifting toward Ecto.Type usage over repeated validations in various changeset functions.
A couple of small suggestions:
I would remove the suggestion to enable the citext extension, as its use is discouraged in recent versions of PostgreSQL (since version 12). It’s probably OK for ASCII text, but if there’s any chance of Unicode text in your field, it’s not good. As Unicode is supported in the address part (äddrëss@, since RFC 6531 in 2012), this isn’t entirely theoretical.
The recommended solutions are to either:
Use collations (I’m still working out how to do this myself). There is a useful, if Django-specific, blog post that talks about this in more depth.
Perform the transformations during your casting. That is, always store the result of calling String.downcase/2 on the string. This could be configurable, but then you’d need to implement Ecto.ParameterizedType instead. Note that you may need to use :greek or :turkic case folding in some cases.
Look at your equal?/2 implementation as it won’t work for case differences (e.g., equal?("A@B.COM", "a@b.com") should be true, but that will not happen since you’re essentially doing a byte comparison).
Decide whether you want to support punycode (domain parts, @example.com, cannot contain Unicode and have to be converted to punycode if they do). Parameterization here would be useful because it is not free (see nameprep, which requires both case-folding and NFKC normalization). See also the casting transformations noted above for this, since the address part and the domain part should be handled separately.
It may be that email_validator does not handle these correctly, but I haven’t looked at that. I’m pretty sure that the validation functions that I have previously implemented don’t handle address part Unicode at all, and I know that I don’t have support for punycode transformations on the domain part, although I did implement that as part of my regex matching at the time.
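To make the casting and equal?/2 suggestions above a bit more concrete, here is a minimal sketch in plain Elixir. Everything in it is an assumption for illustration: `EmailNormalizeSketch`, `normalize/2`, and the choice to downcase the address part are made up, and the punycode conversion is left out entirely because it needs a separate library. It only shows splitting the two parts, case folding with String.downcase/2, NFKC-normalizing the domain with String.normalize/2, and comparing the normalized forms.

```elixir
defmodule EmailNormalizeSketch do
  # Hypothetical helper, not part of any published library.

  # Handles the address part and the domain part separately.
  # `mode` is passed to String.downcase/2 for the address part and may be
  # :default, :ascii, :greek, or :turkic.
  def normalize(email, mode \\ :default) when is_binary(email) do
    case String.split(email, "@") do
      [local, domain] ->
        {:ok, String.downcase(local, mode) <> "@" <> normalize_domain(domain)}

      _ ->
        # No "@" or more than one "@": out of scope for this sketch.
        :error
    end
  end

  # Case-insensitive comparison on the normalized forms, falling back to a
  # plain comparison when either value cannot be normalized.
  def equal?(a, b) when is_binary(a) and is_binary(b) do
    case {normalize(a), normalize(b)} do
      {{:ok, same}, {:ok, same}} -> true
      _ -> a == b
    end
  end

  def equal?(a, b), do: a == b

  # NFKC normalization plus case folding, the two steps nameprep requires.
  # Converting a Unicode domain to punycode is NOT done here.
  defp normalize_domain(domain) do
    domain
    |> String.normalize(:nfkc)
    |> String.downcase()
  end
end
```

With this, `EmailNormalizeSketch.equal?("A@B.COM", "a@b.com")` is true. In an actual type, `cast/1` would store the normalized string and `equal?/2` would be the Ecto.Type callback of the same name; making the case folding mode configurable per field is where Ecto.ParameterizedType would come in.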
Good point with the Ecto namespace. I was mis-remembering how other libraries named their modules prior to them being incorporated into Ecto itself. I’ll ship a new version moving everything to EctoEmail.
Regarding Unicode, I’ve had such positive experiences with citext that I had not followed the issues surrounding it. I will definitely be looking at that, and while I might retain a reference or two to it, I will remove it from the example code. You also make a very good point about the equal?/2 callback. I will incorporate that into the next version as well.