Hello!
This is my first post to the forum, which has helped me a great deal as I have become involved with this language and ecosystem.
I have some code that lets me test whether an email address looks like it is from a free/consumer email provider, e.g. gmail.com or hotmail.com. Following what I believe to be an idiomatic approach, I wanted to use metaprogramming to take a list of nearly 5000 domains for such providers and generate a module with an is_free() function head for each, e.g.:
def is_free("gmail.com"), do: true
def is_free("hotmail.com"), do: true
...
def is_free(_), do: false
The code that I ended up with and which works is as follows:
defmodule FreemailCode do
  @moduledoc """
  helper to provide __using__ that generates an is_free(domain) function
  for each domain listed in the input file as being a free/consumer
  email provider such as gmail.com, hotmail.com etc.
  The source file is free-domains-2.csv downloaded from hubspot at:
  https://knowledge.hubspot.com/forms/what-domains-are-blocked-when-using-the-forms-email-domains-to-block-feature
  """
  defmacro __using__(_) do
    domains = File.read!("/path/to/free-domains-2.csv")
              |> String.split("\r\n")
    IO.puts("Freemail loading domains")
    for domain <- domains do
      if domain == "" do
        quote do
          def is_free(_), do: false
        end
      else
        quote do
          def is_free("#{unquote(domain)}"), do: true
        end
      end
    end
    ## the following simply doesn't seem to work or get added in the
    ## right place, always matching all domains if we do it here. Hence the if
    ## statement above.
    # quote do
    #   def is_free(_something), do: false
    # end
  end
end
defmodule Freemail do
  @moduledoc """
  `is_free_email("[[user]@]domain")` will return true if given either a
  complete email address or just the domain of a free/consumer email
  provider.
  Underlying are the module functions is_free(<email domain>) used to determine
  whether an email address domain is from a free/consumer mail provider or not.
  These are generated by the macro in the above FreemailCode module.
  """
  use FreemailCode
  def is_free_email(email_addr) do
    case email_addr |> String.split("@") do
      [_user, domain] -> is_free(domain)
      [domain] -> is_free(domain)
    end
  end
end
And that’s great, but my first attempt tried to avoid the need for __using__. After all, I wouldn’t be injecting these functions into many modules; I just wanted to build one module. So my first attempt was:
defmodule FreemailTwo do
  domains = File.read!("/path/to/free-domains-2.csv")
            |> String.split("\r\n")
            |> Enum.filter(fn x -> x != "" end)
  IO.puts("FreemailTwo loading domains")
  for domain <- domains do
    quote do
      def is_free("#{unquote(domain)}"), do: true
    end
  end
  # quote do
  #   def is_free(_), do: false
  # end
end
When compiling, this does read the domains and iterate over them; the IO.puts statements prove this. However, if I try to import FreemailTwo and call is_free/1, I get:
** (UndefinedFunctionError) function FreemailTwo.is_free/1 is undefined or private
I can require this into a module and it doesn’t work, import it… nada. I believe this is valid, and compilation doesn’t complain, but the generated functions simply don’t appear to be there. What is it that I am missing!?
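For comparison, here is a minimal sketch of how this kind of module-body generation is usually written. It assumes a short inline list as a stand-in for the CSV file, and the module name FreemailInline is hypothetical. Inside a module body, `quote` only builds an AST value and nothing evaluates it, so no function ends up defined. Calling `def` directly inside the comprehension, with `unquote(domain)` injecting the value, defines one clause per domain.

```elixir
defmodule FreemailInline do
  # Hypothetical inline list standing in for the contents of the CSV file.
  domains = ["gmail.com", "hotmail.com"]

  # `def` is executed on every iteration; `unquote/1` injects the current
  # value of `domain` into the function head (an "unquote fragment").
  for domain <- domains do
    def is_free(unquote(domain)), do: true
  end

  # Defined after the loop, so it is the last clause and only matches
  # domains that are not in the list.
  def is_free(_), do: false
end
```

With this sketch, `FreemailInline.is_free("gmail.com")` returns true and any unlisted domain falls through to the final clause.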
The second mystery, as touched upon in the code comments above, is this: I thought I would generate all the function heads for the actual domains and then, in the next code section, generate a catch-all function to return false. You can see this commented out. However, doing so also failed, and I found that this function would always match. Instead I detect the end of the file (in this case by testing for the blank line at the end, a very ugly approach) and insert the catch-all when I do so. This is the final working solution. It’s great: it does what I want and works well. But… why was there a problem coding it the first way!?
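One detail that bears on this: a macro returns only the value of its last expression, so a trailing `quote` block replaces the list produced by the `for` rather than being appended to it. A minimal sketch of returning both together follows, assuming an inline list instead of the CSV file; the module names FreemailClauses and FreemailViaUse are hypothetical.

```elixir
defmodule FreemailClauses do
  # Hypothetical stand-in for the domains read from the CSV file.
  @domains ["gmail.com", "hotmail.com"]

  defmacro __using__(_opts) do
    # One quoted clause per domain.
    clauses =
      for domain <- @domains do
        quote do
          def is_free(unquote(domain)), do: true
        end
      end

    # The quoted catch-all clause.
    catch_all =
      quote do
        def is_free(_), do: false
      end

    # The final expression is what the macro returns, so it has to
    # contain every clause: the per-domain ones first, the catch-all last.
    quote do
      unquote_splicing(clauses)
      unquote(catch_all)
    end
  end
end

defmodule FreemailViaUse do
  use FreemailClauses
end
```

Here the per-domain clauses and the catch-all are part of the same returned AST, so the blank-line check is not needed to place the final clause.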
I guess the final question is broader: I believe that this is the idiomatic Elixir way to test whether a value is one of many. Alternatives would be to read in, or generate code to create, a large list and test whether candidate values are in that list. I believe this is probably slower, and the general wisdom seems to be that exploiting Elixir/Erlang pattern matching is the more powerful and faster approach. The most inelegant approach (to my eyes) would be to have a DB table and query that each time (or a Redis cache…). Am I going about this in the best way?
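To make the list-based alternative concrete, here is a minimal sketch that builds a set once at compile time and tests membership at runtime. It uses a MapSet rather than a plain list, and an inline list standing in for the CSV file; the module name FreemailSet is hypothetical. Which approach is faster for roughly 5000 entries is something to measure rather than assume.

```elixir
defmodule FreemailSet do
  # Hypothetical inline list standing in for the CSV contents. The set is
  # built while the module compiles and stored in the module attribute.
  @domains MapSet.new(["gmail.com", "hotmail.com"])

  # A single clause: membership is checked against the precomputed set.
  def is_free(domain), do: MapSet.member?(@domains, domain)
end
```

This keeps a single function clause regardless of how many domains are listed, at the cost of a set lookup instead of pattern matching on function heads.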
I hope this is of interest to someone. I am happy to accept that I have a working solution and move on, but I would be fascinated to learn why my other approaches failed.
Regards
Matthew