Unexpected behaviours when generating functions through metacoding

Hello!

This is my first post to a forum that has helped me a great deal as I have become involved with this language and ecosystem.

I have some code that lets me test whether an email address looks like it is from a free/consumer email provider, e.g. gmail.com or hotmail.com. Following what I believe to be an idiomatic approach, I wanted to use metaprogramming to take a list of nearly 5000 domains for such providers and build a module with an is_free/1 function clause for each, e.g.:

def is_free("gmail.com"), do: true
def is_free("hotmail.com"), do: true
...
def is_free(_), do: false

The code that I ended up with and which works is as follows:

defmodule FreemailCode do
  @moduledoc """
  helper to provide __using__ that generates an is_free(domain) function 
  for each domain listed in the input file as being a free/consumer
  email provider such as gmail.com, hotmail.com etc.

  The source file is free-domains-2.csv downloaded from hubspot at: 
  https://knowledge.hubspot.com/forms/what-domains-are-blocked-when-using-the-forms-email-domains-to-block-feature
  """
  defmacro __using__(_) do
    domains = File.read!("/path/to/free-domains-2.csv")
      |> String.split("\r\n")

    IO.puts("Freemail loading domains")
    for domain <- domains do
      if domain == "" do
        quote do
          def is_free(_), do: false
        end
      else
        quote do
          def is_free("#{unquote(domain)}"), do: true
        end
      end
    end

    ## the following simply doesn't seem to work or get added in the 
    ## right place, always matching all domains if we do it here. Hence the if 
    ## statement above.
    # quote do
    #   def is_free(_something), do: false
    # end
  end
end

defmodule Freemail do
  @moduledoc """
  `is_free_email("[[user]@]domain")` will return true if given either a
  complete email address or just the domain of a free/consumer email 
  provider.

  Underlying are the module functions is_free(<email domain>) used to determine
  whether an email address domain is from a free/consumer mail provider or not.
  These are generated by the macro in the above FreemailCode module.
  """
  use FreemailCode

  def is_free_email(email_addr) do
    case email_addr |> String.split("@") do
      [_user, domain] -> is_free(domain)
      [domain]        -> is_free(domain)
    end
  end
end

And that’s great, but my first attempt tried to avoid the need for __using__ altogether. After all, I wouldn’t be injecting these functions into many modules; I just wanted to build one module. So my first attempt was:

defmodule FreemailTwo do
  domains = File.read!("/path/to/free-domains-2.csv")
    |> String.split("\r\n")
    |> Enum.filter(fn x -> x != "" end)

  IO.puts("FreemailTwo loading domains")
  for domain <- domains do
    quote do
      def is_free("#{unquote(domain)}"), do: true
    end
  end
  # quote do
  #   def is_free(_), do: false
  # end
end

When compiling, this does read the domains and iterate over them - IO.puts statements will prove this. However, if I try to import FreemailTwo and call an is_free/1:

** (UndefinedFunctionError) function FreemailTwo.is_free/1 is undefined or private                                            

I can require this into a module and it doesn’t work, import it… nada. I believe this is valid, and compilation doesn’t complain, but the generated functions simply don’t appear to be there. What is it that I am missing!?

The second mystery, as touched upon in the code comments above, is this: I thought I would generate all the function heads for the actual domains and then, in the next code section, generate a catch-all clause to return false. You can see this commented out. However, doing so also failed, and I found that this function would always match. Instead I detect end of file (in this case by testing for the blank line at the end - a very ugly approach) and insert the catch-all when I do so. This is the final working solution. It’s great - does what I want and works well. But… why was there a problem coding it in the first way!?

I guess the final question is broader: I believe that this is the idiomatic Elixir way to solve this problem of testing to see if a value is in one of many. Alternatives would be to read in or generate code to create a large list and test if candidate values are in that list. I believe this is probably slower and the general wisdom seems to be that exploiting Elixir/Erlang pattern matching is the more powerful and faster approach. The most inelegant (to my eyes) approach would be to have a DB table and query that each time (or Redis cache…). Am I going about this in the best way?

I hope this is of interest to someone. I am happy to accept that I have a working solution and move on, but would be fascinated to learn why my other approaches failed.

Regards
Matthew

A macro needs to return an AST - writing code like:

for x <- xs do
  quote do
    ...
  end
end

works when it’s the LAST expression in the function because the result is a list of AST fragments (the output of quote).

Adding another piece of code to the macro causes the return value of for to be discarded, the same way that this:

def foo do
  4 + 5
  :ok
end

discards the return value of +.

Instead, you’ll need to combine the result of the for comprehension and the quote block for the default case into a single return value, e.g. a list, and make that the last expression in the macro.
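
A minimal sketch of that combination, under a couple of assumptions: the module names FreemailCodeSketch and FreemailSketch are made up for illustration, and a short inline list stands in for the contents of the CSV file.

```elixir
defmodule FreemailCodeSketch do
  defmacro __using__(_opts) do
    # Hypothetical inline list standing in for the domains read from the CSV.
    domains = ["gmail.com", "hotmail.com"]

    # One quoted clause per domain; the comprehension returns a list of ASTs.
    domain_clauses =
      for domain <- domains do
        quote do
          def is_free(unquote(domain)), do: true
        end
      end

    # The quoted catch-all clause, kept in a variable instead of being
    # the (only) return value of the macro.
    catch_all =
      quote do
        def is_free(_), do: false
      end

    # Last expression = the macro's return value: specific clauses first,
    # catch-all last.
    domain_clauses ++ [catch_all]
  end
end

defmodule FreemailSketch do
  use FreemailCodeSketch
end
```

With this, FreemailSketch.is_free("gmail.com") should hit one of the generated clauses and any other string should fall through to the catch-all, with no need for the blank-line check.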


The FreemailTwo code has the opposite problem - the quote isn’t needed when writing directly in a module body. Using it there just builds an AST tuple and throws it away, the same as writing a tuple at the top level of the module:

defmodule Foo do
  {:def, [context: Elixir, import: Kernel], [{:is_foo, [context: Elixir], ["a"]}, [do: true]]}
end

which compiles but doesn’t do anything.
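
If you want to see this for yourself, here is a small sketch (QuoteDemo and ast_is_tuple?/0 are made-up names for illustration). It captures the value of the quote in a module body and then checks whether any is_foo/1 was actually defined:

```elixir
defmodule QuoteDemo do
  # In a module body, quote just evaluates to data (an AST tuple).
  # Nothing here asks the compiler to define a function from it.
  ast =
    quote do
      def is_foo("a"), do: true
    end

  # Keep the observation around so it can be inspected after compilation.
  @ast_is_tuple is_tuple(ast)
  def ast_is_tuple?, do: @ast_is_tuple
end

# The quoted def never became a real function on the module.
IO.inspect(function_exported?(QuoteDemo, :is_foo, 1))
IO.inspect(QuoteDemo.ast_is_tuple?())
```

The first inspect shows that is_foo/1 is not exported, which matches the UndefinedFunctionError from FreemailTwo.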

For the in-a-module-definition case, you can use unquote directly:

defmodule Foo do
  for x <- ~w(a b c) do
    def is_foo(unquote(x)), do: true
  end
  def is_foo(_), do: false
end

One additional note: check out the @external_resource module attribute, for making sure FreemailTwo always gets recompiled when the CSV changes.
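
Putting the pieces together, a rough sketch of how that might look. The module name FreemailDemo and the temp-file path are assumptions for illustration only (the temp file is written first so the example runs on its own); in your project you would point at the real CSV instead.

```elixir
# Hypothetical stand-in for the real CSV, written so the sketch is runnable.
File.write!(
  Path.join(System.tmp_dir!(), "free-domains-demo.csv"),
  "gmail.com\r\nhotmail.com\r\n"
)

defmodule FreemailDemo do
  @domains_path Path.join(System.tmp_dir!(), "free-domains-demo.csv")

  # Tells build tools such as Mix that this module depends on the file,
  # so the module is recompiled when the file changes.
  @external_resource @domains_path

  # Runs at compile time; trim: true drops the empty trailing entry.
  domains =
    @domains_path
    |> File.read!()
    |> String.split(["\r\n", "\n"], trim: true)

  # No quote needed in a module body: unquote injects each domain directly.
  for domain <- domains do
    def is_free(unquote(domain)), do: true
  end

  def is_free(_), do: false
end
```

Note that @external_resource only matters when the module is compiled by a tool that tracks it (e.g. mix compile); when run as a plain script it is harmless but has no visible effect.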

:man_facepalming: - so obvious when you point it out - thank you very much for your thorough reply!