Can anyone enlighten me on how the Poison.Encoder __deriving__ macro works?


I’m finally digging into Elixir again after a few years, and I started toying around with protocols and derivations and such. I generally understand it, but the seemingly canonical example of Poison.Encoder is evading me a bit. For reference:

Specifically, I’m confused about how the deriving function works, given that the struct parameter is unused, but is also later referenced in the quote block:

defimpl Poison.Encoder, for: Any do
  alias Poison.{Encoder, EncodeError}

  defmacro __deriving__(module, struct, options) do
    deriving(module, struct, options)
  end

  def deriving(module, _struct, options) do
    only = options[:only]
    except = options[:except]

    extractor =
      cond do
        only ->
          quote(do: Map.take(struct, unquote(only)))

        except ->
          except = [:__struct__ | except]
          quote(do: Map.drop(struct, unquote(except)))

        true ->
          quote(do: :maps.remove(:__struct__, struct))
      end

    quote do
      defimpl Encoder, for: unquote(module) do
        def encode(struct, options) do
          Encoder.Map.encode(unquote(extractor), options)
        end
      end
    end
  end

  def encode(%{__struct__: _} = struct, options) do
    Encoder.Map.encode(Map.from_struct(struct), options)
  end

  def encode(value, _options) do
    raise EncodeError, value: value
  end
end

I understand that, within the quote block, the reference to struct is not actually a “reference” per se, and rather it just yields the AST for a variable of name struct. That’s fine… but I’m confused as to how it ends up being useful when the __deriving__ macro is called? We pass the struct along to the deriving function clearly, but it’s then unused… so what happens?
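To make the "it just yields the AST" part concrete, here is a small sketch that can be run as a script. It only uses quote and Macro.to_string/1; the [:name] field list is made up for illustration and is not taken from Poison.

```elixir
# A quoted expression is just data describing code; nothing is evaluated here,
# and no variable called `struct` needs to exist at this point.
ast = quote(do: Map.take(struct, [:name]))

# The call node has the shape {callee, metadata, arguments}. The first
# argument is the node for `struct`: a {name, metadata, context} tuple.
{_callee, _meta, [struct_node, [:name]]} = ast
IO.inspect(struct_node)

# Turning the AST back into source shows `struct` is only a name so far.
IO.puts(Macro.to_string(ast))
```

Nothing is ever looked up for struct here, which is why the quote blocks in deriving can mention it without it being bound to anything.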

I think I might almost get it, in that my mind leans toward some fuzzy notion of “well yea, but it gets returned to… something… as the proper AST that will be evaluated as needed”… but that still feels vague and not really something I understand, assuming it’s not totally wrong. :slight_smile:

Ha! And, as usual, as soon as I bothered to type out the question I have, the answer hits me. :laughing:

For the curious (and for correction, if anyone finds that I’ve come to the wrong conclusion) - I was confused because I forgot one of the most basic things about Elixir - the last expression in a function is that function’s return value. Oops.

That being the case, the situation makes more sense. The value that’s bound to the struct argument is of no consequence - we just need to keep the name identical between the two things (__deriving__ and deriving). Hence using _struct in the header for deriving - by convention, we should keep track of what we expect to be handed for this argument, even if we don’t use it. In this case that’s extra important, because we need to reference the variable name of struct consistently. Why? This is the part I think I just figured out… drumroll

The deriving function is not really returning any results of computation being done on the data that was passed in as part of the struct!

Instead, the deriving function is returning a chunk of Elixir AST that defines a protocol implementation. So what __deriving__ ultimately expands to is a custom defimpl for the module that derived the protocol. Given that, there’s no need to actually know what struct contains within deriving, because all we’re doing at this point is defining what should be done to some arbitrary variable that is named struct. Once we’ve built that AST and returned it from the __deriving__ macro, it gets compiled as the custom defimpl, and the struct inside it is simply the first argument of the generated encode/2, so it only gets a real value when encode is actually called.
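Here is a stripped-down sketch of the same shape, without the protocol machinery, in case it helps anyone else. PickSketch, PickUser, build/2, define_pick/1 and pick/1 are all hypothetical names I made up; only the structure (a macro delegating to a plain function that returns quoted code) mirrors the Poison source above.

```elixir
defmodule PickSketch do
  # Plain function: the first argument is never used. All it does is
  # build and return AST, just like `deriving/3` above.
  def build(_struct, fields) do
    # `struct` here is only a variable name inside quoted code.
    extractor = quote(do: Map.take(struct, unquote(fields)))

    quote do
      # This `struct` is the argument of the generated function. It is the
      # same name, quoted in the same module, so the extractor refers to it.
      def pick(struct), do: unquote(extractor)
    end
  end

  # The macro hands the returned AST to the compiler, which injects it
  # into whichever module called the macro.
  defmacro define_pick(fields) do
    build(:ignored, fields)
  end
end

defmodule PickUser do
  require PickSketch
  PickSketch.define_pick([:name])
end

IO.inspect(PickUser.pick(%{name: "Ada", age: 36}))
```

Passing :ignored as the first argument changes nothing, which suggests the value handed to deriving really is irrelevant. As far as I can tell, the names that have to agree are the two struct variables inside the quoted code; the _struct in the function head could be called anything.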

Clear as mud, yea?

I think there’s a better way for me to explain this, but for now I’m just excited I finally got my head wrapped around it! If I can think of a cleaner explanation I’ll share it here later. :slight_smile:


Yes, this is the key insight. Nearly all confusion related to macros boils down to this kind of distinction. It gets particularly confusing when macros expect AST literals like keyword lists and actually DO do some kind of computation on some of those options, but fundamentally the ordinary job of a macro is not really to do computation, but to return code that does the actual thing we want to do.
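A tiny sketch of that distinction, with made-up module and macro names (MacroDemo, double/1, show_ast/1 are hypothetical, not from any library): the macro receives code as data and returns code, and the actual work happens later in the caller.

```elixir
defmodule MacroDemo do
  # Returns code: the multiplication runs later, in the caller, at runtime.
  defmacro double(expr) do
    quote do
      unquote(expr) * 2
    end
  end

  # Shows what a macro actually receives: the AST of its argument,
  # not the argument's value. Macro.escape/1 turns that AST into code
  # that evaluates to the AST itself, so the caller can look at it.
  defmacro show_ast(expr) do
    Macro.escape(expr)
  end
end

defmodule MacroDemoCaller do
  require MacroDemo

  # Expands to `(x + 1) * 2` at compile time; evaluated on each call.
  def run(x), do: MacroDemo.double(x + 1)

  # The macro saw the unevaluated expression `1 + 2`, not the number 3.
  def ast, do: MacroDemo.show_ast(1 + 2)
end

IO.inspect(MacroDemoCaller.run(3))
IO.inspect(MacroDemoCaller.ast())
```

Reading options[:only] inside deriving is the "some computation on AST literals" case: it only works because a literal keyword list is its own AST.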

The more a given macro strays from that philosophy and tries to do computations, the more of a pain it will be to maintain and use. Trust me on this one :wink: