Minimising compile time dependencies - module attributes

timgent · July 7, 2022, 8:50pm

Hi folks, we have a large Phoenix application and compile times can be a bit of a frustration. Changing one file frequently causes a number of others to recompile, which can be slow. I was hoping to seek advice on how to minimise compile-time dependencies, especially with regard to module attributes. Imagine this setup:

Module A has some functions defining constants, for example strings defining the state of a user account (active, banned, pending, etc)
Module B has functions that take one of these as an argument, and we want to pattern match on the value in the function headers
We couldn’t just call the functions from module A as you can’t do that in the function header
So we have module attributes in Module B that are set using the functions from Module A, so then we can use those module attributes for pattern matching in our function headers

e.g.

defmodule B do 
  @active A.active()
  @banned A.banned()

  def show_content(@active), do: ....
  def show_content(@banned), do: ....
end

My question is, is it reasonable to try and minimise module attributes calling out to other functions in order to minimise compile time dependencies? I assume you’re then paying a greater runtime cost, but presumably not really noteworthy. And is there then any way to pattern match in the header, or would you just have to use a cond for that? Or should we just continue with module attributes and accept the increased compile time dependencies and slower compiles? Thoughts appreciated!

BartOtten · July 7, 2022, 9:54pm

What is the benefit you are trying to gain? If we know, we might have better fitting solutions.

And maybe even more important: what is the scale we are talking about? Premature optimization has ended quite a few of my first projects due to the increased complexity…

Example: one of my hobby projects happily stayed snappy while the heavy loadfarm of the company I work for tried to smash it. However, the complexity of the whole thing was too much for a one-man-project and it never served a real request. The app that it was to replace was written by me in PHP4, ugly code, non-realtime but served millions of real users.

al2o3cr · July 7, 2022, 10:53pm

I see two ways to make that work, but they both also cause a compile-time dependency:

define macros instead of functions for A.active and use them like def show_content(A.active)
define guards instead of functions for A.active and use them in a when like def show_content(status) when status == active() or def show_content(status) when is_active(status) - both of these need active to be imported, though
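
A minimal sketch of both options, assuming the constants are plain strings. The module names Statuses and Content and the return values are hypothetical stand-ins for A and B, not code from the original application:

```elixir
defmodule Statuses do
  # Hypothetical stand-in for module A: constants exposed as macros.
  defmacro active, do: "active"
  defmacro banned, do: "banned"

  # A guard built on the same literal.
  defguard is_active(status) when status == "active"
end

defmodule Content do
  # Using macros from Statuses is what creates the compile-time dependency.
  require Statuses
  import Statuses, only: [is_active: 1]

  # Each macro call expands to its string literal at compile time,
  # so it is allowed inside a pattern.
  def show_content(Statuses.active()), do: :full_content
  def show_content(Statuses.banned()), do: :no_content

  def active?(status) when is_active(status), do: true
  def active?(_status), do: false
end
```

Either way, Content has to be recompiled whenever Statuses changes, so this tidies the syntax rather than removing the dependency.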

I’d also encourage thinking about what the intent of those constants is -

do the values really change that often?
If they do, is module A really the only place that needs to change? For instance, are those values written to the database, or serialized into API responses?

The answers to those questions will drive the solution:

if the values don’t change, just use an atom. We write {:ok, something} not {ok(), something} after all
if the values change but are written into external systems, you’ve got problems. What happens when new values arrive at systems that are expecting old ones? Changing A.active to return :banana instead is the least of your problems; atoms are easier to type.
in the “middle” case, where values change frequently but don’t escape (maybe they’re produced by another process in the same application) you don’t have those problems but you could still use atoms

A wilder idea: instead of having many functions like B.show_content that pattern-match on a status, make B a dispatcher to many modules:

defmodule A do
  @status_handlers %{
    "active" => B.Active,
    "banned" => B.Banned
    # etc
  }

  def status_handlers do
    @status_handlers
  end
end

defmodule B do
  def show_content(status) do
    status_handler(status).show_content()
  end

  # bunch more functions that take `status` as a leading argument

  defp status_handler(status) do
    Map.get(A.status_handlers(), status)
  end
end

defmodule B.Active do
  def show_content, do: ...
  # other implementations here
end

This approach can be valuable if there are many functions like show_content, as it separates them into modules by status instead of “braiding” them together with all the others when grouped by function name.

One other thought: if you decide to stick with the current approach, make sure that A has as little else in it as possible. Every file it depends on will be a transitive compile-time dependency of every file that depends on A. See also the compile-connected discussion in the mix xref docs.

cmo · July 8, 2022, 4:09am

You can create a constants module and “import” (i.e. use Module.Constants) them with the help of a __using__ macro. You can see an example in the Timex library.
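
A rough sketch of what such a constants module might look like. Account.Constants and Account.Viewer are hypothetical names, and this is not a copy of what Timex does:

```elixir
defmodule Account.Constants do
  # Hypothetical constants module; `use Account.Constants` injects
  # these module attributes into the calling module.
  defmacro __using__(_opts) do
    quote do
      @active "active"
      @banned "banned"
    end
  end
end

defmodule Account.Viewer do
  use Account.Constants

  # The injected attributes can be used directly in function heads.
  def show_content(@active), do: :full_content
  def show_content(@banned), do: :no_content
end
```

Because use invokes a macro, Account.Viewer still has a compile-time dependency on Account.Constants.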

timgent · July 8, 2022, 3:33pm

What is the benefit you are trying to gain? If we know, we might have better fitting solutions.

Ultimately the main thing is reducing the time for the feedback loop when doing TDD. At the moment when changing a single file I often find around 25 files need to be recompiled, taking about 18 seconds. For me I find that really slows me down as it interrupts my flow.

timgent · July 8, 2022, 3:41pm

Thanks for your suggestions! Especially love the wilder idea, interested to think about that one a bit more

Some thoughts on the other points:

if the values don’t change, just use an atom

This is the case for us. I think the reason we use a string is because these kinds of values will often come from either a DB query or from some front-end component (though not with the particular example I gave). For both those cases that means we get them as strings. I guess we could always immediately convert them to atoms, what do you think? Would that be reasonable? String.to_existing_atom/1 would need to be used though, and I discovered recently that it actually fails unless a codepath has been hit with the atom somewhere in your application already (it’s not enough that it exists in your code), so that could be a slight concern.
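
One possible way to keep String.to_existing_atom/1 predictable is to list the expected atoms in the same module that does the conversion, so they exist by the time the function runs. This is only a sketch; StatusAtom and its return shape are assumptions made for illustration:

```elixir
defmodule StatusAtom do
  # Listing the atoms here means they are created when this module is loaded.
  @known [:active, :banned, :pending]

  # Returns {:ok, atom} for a known status and :error for anything else.
  def parse(string) when is_binary(string) do
    atom = String.to_existing_atom(string)
    if atom in @known, do: {:ok, atom}, else: :error
  rescue
    # Raised when no atom with that name exists yet.
    ArgumentError -> :error
  end
end
```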

make sure that A has as little else in it as possible

Yep makes sense

timgent · July 8, 2022, 3:42pm

You can create a constants module and “import”

Unfortunately I understand this will still cause a compile-time dependency between the files though right? So you still get many files needing to be recompiled even when just 1 changes.

LostKobrakai · July 8, 2022, 4:28pm

Are you sure there’s nothing off with that? Almost a second per file sounds way off from what a common Elixir file should take to compile on a modern system.

al2o3cr · July 8, 2022, 5:49pm

I’ve found it useful to do these conversions explicitly at the “boundary”:

from the database, use Ecto’s enums or a custom Ecto.Type to map strings to known atoms
from the frontend, either use Ecto’s casting or cast explicitly:

def status_from_frontend("active"), do: :active
def status_from_frontend("banned"), do: :banned

This is slightly repetitive, but means that you can accommodate unusual situations like “the frontend still uses a status that has changed its internal name” - status_from_frontend("old_name") can return :new_name and then nothing downstream needs to care about old_name.
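
A slightly fuller sketch of that boundary function, including the renamed status and a fallback for unknown input. The Boundary module name and the {:ok, atom} / :error return shape are assumptions added for illustration; the two clauses above return bare atoms:

```elixir
defmodule Boundary do
  # Explicit string-to-atom mapping at the edge of the system.
  def status_from_frontend("active"), do: {:ok, :active}
  def status_from_frontend("banned"), do: {:ok, :banned}

  # The frontend still sends the old name; downstream code only sees :new_name.
  def status_from_frontend("old_name"), do: {:ok, :new_name}

  # Anything unexpected is rejected instead of being turned into an atom.
  def status_from_frontend(_other), do: :error
end
```

Since only literals appear in the function heads, this module needs no compile-time dependency on a constants module.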

timgent · July 14, 2022, 7:51pm

Good thoughts, thanks!

dimitarvp · July 28, 2022, 12:36pm

Did you get anywhere with this? I’m curious what you arrived at.

timgent · July 30, 2022, 5:15pm

On this in particular, I think I’ve concluded that, where you need to switch based on a value from another module, the easiest way is to have a single function header with a cond inside, for example something like this:

defmodule B do
  def report_status(status) do
    cond do
      status == A.error() -> ...
      status == A.happy() -> ...
    end
  end
end

Rather than this:

defmodule B do
  @error A.error()
  @happy A.happy()
  def report_status(@error), do: ...
  def report_status(@happy), do: ...
end
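
For reference, a runnable sketch of the cond version. Signals and Reporter are hypothetical stand-ins for A and B, the return atoms are made up, and the final true clause is an addition, since cond raises a CondClauseError when no branch matches:

```elixir
defmodule Signals do
  # Hypothetical stand-in for module A.
  def error, do: "error"
  def happy, do: "happy"
end

defmodule Reporter do
  def report_status(status) do
    # Remote calls inside a function body are resolved at runtime,
    # so Reporter has no compile-time dependency on Signals.
    cond do
      status == Signals.error() -> :alert
      status == Signals.happy() -> :all_good
      true -> :unknown
    end
  end
end
```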

I think the other points made in this thread are also good ones, in particular easy wins by:

If you really need a module with constants in it then try to make sure it doesn’t depend on other files
For constants consider if you can use an atom rather than a string. I still have mixed feelings about this - personally I find it really helpful to have these things easily discoverable. For example a module containing all the possible statuses makes it easier to understand and reason about, and gives you a central place to document those things. When you just hardcode atoms everywhere I personally find it a little harder to keep track of. Disclaimer though: I come from a Scala background, so I am used to having strong typing and just using an Enumeration for these kinds of things…

Finally I’ve also started to try focussing on reducing dependency cycles, which is probably a conversation for another thread… Phoenix’s route helpers seem designed to create dependency cycles and besides something a bit hacky (like injecting them into the assigns on the controllers and so on) I’ve been a bit stumped at getting rid of some pretty gnarly cycles.

dimitarvp · August 8, 2022, 12:51am

I fully agree by the way. But if it truly helps you reduce dependency cycles… I don’t know.

I also agree with making a module with constants that does not depend on anything else. My usual approach is to just code-generate accessors / users of the constants in other modules via basic macro syntax but I suppose you didn’t find that helpful.

timgent · August 11, 2022, 9:17pm

I’m not 100% sure what you mean on this point:

My usual approach is to just code-generate accessors / users of the constants in other modules via basic macro syntax but I suppose you didn’t find that helpful.

You mean using a macro to generate the functions for accessing those constants? If so I agree it can be helpful, but I don’t think it reduces the compilation dependencies? Not sure I understood correctly though!

dimitarvp · August 11, 2022, 9:28pm

Yes that’s it. IMO it doesn’t change much: it creates a one-way link which is not dangerous.
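
One possible reading of that idea, sketched with unquote fragments. StatusList and StatusPredicates are hypothetical names, and this may differ from the code-generation approach actually meant here:

```elixir
defmodule StatusList do
  # Hypothetical constants module that depends on nothing else.
  def all, do: ["active", "banned", "pending"]
end

defmodule StatusPredicates do
  # StatusList.all() runs while this module body is compiled, which is
  # the one-way compile-time link from this module to StatusList.
  for status <- StatusList.all() do
    # Generates active?/1, banned?/1 and pending?/1.
    def unquote(:"#{status}?")(value), do: value == unquote(status)
  end
end
```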

timgent · August 11, 2022, 9:56pm

Yeah agreed, not dangerous, but not too different to using a function and module attributes, though can save a little boilerplate.

timgent · August 17, 2022, 7:03pm

I’m writing a couple of blog posts on the topic to try and compile (hah!) and share some of the thoughts and ideas from this thread and other things I’ve found. First one is here - How to speed up your Elixir compile times (part 1) — understanding Elixir compilation | by Tim Gent | multiverse-tech | Aug, 2022 | Medium - grateful for any feedback on it!