On Configuration: Can we improve how Elixir libraries/applications are configured?

Qqwy · December 9, 2018, 5:19pm

Recently, at the Code BEAM Lite Amsterdam, I had very interesting conversations with @sasajuric, @michalmuskala, @voltone, @Crowdhailer and a couple of other people. Amongst the subjects was the concept of configuration, and choices to be made here.

Most importantly, @sasajuric’s talk had some interesting points:

When you decide to do something yourself (in his talk: building a Continuous Integration system), you can have a very focused implementation that does not need a ‘general purpose’ configuration layer with all bells and whistles.
Configuration done simply by passing arguments to functions is infinitely more flexible than config files.
In a system where everything is done in the same language, there is no difference between DevOps and Programmer.

And on the other hand, the talk by @voltone:

explained the bad experience of configuring Erlang’s built-in :ssl library properly (its default configuration is very unsafe).
mentioned that many of the HTTP(S) client libraries out there either require the programmer using them to pass on the proper :ssl-configuration, or the few that do use a secure configuration by default will completely replace it if the programmer passes in their own.

So, while about a year ago we had a topic talking about how to configure your application and libraries you are building, I think it is time to look at this subject anew.

There are roughly three ways to configure a bit of functionality (whether a library, an OTP-application or something else) in Elixir land:

1. Pass options as argument(s) to a function. This is the most flexible, because it allows calling the function from many places in your code with different options. Elixir makes this approach easier by having keyword lists as a first-class type.
2. Add configuration settings in config/config.exs, config/some_mix_environment.exs and similar files. This makes it possible to have different configuration in different environments (for instance, on your local computer the database has a different password than on the production machine). However, functionality that can only be configured like this cannot be used in multiple different ways within the same environment.
3. Configure your application using environment variables, and use some way to read these in your application’s code. The nice part of this approach is that you can have different settings on different machines (even if they are both ‘production’). The bad part is that you currently need a special way to make sure these are inserted in your application’s configuration.
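
To make the three approaches concrete, here is a minimal sketch. The `Mailer` module, the `:my_app` application name, the `:mailer_adapter` key and the `MAILER_HOST` variable are all made up for illustration; only the standard-library calls are real.

```elixir
defmodule Mailer do
  # (1) Options passed as the last argument, as a keyword list.
  def deliver(message, opts \\ []) do
    adapter = Keyword.get(opts, :adapter, :smtp)
    {adapter, message}
  end

  # (2) Value read from the application environment, which is
  # typically populated from config/config.exs and similar files.
  def adapter_from_app_env do
    Application.get_env(:my_app, :mailer_adapter, :smtp)
  end

  # (3) Value read from an OS environment variable when the function runs.
  def host_from_system_env do
    System.get_env("MAILER_HOST") || "localhost"
  end
end
```

Only the first variant lets two callers in the same running system use different adapters side by side.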

Problems with (2) `config.exs`-style configuration and (3) environment variables.

Configuration happens at build time

(2) and (3) have the extra catch that when you build your application, for instance when building a release using a tool like Distillery, the configuration is read at build time rather than when the application is run. This means that you either need to build on the machine that will run the release, or you need to take extreme care to make sure the configuration values are still relevant.

Ad-hoc, Turing-complete configuration

Furthermore, values set via (2) and (3) affect the functionality you use in an implicit and ad-hoc way: from looking at the code itself it is impossible to see how it is configured (or that it even has special configuration). Also, because config.exs-style files are essentially just normal Elixir files, they might contain a spaghetti of arbitrary Turing-complete nonsense. Most commonly, some files override settings from other files, making it more difficult to see which configuration will actually apply in the final application (without running Application.get_env manually from within the running application).

Problems with (1) functions that have option-arguments

Now, (1) is definitely the most flexible. Most importantly, we can configure it differently in each and every one of our tests.

No standardized way of checking the current application environment

However, something that does stick out like a sore thumb is that there does not seem to be a standardized way to check inside your code in which (Mix) environment you currently are: Mix.env will not work inside a built application, because the :mix OTP application will not be included.
Instead, its documentation actually encourages you to use (2) for environment-specific configuration.

Maybe this is something that we could improve?
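
For reference, a workaround along the lines of (2) is to record the Mix environment in the application environment from a config file and read it back at runtime. This is only a sketch: the `:my_app` name, the `:env` key, the `MyApp.Env` module and the `:prod` fallback are assumptions, not an official API.

```elixir
# In config/config.exs (evaluated by Mix, where Mix.env/0 is available):
#
#     config :my_app, :env, Mix.env()

defmodule MyApp.Env do
  # Reads the value recorded by the config file. Falls back to :prod
  # when nothing was recorded (an arbitrary choice for this sketch).
  def current, do: Application.get_env(:my_app, :env, :prod)

  def dev?, do: current() == :dev
end
```

This works inside a release because it never calls into Mix at runtime, but it still routes through the global application environment.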

Options-handling via keyword-args is somewhat clunky, and could be improved

We cannot match on these in the function header, because keyword args are ‘just lists’ and therefore position-dependent: def foo(a, b, c: d, e: f) will not be triggered when someone calls foo(1, 2, e: 4, c: 3).
In almost all cases, duplicate option entries are irrelevant. However, there is no way to extract a map directly from the options at the end.
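
The position-dependence is easy to see in a small example (the `KeywordMatch` module name is made up):

```elixir
defmodule KeywordMatch do
  # Matches only a two-element list in exactly this order: [c: _, e: _].
  def foo(a, b, c: c, e: e), do: {:matched, a, b, c, e}

  # Fallback clause for every other keyword list.
  def foo(_a, _b, opts) when is_list(opts), do: {:no_match, opts}
end

KeywordMatch.foo(1, 2, c: 3, e: 4)
# → {:matched, 1, 2, 3, 4}

KeywordMatch.foo(1, 2, e: 4, c: 3)
# → {:no_match, [e: 4, c: 3]}
```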

So almost all option-handling code goes a bit like this:

options are passed in as keyword list
we either combine (in some sensible way) or warn/crash on duplicate entries (this is very frequently omitted by libraries in the wild!)
these are then transformed into a map (or sometimes a special options-struct)
we fill in sensible defaults for the options that were not provided. (which sometimes depend on yet other options that were provided).
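
Written out by hand, the steps above might look roughly like this. The option names (`:timeout`, `:retries`, `:deadline`) and the choice to crash on duplicates are assumptions for the sake of the sketch.

```elixir
defmodule Options do
  @defaults %{timeout: 5_000, retries: 3}

  def normalize(opts) when is_list(opts) do
    # Step 2: crash on duplicate keys instead of silently picking one.
    keys = Keyword.keys(opts)
    duplicates = Enum.uniq(keys -- Enum.uniq(keys))

    if duplicates != [] do
      raise ArgumentError, "duplicate options: #{inspect(duplicates)}"
    end

    # Steps 3 and 4: turn the list into a map and fill in static defaults.
    merged = Map.merge(@defaults, Map.new(opts))

    # A default that depends on other options that were (or were not) given.
    Map.put_new(merged, :deadline, merged.timeout * merged.retries)
  end
end
```

Every library that takes options ends up re-implementing some variation of this, which is the argument for standardizing it.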

I think a library that makes this easier to do properly would be valuable (potentially, since it is such a common thing to do, either a ‘blessed’ library or even inclusion into Elixir’s standard lib!). Essentially, I am thinking about something similar to Python’s argparse module (for ARGV parsing), where a specification for the options (including field names, descriptions, allowed types and defaults) can be used for parsing, warning/crashing on error, as well as generating documentation of the allowed options!

Problems in general

Libraries restrict their user

Libraries currently commonly pick either (1) or (2) (sometimes with the possibility for (3)).

I think it would be a lot better if libraries were to use:

default values
overriding these per key with what you configured using (2)/(3)
overriding these per key with what you configured in (1).
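
A minimal sketch of this layering, using two `Keyword.merge/2` calls. The `HttpClient` module and the `:http_client`/`:ssl` application-environment location are hypothetical; `verify`, `depth` and `reuse_sessions` are existing `:ssl` option names, but the values are only illustrative and not a recommended secure configuration.

```elixir
defmodule HttpClient do
  # Layer 1: library defaults.
  @defaults [verify: :verify_peer, depth: 3, reuse_sessions: true]

  def ssl_options(call_opts \\ []) do
    # Layer 2: whatever the user put in the application environment.
    app_opts = Application.get_env(:http_client, :ssl, [])

    # Layer 3: options passed at the call site win over everything else.
    @defaults
    |> Keyword.merge(app_opts)
    |> Keyword.merge(call_opts)
  end
end
```

Because the merge is per key, passing one custom option does not discard the secure defaults for the keys that were left alone.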

Alas, possibly in part because there is not a standardized way to do this currently, many libraries either restrict you to one of these approaches, or completely throw their default configuration out the window once you start customizing something.
In certain cases, such as the SSL-client libraries, this can have disastrous consequences unless the user of the library is aware (the configuration behaviour of a library is almost always ‘implicit’ and not mentioned anywhere).

And related: libraries could warn if they encounter options that they do not recognize, but currently usually do not AFAIK.
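
Warning on unrecognized options does not take much code. The sketch below uses `Keyword.split/2` and `IO.warn/1`; the `StrictOptions` module and the allowed keys are made up. Newer Elixir versions also ship `Keyword.validate/2` and `Keyword.validate!/2`, which cover a similar need.

```elixir
defmodule StrictOptions do
  @known [:timeout, :retries]

  # Returns only the recognized options, warning about the rest.
  def take_known(opts) do
    {known, unknown} = Keyword.split(opts, @known)

    if unknown != [] do
      IO.warn("unrecognized options: #{inspect(Keyword.keys(unknown))}")
    end

    known
  end
end
```

A typo such as `tiemout: 100` then shows up as a warning instead of being silently ignored.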

So, those are my current thoughts on the matter. I think we can improve this current situation as a community. In this post, I gave some suggestions that I currently think might be worth exploring, but I am also very interested in your suggestions and opinions about this matter.

LostKobrakai · December 9, 2018, 6:42pm

I’ll just go off of one part of your post, but trust me it’ll be on topic:

I don’t think this is a bad thing at all. Imho application code should not be aware of the place it runs in. The environment just needs to have ways to configure the application. Like I don’t see a difference between changing the email adapter between doing development and running tests vs. having different smtp credentials when running an app in two separate production environments.

That’s why I don’t really feel Mix.Config is bad as it is, because it’s a way to store configuration for multiple environments you might need when developing using a mix project structure. With the addition of distillery’s config providers I feel like the config/prod.exs should mostly be empty for using releases, while dev/test are simply valid environments for mix. For prod just use a distillery provider you like for the same task Mix.Config provides for usage within mix. Mix config also doesn’t really directly configure libraries, but just puts data into the global application environment. That’s the same thing the distillery providers do.

The problems are in my opinion twofold:

One tricky bit is compile time configuration vs. runtime configuration. For just running on mix this is easy. Mix both compiles the project and runs it, so one configuration setup can provide both.

Distillery is not compiling the project, but takes a project compiled by mix and just adds runtime configuration via its providers. And this is where the issues come up, because compile time configuration still needs to be handled by mix (a.k.a. config/prod.exs), while runtime configuration should be done by any of the distillery providers. So one needs to be aware of which config is needed at compile time and which isn’t, to be able to separate them out.

The other part of the problem is the receiving side of configuration, a.k.a. the libraries:

I’m totally with you that we need more best practices and/or tutorials on how to handle runtime configuration. Currently even most runtime configuration is happening via the application environment, which makes having different configuration side by side impossible. But always passing stuff via arguments can become a pain as well. That’s where the merging of configuration comes in. For example ex_aws does a good job of that.

For compile time configuration it’s easy in terms of providing it; it happens in mix so mix config should work just fine. But library providers need to be aware what they compile into. E.g. ex_cldr recently made the switch to compiling into userland modules to be able to not have a single configuration, but multiple side by side. In contrast, when the library code itself changes based on compile time configuration, the library is limited to a single set of configuration.
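
The general “compile into userland modules” idea can be sketched with a `__using__` macro. This is not how ex_cldr is actually implemented; `MyLib`, the `:locale` option and the two `MyApp.*` modules are invented for illustration.

```elixir
defmodule MyLib do
  # The configuration is baked into the *user's* module at compile time,
  # so the library code itself stays free of any single global config.
  defmacro __using__(opts) do
    quote do
      @mylib_config unquote(opts)

      def config, do: @mylib_config
      def locale, do: Keyword.get(@mylib_config, :locale, "en")
    end
  end
end

defmodule MyApp.Dutch do
  use MyLib, locale: "nl"
end

defmodule MyApp.English do
  use MyLib
end
```

Here `MyApp.Dutch.locale()` and `MyApp.English.locale()` return different values within the same running system, which a single compile-time setting inside the library could not offer.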

So I’d really like to see some improvement on the pain-points:

Have some easy to pick up best practices/library for merging runtime config from various sources: app-env and arguments being the most obvious.
Having some easy to pick up best practices/library around compiling userland modules instead of compiling configuration into library code.
Library maintainers being more forthcoming with distinguishing compile-time from runtime configuration

What would also be interesting is if we could create some generic “compile time baked config” and “cache into process config” library, which users could easily wrap around libraries, which just accept configuration at runtime as arguments. Because I feel quite often library authors use the app env or compile time config because the API feels cleaner and/or because of the idea that config is more performant if it’s baked in at compile time. If we would have a generic library to move that part into user-land (and a user base for it) people might be inclined to just build their libraries receiving arguments and only reach for more integrated options if really needed.

Qqwy · December 9, 2018, 10:37pm

This is very interesting! In the end, it boils down to who is ‘in control’.
You might be right that it usually is a bad idea to have ‘environment-aware’ code, but whenever you encounter something that cannot (currently) be configured in a configuration file, you will have to do this. I agree that code that is environment-aware should live at the very outside of your system, but I do think that there are cases where this is required.

I fully agree!

I do not agree with this, because:

Either things have sensible defaults, in which case the ‘API feels clean’ by default even when nothing is placed in the /conf/*.exs configuration.
Or things require configuration because there is no sensible default, in which case the library should crash with an error if included/used without configuration; when done using conf/*.exs it will only crash when called at runtime, whereas having required arguments in the API will crash during compilation.
While I did not benchmark, it seems odd that passing settings as arguments (especially if these arguments are static and are thus compiled as inline constants as part of your code) would be slower than looking them up from a nested configuration keyword-list that is essentially global storage: I am fairly certain that they are faster.

Maybe we do need a ‘wrapping’ library, but I’m not convinced about the “compile time baked config” and “cache into process config” approach.

related to the ‘compile-time’ vs ‘run-time’ configuration problems: What about selective compilation? We have a wonderful platform that allows for the recompilation and hot-code-swapping of individual modules. Why can we not delay all ‘compile-time’ configuration to application startup-time? (or, to be more exact: If we want to reconfigure something on the machine the application ends up on, why not allow to recompile the required modules then?)

Maybe @michalmuskala or @ericmj or someone else who has more knowledge about e.g. Mix’ internals or Elixir’s compilation approach can shed more light on this. Is it technically impossible, infeasible or just a ton of work to make happen?

rvirding · December 9, 2018, 11:29pm

This is really an Elixir problem, as in Erlang systems config files are read at start-up time. So using configuration files to set application environment variables is the standard way of configuring applications. You also have the extra option of setting environment variables on the command line as well. Why this is not done this way in Elixir I don’t know, or has it been fixed in the latest version of distillery?

LostKobrakai · December 10, 2018, 12:07am

Afaik configuration was never applied at compile time to begin with (besides compile time code changes based on said config), but before 2.0 distillery did evaluate the config.exs when creating the release and stored only the computed values in the release’s configuration files. So people doing stuff like loading configuration from external places (system env) within their config.exs would have that happen on the build machine. From my limited experience with erlang and sys.config this is not something erlang would support either. So it was more an issue about how config.exs files were handled and less about when configuration was read / put into the application environment. Also sys.config is erlang, so people would’ve had to edit an erlang file, which is not terribly difficult, but also not optimal for an elixir project.

Distillery 2.0 added configuration providers where one can choose/implement how configuration is retrieved at startup, which is now way more flexible and gives the option of having a .exs file evaluated on app startup just like in mix.

OvermindDL1 · December 10, 2018, 5:10pm

That is the big lock-in though. Basically, what I said in the old thread I still think holds true. We need multiple stages of configuration: those that exist at compile time, which adjust code generation and act as defaults for later stages; those that exist at start-up only, which act as defaults for later stages but otherwise should only be things that are absolutely necessary at start-up time (things like BEAM settings; this stage is mostly just for setting default values for the runtime stage); and those that exist at runtime, which allow for dynamically changing them in a running system. The application environment is an example of that last stage. However, there needs to be a way, when part of the application environment is changed, for a signal/message to be sent to whatever handlers are registered, so they can update based on the new configuration (like increasing/decreasing the number of SQL pool workers, as one example). This pattern should be ubiquitous so everything is consistent throughout the ecosystem, and thus it needs to be baked into Elixir itself.
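
Nothing like this is built into Elixir, but the “notify registered handlers on change” part can be approximated in user code with a `Registry`. The `ConfigNotifier` module, its function names and the message shape are all hypothetical, and changes made by calling `Application.put_env/3` directly would bypass it.

```elixir
defmodule ConfigNotifier do
  @registry ConfigNotifier.Registry

  # A :duplicate registry lets many processes subscribe to the same key.
  def start_link do
    Registry.start_link(keys: :duplicate, name: @registry)
  end

  # The calling process will receive a message on every change.
  def subscribe(app, key) do
    {:ok, _owner} = Registry.register(@registry, {app, key}, nil)
    :ok
  end

  # Updates the application environment, then notifies subscribers.
  def put_env(app, key, value) do
    Application.put_env(app, key, value)

    Registry.dispatch(@registry, {app, key}, fn entries ->
      for {pid, _value} <- entries do
        send(pid, {:config_changed, app, key, value})
      end
    end)

    :ok
  end
end
```

A pool supervisor could subscribe to `{:my_app, :pool_size}` and resize itself whenever it receives `{:config_changed, :my_app, :pool_size, new_size}`.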

This is all of course adjacent but related to ‘where’ should the configurations come from, which I also spoke of on that other thread, basically need a set of handlers to pull settings from, whether one for environment variables, one from some data store, one to pull from a database (even registering a postgres listener for example to listen to setting changes from the appropriate database table or so), xml files, json, whatever, it needs to be pluggable, Java does this very well (though antiquated compared to better ideals now) as one big example.

LostKobrakai · December 10, 2018, 5:31pm

Sure. I wanted to explain that elixir doesn’t (and didn’t) do anything different than erlang though, and the difficulty lies in what wasn’t there before elixir: the compile time aspect.

OvermindDL1 · December 10, 2018, 5:32pm

Unfortunately accessing configurations to change code generation (even something as simple as setting a module attribute) is extremely ubiquitous among Elixir and Elixir libraries, so it’s not just a little thing that can be ignored most of the time. I think at one time even Ecto was holding the pool size at compile time and baking it into the code (not now thankfully), it was that ubiquitous.

sasajuric · December 10, 2018, 8:32pm

At Aircloak we wrote a small custom macro for that called in_env, which can be freely invoked at runtime:

def foo(...) do
  ...
  in_env(dev: foo, prod: bar, else: baz)
  ...
end

Where foo, bar, and baz are arbitrary expressions.

It’s not perfect, but I find it better than app env vars. The benefit of in_env is that the mix-env variation is localized, rather than elevated to the app env. I personally find it easier to understand what goes on in such code, compared to opening 3-4 additional files and manually overlaying options in my mind to figure out what are the differences between different envs.
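
For readers wondering how such a macro could work: the sketch below is a guess at the general shape, not Aircloak’s actual implementation. It picks one branch while the calling module is compiled, so nothing from Mix is needed at runtime. As an assumption, it reads the build environment from the `MIX_ENV` variable (defaulting to "dev"); inside a Mix project one could call `Mix.env()` at that point instead. `EnvSwitch` and `Demo` are made-up names.

```elixir
defmodule EnvSwitch do
  # Expands to the expression for the current build environment,
  # or to the :else expression when there is no matching clause.
  defmacro in_env(clauses) do
    env = System.get_env("MIX_ENV", "dev") |> String.to_atom()

    case Keyword.fetch(clauses, env) do
      {:ok, expr} -> expr
      :error -> Keyword.get(clauses, :else)
    end
  end
end

defmodule Demo do
  import EnvSwitch

  def greeting do
    in_env(dev: "hello from dev", prod: "hello from prod", else: "hello from elsewhere")
  end

  # With only an :else clause, every environment gets the fallback.
  def fallback_only, do: in_env(else: :fallback)
end
```

Only the selected expression ends up in the compiled module; the other branches are discarded at expansion time.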

I agree with you that some standardized (i.e. language approved) way of checking for Mix.env would be welcome.

tim2CF · December 11, 2018, 9:58am

Hi
We had a few discussions inside our company how to deal with compiletime/runtime configs and we decided to completely split these things and at the same time avoid system variables mess. This library partly solves issues you initially mentioned

Qqwy · December 15, 2018, 11:35pm

I am currently thinking about writing a library that creates a simple, standardized way to do layered configuration:

A module can have one (maybe multiple?) sets of options.
Every option can be required (by default) or optional (which requires you to specify a default value), and have a type (which by default is ‘any’).
From inside your functions, you can parse the current options by calling LayeredConfig.get_all_env(options_specification, options_passed_into_function); (there is also a LayeredConfig.get_env(options_specification, options_passed_into_function, key)). Behaviour of these two mimics Application.get_all_env/Application.get_env, with the difference that:
- options given in options_passed_into_function will override application-level options
- default values are based on the options_specification
- Errors are immediately thrown when required options are missing
- Errors are immediately thrown when unrecognized options, or options with values of the wrong type are passed. ❦
The options specification itself will be a simple list of tuples, but LayeredConfig will expose functions to distill:
- a readable version of this to be used (string-interpolated) in your function- or module-documentation.
- a type to be used in your function specs, so we can have Dialyzer checking that only proper options are used!

❦: Especially for this, it would be nice if LayeredConfig itself could be configured, for instance to only strictly enforce value-checks in development and test environments, so errors are quickly caught while production enjoys a speed improvement. As a side note, this would mean that the library would dog-food itself, which would be super cool for totally nerdy reasons!
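
To give the idea some shape, here is a rough sketch of what the lookup could do. LayeredConfig does not exist yet, so everything here is hypothetical: the module name `LayeredConfigSketch`, the `{key, requirement, type}` tuple format, the supported type names, and the extra `app` argument (added because the sketch needs to know which application environment to read).

```elixir
defmodule LayeredConfigSketch do
  # Spec entries: {key, :required, type} or {key, {:optional, default}, type}.
  def get_all_env(app, spec, passed_opts) do
    known = Enum.map(spec, &elem(&1, 0))
    unknown = passed_opts |> Keyword.keys() |> Enum.reject(&(&1 in known))

    if unknown != [] do
      raise ArgumentError, "unrecognized options: #{inspect(unknown)}"
    end

    app_opts = Application.get_all_env(app)

    Enum.map(spec, fn {key, requirement, type} ->
      # Passed options win over the application env, which wins over defaults.
      value =
        case {Keyword.fetch(passed_opts, key), Keyword.fetch(app_opts, key), requirement} do
          {{:ok, v}, _, _} -> v
          {:error, {:ok, v}, _} -> v
          {:error, :error, {:optional, default}} -> default
          {:error, :error, :required} ->
            raise ArgumentError, "missing required option #{inspect(key)}"
        end

      if not valid?(type, value) do
        raise ArgumentError, "option #{inspect(key)} must be of type #{inspect(type)}"
      end

      {key, value}
    end)
  end

  def get_env(app, spec, passed_opts, key) do
    app |> get_all_env(spec, passed_opts) |> Keyword.fetch!(key)
  end

  defp valid?(:any, _value), do: true
  defp valid?(:integer, value), do: is_integer(value)
  defp valid?(:atom, value), do: is_atom(value)
  defp valid?(:string, value), do: is_binary(value)
end
```

Generating documentation and typespecs from the same spec is left out of this sketch.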

Good idea? Suggestions?