Rethinking app env

sasajuric · May 25, 2018, 10:30pm

I personally think they shouldn’t. In most cases, a library should take its parameters as function args (or return value of callback functions), and not enforce how this data is provided.

I would argue that using config scripts is splattering the data all over the place. For example, consider endpoint parameters which are by default sitting in four config scripts (config.exs, dev.exs, test.exs, prod.exs), but the ultimate source of truth is still endpoint module, where in init/2 that config data is further enhanced. Wouldn’t it be more consolidated if you listed all the endpoint parameters in init/2?

It’s been a really long time since I’ve worked with Rails, but my memories of its configuration and initializers are not good memories. I prefer an explicit mechanism which fits well with the standard way of writing Elixir code. We already have that for our applications, and I think we could have the same for booting of the entire system.

Just call something like EtcdClient.fetch(...) or Database.fetch(...)?

In a simple scenario, there could be a wrapper module for fetching stuff from the configuration storage. Let’s call it EtcdConfig. Then other modules, such as the endpoint and the repo, call the functions from that module.

In a more complicated scenario, you might want to cache these parameters, maybe periodically refresh them. In such cases, I’d use a dedicated ETS table (but not app env), and the implementation would become more complex, involving a couple of processes. But still, from the client’s perspective things would be simple, as client modules would just call well defined functions from the module responsible for fetching the data.
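A rough sketch of that cached variant, under stated assumptions: ConfigCache, the :loader option and the :refresh_every option are made-up names for illustration, and the loader stands in for whatever client actually talks to etcd or a database. Clients only ever call ConfigCache.fetch/1.

```elixir
defmodule ConfigCache do
  # Hypothetical sketch: caches externally fetched parameters in a dedicated
  # ETS table (not app env) and refreshes them periodically.
  use GenServer

  @table :config_cache

  # Client API: plain functions, so callers don't care where the data lives.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  def fetch(key) do
    case :ets.lookup(@table, key) do
      [{^key, value}] -> {:ok, value}
      [] -> :error
    end
  end

  @impl true
  def init(opts) do
    # `:loader` is an assumed zero-arity function returning a map of
    # parameters, e.g. a call into an etcd or database client.
    loader = Keyword.fetch!(opts, :loader)
    interval = Keyword.get(opts, :refresh_every, :timer.seconds(30))

    :ets.new(@table, [:named_table, :protected, read_concurrency: true])
    load(loader)
    schedule_refresh(interval)
    {:ok, %{loader: loader, interval: interval}}
  end

  @impl true
  def handle_info(:refresh, state) do
    load(state.loader)
    schedule_refresh(state.interval)
    {:noreply, state}
  end

  # Overwrites existing keys; keys that disappear upstream are not deleted.
  defp load(loader), do: :ets.insert(@table, Map.to_list(loader.()))

  defp schedule_refresh(interval), do: Process.send_after(self(), :refresh, interval)
end
```

Reads go straight to ETS, so they never block on the owner process; only the refresh runs inside the GenServer.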

That depends on how the library accepts values.

If it accepts them as function parameters, or callback function return values, then it’s as simple as ExternalLib.some_fun(fetch_value_from_wherever(...)).

If it requires app env value but not during its boot, then it’s more complicated, but it can be handled during the client app’s start/2 callback.
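For that second case, a minimal sketch might look like the following. The :external_lib app, its :api_url key and the EXTERNAL_LIB_API_URL variable are hypothetical names, and the sketch assumes the library reads its app env lazily, after boot.

```elixir
defmodule MySystem.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # `:external_lib` and `:api_url` are hypothetical names for a dependency
    # that reads its own app env lazily, after boot.
    Application.put_env(:external_lib, :api_url, fetch_api_url())

    Supervisor.start_link([], strategy: :one_for_one, name: MySystem.Supervisor)
  end

  # Stand-in for fetching the value from wherever it lives (file, etcd, ...).
  defp fetch_api_url() do
    System.get_env("EXTERNAL_LIB_API_URL", "http://localhost:4000")
  end
end
```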

If the library requires app env value during its boot, then currently we have no good solution, and need to resort to various kinds of trickery.

You lost me here. What sort of situation warrants a centralized place, what would such a place be, and what belongs in the config to begin with? This word gets thrown around a lot, but I haven’t seen anyone explain when some piece of data is considered “config”, instead of just a plain function argument.

dimitarvp · May 25, 2018, 10:50pm

IMO in a Docker / Kubernetes / container-tech-of-the-month scenario. However, thinking of it, there’s no problem with actually supplying some kind of module attributes at deployment and just compiling in place… but then people would complain they don’t want compilers in their container images. Maybe cross-compile beforehand and just deploy the release later?

I want you to know I am fully on your side here – just trying to get a better grip at this whole situation (and I felt the official proposal thread wasn’t a good place to show my not being very well informed).

It definitely looks like everything can either be module attributes, or supervision tree init parameters, or just full pure functions.

That being said, how would you pass Ecto.Repo connection parameters only once? It’s really impractical to pass them around at every call… Probably something like YourApp.Repo.set_connection_params? But would not that mandate flushing of pools and reconnecting due to the possibility that the connection parameters might potentially now point to an entirely new database?

EDIT: months ago I found Discord’s FastGlobal library and that caught my eye as a way to have a mutable global state with extremely quick reading times. Maybe that’s one possible way?.. But then again, that would just shift the same thing from config.exs to somewhere else… Eh.

sasajuric · May 26, 2018, 6:44am

I’m focusing on Elixir code exclusively in this thread. I don’t question the fact that in a more complex system things need to be managed in some dedicated storage. A couple of years ago we used etcd to configure a heterogeneous distributed system, and that worked well for us. But these external storages are outside of the scope of what I’m discussing here. In this thread I’m questioning the need to define a bunch of arbitrary data in Elixir config scripts (config.exs and friends).

Because some libraries require app env during their boot, and there’s no easy way to run custom code during the system boot, I think config scripts are currently the best place to provide such data. Other than that, my opinion is that most of the data can and should be provided in regular code.

Something like this:

# config.exs
config :my_system, MySystem.Repo, adapter: Ecto.Adapters.Postgres

# repo.ex
defmodule MySystem.Repo do
  use Ecto.Repo, otp_app: :my_system

  def init(_arg, default_ecto_opts), do: {:ok, Keyword.merge(default_ecto_opts, db_config())}

  defp db_config() do
    [
      hostname: ...,
      database: ...,
      ...
    ]
  end
end

If you don’t need to change these parameters at runtime, you don’t need FastGlobal. Just define a function which returns a plain constant.

jeremyjh · May 26, 2018, 1:23pm

I think this is the right answer. I do also think though, that people developing ad-hoc solutions to providing different configuration depending on the environment (test, dev, prod) is undesirable.

Let’s take Repo as an example. I am not talking here about having different host names and credentials - that is trivial to solve in the environment variables. But rather differences that would belong in code such as using a sandbox pool in test, using longer timeouts in production, different pool sizes etc. Just looking at my current project and its various environments, I could easily end up with a few tens of lines of configuration code in my Repo module, which would tend to discourage it being used for anything else - so it just becomes another configuration file.

I have to have some structured method to handle these compile-time environment differences. Maybe all that needs is a config macro and best practices disseminated through the generators, but we have to have some real alternative - we can’t just say “put it in code”. Well, actually we can say that and people will get on with it but their solutions will be very divergent.

sasajuric · May 26, 2018, 5:12pm

I think that this cuts to one of the root causes why config scripts are used so much.

So to be clear, my opinion is that config scripts are way overused, and contain a lot of stuff that doesn’t belong there. I consider them a pile of bloat arbitrarily thrown together, and I think that they often make the code more difficult to understand.

IMO, there are a couple of reasons why so much data ends up in config scripts:

1. Some of our flagship libraries (Phoenix and Ecto) promote it.
2. Some libraries require it.
3. There’s no obvious or convenient way to specify variations between different mix envs.

To be completely honest, at my company we also overuse config scripts, precisely for the reasons stated above. So for example, even though we’re aware of if Mix.env() and how/when we can use it, if a constant varies between different envs, we usually just stuff it into a config script.

I further believe that this convenience subtly creates a tendency to put even more data into config scripts. So for example, my colleague recently argued that some MFAs belong to config scripts, because they “feel” like config, even though MFA is clearly code, not config. This was for me the direct motivation to write that lengthy post about config scripts, and to actively start questioning them.

To be clear, I used them myself a lot, even though I was never quite comfortable with them, which is why I’m increasingly starting to feel that they are misleading people and causing problems much more than they actually help, because:

1. Config scripts are not runtime friendly, and cause confusion when used with OTP releases.
2. Parameters are not consolidated anymore. For example, endpoint parameters are now specified in at least four config files.
3. A bunch of unrelated data is thrown together.

Therefore, I feel that instead of adding additional complex machinery to address issue number 1 (which is just one issue of config scripts), it would be much better if runtime configuration through regular code was promoted and assisted.

In particular, I’d like to see:

Official helper macros to simplify expressing small variations between mix envs in regular code. Perhaps something along the lines of this macro, or something different/better.
mix phx.new, Ecto/Phoenix docs, Elixir docs, and official getting started guides favouring runtime configuration over config scripts.
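To make the first wish more concrete, here is one possible shape for such a helper. This is only a sketch, not the macro linked above: EnvSpecific.pick/2 and MySystem.RepoParams are hypothetical names, and it is a plain function rather than a macro so that it runs outside a Mix project.

```elixir
defmodule EnvSpecific do
  # Hypothetical helper: picks the value for the given env, falling back to
  # the `:else` entry when the env is not listed.
  def pick(env, choices) when is_atom(env) and is_list(choices) do
    case Keyword.fetch(choices, env) do
      {:ok, value} -> value
      :error -> Keyword.fetch!(choices, :else)
    end
  end
end

defmodule MySystem.RepoParams do
  # In a Mix project this could be captured at compile time, e.g.
  # `@env Mix.env()`; it is hard-coded here to keep the sketch self-contained.
  @env :test

  def pool_size(), do: EnvSpecific.pick(@env, prod: 20, else: 5)
  def timeout(), do: EnvSpecific.pick(@env, test: 1_000, else: 15_000)
end
```

The variations stay next to the code that uses them, and each parameter shows all of its per-env values in one line.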

This will not solve all the problems, but I believe it will solve most of them, and that it will guide the community to write their code and libraries in a better way.

Now, some members from Elixir and Phoenix core team have mentioned that many people were further confused when init/2 callbacks were introduced. Personally, I don’t at all buy that this means that runtime config is confusing. The thing is that prior to init/2 a typical Elixir project had at least four files where parameters were specified (config, dev, test, and prod.exs). And then the fifth one was added. No wonder that people found this even more confusing.

But until we guide people to provide their parameters as much as possible in the regular code, it’s unfair to say that runtime config is confusing.

I should also state that I was not a fan of init/2 when it was proposed. Personally, I felt that endpoint and repo should just take their parameters as arguments to start_link and child_spec/1. This has issues with hot code reloading, but that’s an advanced scenario anyway. My impression is that with init/2, Elixir/Ecto/Phoenix team decided to make the runtime configuration more complex to simplify advanced (and arguably infrequent) scenarios at the expense of more complex interface for everyone. My feeling is that this was not a good tradeoff. I suspect that the callback style interface of init/2 also adds to the confusion people have.
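The start_link style mentioned here can be illustrated with a generic process rather than a real endpoint or repo. MySystem.Cache and its :ttl option are hypothetical; the point of the sketch is only that the parent passes the parameters where the child is started.

```elixir
defmodule MySystem.Cache do
  # Hypothetical worker that receives every parameter as a start_link argument.
  use Agent

  def start_link(opts) do
    ttl = Keyword.fetch!(opts, :ttl)
    Agent.start_link(fn -> %{ttl: ttl, entries: %{}} end, name: __MODULE__)
  end

  def ttl(), do: Agent.get(__MODULE__, & &1.ttl)
end

# The parent lists the parameters right where the child is started.
children = [
  {MySystem.Cache, [ttl: :timer.minutes(5)]}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)
```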

In summary, I think that Elixir/Phoenix/Ecto team historically favoured config scripts way more than runtime configs in regular code, and that’s why we’re here. Perhaps, instead of adding more complex machinery to config scripts, a better way would be to assist and promote runtime configuration through regular code and plain old passing of arguments to functions. I feel that this requires far fewer interventions in the language and that it can take us very far, although it will admittedly take time to move the community to such style of configuration.

stevedomin · June 9, 2018, 5:35pm

What was the issue with hot code reload in that scenario?

sasajuric · June 10, 2018, 10:35am

See here for details.

Qqwy · June 10, 2018, 12:44pm

Earlier in this thread, the question ‘what is configuration’ and the related question ‘what belongs in the configuration’ were asked.

My two cents: Configuration can be defined as ‘the subtle reshaping* of the behaviour of this instance of your application to allow it to work well in the concrete environment it will run in.’

*: (the Latin verb configurare literally means ‘to mould’)

So if some behavior will be the same in all your app’s instances, it should be described in your codebase itself.

If some behaviour differs per instance, describing it in your codebase will result in code that is hard to re-use: hard-coding locations and authentication details of a database, for instance, would not make sense inside the code.

So this is where I would delineate the difference between config/not config. In many cases, though, something that is configurable for one of your app’s dependencies is fixed for all your app’s instances, and should therefore not be part of your app’s config. But I do not believe Elixir has a way to support this: These types of config end up ‘bubbling up’ to the final user of the library/app stack. I think this might be the source of the ‘overuse’ of config files, and that it is something that could definitely be improved.

stevedomin · June 10, 2018, 4:27pm

@sasajuric thank you.

sasajuric · June 11, 2018, 5:52am

So then all the stuff in config.exs is not configuration at all? Also, all the things which are the same in dev/test/prod.exs are also not configuration? Finally, the things which are fetched from external sources (e.g. at Aircloak we fetch database parameters from a json file) are also not configuration?

This also confuses me. Config scripts are a part of the codebase.

Qqwy · June 11, 2018, 6:05am

Indeed, I think that many of those are not really configuration at all, unless you think that it might be changed for all or some of your app’s instances in the near future (which could be called different instances of the same app). What I mean with (not) ‘in your codebase’ is that not all configuration, especially not your production configuration, is part of your source code repository (you do not copy them over using git or similar tools). Also, configuration files always live at the periphery of your system, i.e. outside of the ‘lib’ folder for Elixir systems.

Qqwy · June 11, 2018, 9:14pm

Let me elaborate a little more, since I finally am back on the computer rather than on my phone:

Files like config.exs contain configuration that might be applicable to all your app instances today, but:

1. It provides default values that are overridden by more specific configuration settings in other configuration files. How much of this is applicable vs. how much of this should just happen in code is subjective; writing the default values here is more explicit (so it is easier to find the default value), but if adding a specific configuration setting is the only way to make a dependency library work, then I’d say the dependency is more leaky than it should be.
2. It provides values that might differ between the app today and the app tomorrow (i.e. the near future). Examples of these might be the logger level or network connection strategies of a Nerves system, the number of rounds to use for your PBKDF2 (which you want to increase over time as hardware becomes better) etc.
3. It provides values for which there is no sensible default because the library does not care: the chosen value depends on the environment of the application (the type of system the app runs on) rather than on the implementation of the application, but a choice has to be made. Settings like the Erlang Cookie, the Phoenix Endpoint Secret Key Base, the folder into which attachment files should be uploaded, etc. fall into this category.

There is obviously some overlap between (1), (2) and (3) in certain cases.

In this way, what is inside your config.exs is configuration. And what is in your dev/test/prod.exs is configuration. And the database settings you receive from an external location are definitely configuration because you probably are not doing this in the development- or testing-instances of your application.

An example of something that I think is ill-suited for specifying in configuration, and rather should be part of your application’s real code itself, would for instance be what locales you’d like to use for CLDR. This is a very interesting project, but the way you specify what locales your app requires is by adding this to your configuration. They even include a special ‘compiler’ so changes to its configuration are picked up as ‘code changes’. For the locales, I think it would make more sense if inside a module that would like to use CLDR, use CLDR, locales: [:en, :nl, :de] would be used, with the implementation of CLDR’s __using__-macro gathering the required locales in that way.
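A sketch of what that could look like follows. MyI18n is a hypothetical stand-in and not the real CLDR API; it only shows locales being passed to use and baked into the calling module, and it does not gather locales across several modules.

```elixir
defmodule MyI18n do
  # Hypothetical library module, not the real CLDR API: locales are passed to
  # `use` and baked into the calling module at compile time.
  defmacro __using__(opts) do
    locales = Keyword.fetch!(opts, :locales)

    quote do
      def locales(), do: unquote(locales)
      def supported?(locale), do: locale in unquote(locales)
    end
  end
end

defmodule MyApp.Invoices do
  use MyI18n, locales: [:en, :nl, :de]
end
```

With this shape, changing the locale list is an ordinary code change in the module that needs it, so no special compiler is required to notice it.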

Another problem with the large number of settings that libraries shift off to configuration files is that configuration is effectively global for your application, so using the library in one ‘configuration’ for one part of your OS-app (like one of your BEAM apps) and a different one for another becomes impossible.

There are some other libraries I’ve used in the past that just don’t do anything, or even throw errors at app launch unless you’ve added certain snippets to your configuration, although I am currently not able to remember them (I also believe I did not end up using them exactly because of this).

I like libraries that allow an (OS-)app-wide global setting with module-local overrides (by e.g. using the use Foo, config: ... syntax or similar). I also like what Decimal does with the Decimal.with_context(fn ->... end) construct to (temporarily) alter rounding behaviour.
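One way a library could offer a global setting with module-local overrides is sketched below. Foo, the :foo app env key :defaults, and the generated foo_config/0 function are all hypothetical names chosen for illustration.

```elixir
defmodule Foo do
  # Hypothetical library: app-wide defaults come from app env, and each module
  # can override them through `use Foo, config: [...]`.
  defmacro __using__(opts) do
    overrides = Keyword.get(opts, :config, [])

    quote do
      def foo_config() do
        global = Application.get_env(:foo, :defaults, [])
        # Module-local values win over the app-wide ones.
        Keyword.merge(global, unquote(overrides))
      end
    end
  end
end

# App-wide defaults, set here directly to keep the sketch self-contained.
Application.put_env(:foo, :defaults, [precision: 2, rounding: :half_up])

defmodule Billing do
  use Foo, config: [precision: 4]
end

defmodule Reports do
  use Foo
end
```

Billing sees its own precision while Reports keeps the global one, so two parts of the same system can use the library differently.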

I hope this gives some more context.

sasajuric · June 12, 2018, 5:16am

So then this stuff, injected by phx.new doesn’t belong here at all?

config :my_site, MySiteWeb.Endpoint,
  # ...
  render_errors: [view: MySiteWeb.ErrorView, accepts: ~w(html json)],
  pubsub: [name: MySite.PubSub, adapter: Phoenix.PubSub.PG2]

How exactly is it easier to find the default value in a config script? I need to first know that the value is provided by config scripts (since in fact many values are not defined in config scripts), and I also need to know its name. So it looks like I still need to consult the code first, possibly also read a library documentation, before I can find the value.

How do I determine such values? The examples you’ve mentioned are the things which IME change maybe once or twice in a period of few years.

I’ve just checked the git log of one of our project’s config folder. In the past year we had some changes there. Most of them were additions of new properties, some of them were deletions, and I was able to find only one case where some value has actually been changed. And that change was due to a Phoenix upgrade, not due to “reconfiguration”. Our config scripts were more frequently modified due to making comments, than due to changing a value of some “configuration”.

We actually had more frequent changes of values typically provided in the regular code (e.g. Supervisor parameters, GenServer timeouts) than “config” values.

Perhaps we completely failed in organizing our configuration?

I’m confused here. The examples you mentioned don’t end up in app env (Erlang cookie is a VM arg, not provided by a config script), or don’t need to be in app env (secret key base).

I’m not much wiser. The only clear rules I can make so far are:

Use config script if required by library
Use config script if values change between different mix envs

The first reason can’t be avoided (although opening up an issue with the library might help).

The second reason deserves to be questioned. Given that Elixir has other means of making a decision based on mix env, why are config scripts the best approach?

kip · June 12, 2018, 6:14am

@Qqwy your point on cldr is very well taken and for Version 2 that is exactly the approach I am taking. It’s much cleaner - but it took me a lot of learning to figure that out.

I am still a little uncomfortable with the complexity associated with creating user-defined module(s) to host the compile-time generated functions when the generated public API has a large surface area, as cldr does - but that’s for another topic.

StefanHoutzager · June 12, 2018, 8:18am

Some quick thoughts. A lot of config would be easier to maintain if it resided in a database, managed via a GUI. Access can be authorised, and you could version the records. Your business logic should have no knowledge of where the config resides, so you could get the config via a call such as get_config(keys) to a separate application (the data access layer) that knows Ecto (and config files). The data access layer decides which store to get the config from. Env type, version etc. can be taken into account there. I have not thought it through yet, but maybe a rules engine could also help make things flexible (http://www.myti.it/blog/2015/5/12/how-to-develop-a-product-configurator-software-using-a-rule-engine); it is not difficult to build the backend in Elixir. I built one for one of the rule types and use the open-source frontend from Camunda (a DMN editor).

Qqwy · June 12, 2018, 8:42am

sasajuric:

So then this stuff, injected by phx.new doesn’t belong here at all?

config :my_site, MySiteWeb.Endpoint,
  # ...
  render_errors: [view: MySiteWeb.ErrorView, accepts: ~w(html json)],
  pubsub: [name: MySite.PubSub, adapter: Phoenix.PubSub.PG2]

I indeed think it would make more sense if this were to be part of the MySiteWeb.Endpoint module.

Yes, you are right. It probably is a very weak argument. You really do need to read through the configuration file to find out what happens, and have to be able to understand how the data types that are written there correspond with actual behaviour, so digging through documentation is required. So you are right, this is not a good reason.

Well, logging is something I change more often (temporarily changing it to a lower log level and later back) to check in more detail when something goes wrong. There is no ‘failure’ here because the world is not that black and white. Network connection strategies are something that change based on the physical location where a Nerves device will be used. PBKDF2 is something that has not been changed so far since our application has not been running that long in production, but at some point it will.
I think, rather than creating a very clear definition of near (which would be very subjective and not that useful), it is probably more important to think about cohesion/coupling of your settings w.r.t. the implementation inside the application: If there are multiple related behaviours your app-part (like BEAM-app or library) might provide, then putting these on the outside of the library rather than hard-coding them means that the app-part is more flexible in a wide range of environments.

Ahaha! I think we are getting somewhere here! It seems you are talking specifically about configuration script *.exs files, while I am talking about the concept of configuration in general (including vm.args scripts, loading settings from system environment variables etc.). Maybe that explains some of the confusion.

This indeed is a difficult question, and I fully agree with you that we should question this, since config scripts are by no means to be considered ‘holy’. I currently am of the opinion that they are not necessarily bad, but over-used (and that configuration in general might be over-used in some Elixir-apps, which was the reason behind the couple of posts I wrote so far).
I definitely agree that looking into different ways to make environment-based choices is a good idea.

So maybe better guidelines (let’s not call them ‘rules’ because it is highly dangerous to claim universal applicability):

Use config scripts if required by library (which we might try to avoid by releasing a new version of the library that does not depend on this).
Put values that change between individual app instances inside the configuration (whether in config scripts or somewhere else, but I think this is a prime candidate to put inside a config script). Examples: node name, server name, local database connection info.
Put values that change between environments somewhere in configuration, but not necessarily inside your config scripts.
Put values that are likely to change somewhere in your configuration (rather than hard-coding them in a way that might result in strong coupling), but not necessarily in your config scripts.

camcaine · June 12, 2018, 9:07am

Wow I’m pretty new to elixir and I have been following the conversation around this intently.
It’s really great to hear from the more experienced devs on how they approach this.

For what it’s worth I also find it very confusing now to decide what should go in config files.

I have come across situations, specifically with plugs, where some configuration gets passed to init/1 at compile time, and other config gets pushed into config files. For example:

plug :authorization, strategy: :token

...

config :authorization, token_lifetime: 3600

I see this split as the :token_lifetime being something that might be changed at runtime in the app env, or different in a different env like test. But it still means you are looking in 2 different places for the config of your authorization.

Maybe one size will never fit all. But it would be great to see more best practises/thoughts.

Qqwy · June 12, 2018, 1:14pm

While I think that configuration databases might be applicable for some applications, there are a couple of important drawbacks:

It is a single point of failure for your application instances.
When should an application instance read from there? How does an application instance know when its values are changed?

blatyo · June 12, 2018, 1:34pm

In my case, this is actually something that varies between environments. I use redis for prod, but PG2 for dev and test so that the redis dependency isn’t necessary. It’s important to consider that even if your particular usage of a library’s configuration does not vary, someone else’s might. I think it’s important for a library to consider the more general case of how configuration may vary across everyone’s use cases.

sasajuric · June 12, 2018, 2:31pm

Actually, just because this happens to be configuration for you, that doesn’t mean it’s configuration for everyone. You see, given enough projects, there are all sorts of scenarios where various unlikely things end up being configuration (e.g. in our case, the database is configurable). So if we move from “it’s configurable to me” to “it’s therefore configurable for everyone”, we’ll just end up stuffing all the parameters into config scripts. No constant value would ever exist in the regular code. Good luck maintaining such code.

The approach I’m arguing for in my article, and in other discussions, is to treat all of the parameters first and foremost as parameters, not some mythical configuration which has to be placed into some different place, distant from the code which uses it.

Then, you promote parameters into configuration when you actually have that need. Hence, if you want to vary the pubsub adapter, you explicitly make it configuration. With such approach your configuration grows organically, and consists of the stuff which is in fact configurable in your project.
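As a small illustration of that promotion step, consider the sketch below. MySystem.PubSubParams and the PUBSUB_ADAPTER variable are hypothetical, and the returned atoms are placeholders rather than real adapter modules.

```elixir
defmodule MySystem.PubSubParams do
  # Step 1: a plain parameter, kept next to the code that uses it.
  def pool_size(), do: 10

  # Step 2: promoted to configuration once there is a real need to vary it.
  # "PUBSUB_ADAPTER" is a hypothetical variable name; an unset or unknown
  # value falls back to the previous constant.
  def adapter() do
    case System.get_env("PUBSUB_ADAPTER") do
      "redis" -> :redis
      _ -> :pg2
    end
  end
end
```

Callers keep calling the same function in both steps; only the one parameter that actually needs to vary learns about the outside world.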

The library does that by accepting parameters via functions or callbacks. It doesn’t have to do the guesswork and promote random things as configuration upfront. As a developer of your system, you’re way more familiar with its particularities than any library author. Hence the decision about what is configurable in your system should be left to you.

It’s a side-discussion, but since you mentioned it - if this is the only reason, I’m not sure it’s a good trade-off. You’re creating a distance between the actual version and the one you develop and test against, making it more likely to miss some issue until it hits the production. The mentioned benefit doesn’t seem very substantial. I haven’t worked with Redis in many years, but when I did, the installation was trivial, so somehow I doubt it’s complicated today. With a little bit of automation, e.g. using docker, you can simplify it even further.