Runtime configuration library (with casting, validation etc...) for native releases

I’ve found confex pretty pleasant to use.

If that’s working for you then that seems fine. We’re pretty happy avoiding default values in code like that wherever possible, and we default to always loading from the same source regardless of the “mode” the app is in. We just try to avoid a distinction between “dev”, “test”, and “prod” wherever possible. We’ve found that this is less confusing when things go wrong and that it accommodates a wider range of developer workflows. But those are our preferences. They probably aren’t going to be correct for everyone.


I haven’t tried any tools yet (just maintaining releases.exs) - just waiting for the good patterns with built-in mix release :slight_smile:

One thing I’d like to ask: what kind of validation besides type check (e.g. parsing to integer) do you want to put in config-read time not inside the app init?

I know there are some “bad config” we don’t want to even let the whole apps start… but doesn’t it make duplicate work? Shouldn’t we leverage apps and supervision trees instead?

Also I’m curious how these tools work well with Config.Provider - or whether we need some changes in Config.Provider.

Another idea: I’m wondering whether we can define a common configuration spec in each app module and expose it back to the config tools, not vice versa. By doing that, we can enforce that the config is actually being passed (e.g. avoiding the mistake where I add a config value in the config tool but never use it in the app…)

I’ve seen people do this with Vapor. The module will define a group of providers. All of the providers are composed together and loaded on application start.

defmodule Process do
  alias Vapor.Provider.{Group, Env}
  def config do
    %Group{
      name: :process_config,
      providers: [
        %Env{bindings: [port: "PROCESS_PORT"]},
      ]
    }
  end
end

defmodule Application do
  def start(_, _) do
    config = Vapor.load!([
      Process.config(),
    ])
    children = [
      {Process, config.process_config},
    ]

    # start/2 must return {:ok, pid}, so the supervisor is started here
    Supervisor.start_link(children, strategy: :one_for_one)
  end
end

This is a very interesting discussion!

I do not see the current landscape as overly fragmented. It seems like the different solutions have different design goals behind them, and exploring all of those is valuable.

Nevertheless, what I think is most valuable are discussions like these, in which we can compare approaches. :smiley:

Specify was created based on a description of Vapor, when Vapor was still vaporware (either during last year’s ElixirConf.EU or on one of the Elixir Outlaws episodes). It took the idea of a layered stack of configuration providers from Vapor.
However, the main and more important idea behind Specify is to make it explicit what keys (having which types) you are expecting your configuration to define. Based on the configuration specification it will:

  • Automatically validate/parse the values passed in, raising errors when values are malformed.
  • Add a description of all the configuration fields to the documentation of the module it is defined in.
  • Raise errors when required fields are missing (i.e. are not defined anywhere in your configuration stack).

As such, I think that’s quite close to the “expose back to config tools, not vice versa” that @chulkilee is asking about.
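
As a rough illustration of that idea (a minimal sketch, not Specify's actual API; `ConfigSpecSketch` and its field names are made up for the example), a module can declare the fields and types it expects, then parse raw values against that declaration and raise when something is missing or malformed:

```elixir
defmodule ConfigSpecSketch do
  @moduledoc "Hypothetical sketch of a declared config spec; not Specify's API."

  # Each field maps to {type, options}. A field without :default is required.
  @spec_fields %{
    port: {:integer, []},
    pool_size: {:integer, [default: 10]},
    host: {:string, [default: "localhost"]}
  }

  @doc "Builds a validated config map from raw (string) values."
  def load(raw) when is_map(raw) do
    Map.new(@spec_fields, fn {name, {type, opts}} ->
      case Map.fetch(raw, name) do
        {:ok, value} -> {name, parse!(name, type, value)}
        :error -> {name, default!(name, opts)}
      end
    end)
  end

  # Falls back to the declared default, or raises for required fields.
  defp default!(name, opts) do
    case Keyword.fetch(opts, :default) do
      {:ok, default} -> default
      :error -> raise ArgumentError, "missing required config field #{inspect(name)}"
    end
  end

  defp parse!(_name, :string, value) when is_binary(value), do: value

  defp parse!(name, :integer, value) when is_binary(value) do
    case Integer.parse(value) do
      {int, ""} -> int
      _ -> raise ArgumentError, "field #{inspect(name)} is not an integer: #{inspect(value)}"
    end
  end

  defp parse!(name, type, value) do
    raise ArgumentError, "field #{inspect(name)} expected #{type}, got: #{inspect(value)}"
  end
end
```

Calling `ConfigSpecSketch.load(%{port: "4000"})` fills in the declared defaults, while `ConfigSpecSketch.load(%{})` raises because `:port` has no default.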


By the way, another key idea behind Specify was to be explicit in what is read from where. It is meant to be able to be used by libraries just as much as applications, which was based on a discussion on this forum two years ago about structuring configuration (Rethinking app env).
The key idea is that a library can specify clearly what values (and types of values) are expected, as well as default ways to structure the configuration-layering. People using the library can then override this default configuration-layering (as well as the values passed at any of those layers).
The configuration-layer that always takes the most precedence is the one where we pass in values explicitly when calling YourConfigModule.load(explicit_values: %{...}, _other, _options) or YourConfigModule.load_explicit(..., _options), to allow for easy testability or per-location defaults (rather than only per-process or only globally).
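
To make the precedence rule concrete, here is a minimal sketch of the layering (assumptions: `LayeredConfigSketch`, its defaults and the plain-map sources are all hypothetical and only stand in for real providers; this is not how Specify is implemented):

```elixir
defmodule LayeredConfigSketch do
  @moduledoc "Hypothetical sketch of layered sources; not Specify's actual API."

  @defaults %{pool_size: 10, log_level: :info}

  @doc """
  Merges layers from lowest to highest precedence.
  `explicit_values` is merged last, so it always wins.
  """
  def load(sources, explicit_values \\ %{}) when is_list(sources) do
    layers = [@defaults] ++ sources ++ [explicit_values]

    # Later layers override earlier ones key by key.
    Enum.reduce(layers, %{}, fn layer, acc -> Map.merge(acc, layer) end)
  end
end
```

A test can then call `LayeredConfigSketch.load(sources, %{pool_size: 1})` to override a single value for that call only, without touching any global state.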

@sasajuric I wonder about the choice of your new to-be-released configuration library to use functions to retrieve the configuration for two reasons:

  1. This does not allow ‘local’ overrides to the configuration. Of course, one might argue that the kinds of settings that need local overrides should not be stored in (this kind of) configuration at all. However, the burden of deciding whether that is the case for some value then falls on, for example, a library designer, unless your library is not intended to be used by other libraries but only by top-level applications.
  2. It hides the internal details of fetching the configuration. What if fetching configuration is slow but we write code where we call one of the configuration functions many times per second? This is probably the kind of stuff that works fine in development but might break in production, where configuration might be fetched from other locations. And what if race conditions happen where we fetch two related configuration-values, but between the two reads the configuration source is updated, so we end up with one ‘old’ and one ‘new’ value?

Nevertheless, I love the idea of having clear specs (and therefore code-completion-suggestions and potentially some type-checks or other compile-time-checks) and would love to pick your brain on if there are ways to combine this while leaving the two properties I mentioned intact.


@chulkilee

I think this depends on what you are configuring.

  • wrong field names being used in the code that consumes the configuration could be caught at compile-time.
  • missing configuration should prevent the piece of code that requires that configuration to be run. If your whole application depends on something, that should prevent it from starting. If only part of it depends on it, you can still use the rest (just using normal supervision techniques; processes that require something fetch that during their startup).
  • Although rare, sometimes it makes sense to reload configuration at runtime (i.e. alter the configuration independently from your app’s release cycle). We do get close to the deeper question of ‘when can we call something configuration’?

I think this is also covered by my reply above :slight_smile:.

This is a good question. Specify predates the new Mix release changes, and I have not had time to look into how these changes might enable special interoperability.

A related question to ask, however, is in which way the two should interact:
Should we see Mix releases as a single configuration source, or should we instead see Specify/Vapor/Saša’s new tool as a source for Mix’s way to do configuration?
I currently lean towards the former, because Mix’s way of doing configuration is more rigid than what these tools can provide, but I am very interested in the opinions of @keathley and @sasajuric and anyone else on this matter.


I’m certain that we don’t support all the scenarios. Lib was designed specifically for the limited (but very real) set of problems which my clients face. So for example, we currently don’t support nil at all. You can provide an optional value but you have to give it a default which is not nil. I’m sure this is not enough, and I spent a bit of time thinking it through, but I don’t yet have a clear view on how to tackle it, and we don’t need it so far, so I’m still letting it simmer :slight_smile:

Not sure what you mean, but we do allow local dev/test defaults as I said before:

Is that what you have in mind, or are you talking about some other scenarios?

This is currently indeed not tackled, simply b/c we’re only using OS env, so we don’t care about it. However, there is some basic plumbing in place to make it work.

First, beyond individual getters, we also inject fetch_all which returns the complete map. This allows clients to cache the stuff however they want to. For example, you could retrieve this during app startup and pass it through the supervision tree. Alternatively, you might want to store it in some ets table, or even app config (which is after all an ets table :slight_smile:).

Admittedly, with fetch_all you lose compilation guarantees, but we still get typespecs, so that’s still something at least.

It’s also worth mentioning that fetch_all paves the way for merging configurations from different sources. Basically, you can define multiple config modules, fetch all from them, and then perform a map merge.
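
A minimal sketch of that approach, assuming two hypothetical config modules that each expose a `fetch_all/0` returning a map (the module names and values below are invented, and storing the result in `:persistent_term` is just one possible caching choice):

```elixir
defmodule DbConfigSketch do
  # Hypothetical stand-in for a generated config module.
  def fetch_all, do: %{db_host: "localhost", db_pool_size: 10}
end

defmodule HttpConfigSketch do
  # Second hypothetical config module, e.g. backed by another source.
  def fetch_all, do: %{http_port: 4000}
end

defmodule ConfigSnapshotSketch do
  @key {__MODULE__, :config}

  @doc "Fetches every module once and stores one merged snapshot."
  def load!(modules) do
    snapshot =
      Enum.reduce(modules, %{}, fn mod, acc -> Map.merge(acc, mod.fetch_all()) end)

    :persistent_term.put(@key, snapshot)
    snapshot
  end

  @doc "Reads from the stored snapshot without touching the sources again."
  def get!(name), do: Map.fetch!(:persistent_term.get(@key), name)
end
```

Calling `ConfigSnapshotSketch.load!/1` once during app startup means later reads are cheap, and related values always come from the same snapshot rather than from two separate reads of the source.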

It’s not perfect though, b/c you currently have to copy-paste the definition. I’ve deliberately coupled config definition with the adapter, b/c that’s the simplest interface that fits my client’s needs, but I’m aware that it’s not very flexible, so I’m definitely open to expanding it, though I’d like to hear the exact use cases first.

Another plumbing for faster retrieval is in the source adapter contract which looks as follows:

@callback values([Provider.param_name()]) :: [Provider.value()]

So the generic code asks the source to fetch all the params at once, which means that the adapter can make a single round trip to the underlying source, such as an external database.
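
For illustration, an adapter for such a contract could look roughly like this. It is a sketch under assumptions: `SourceSketch` is a hypothetical behaviour that only mirrors the shape of the callback above, and the mapping from parameter names to upper-cased OS variable names is invented for the example.

```elixir
defmodule SourceSketch do
  @moduledoc "Hypothetical behaviour mirroring the batch-fetch contract."

  @type param_name :: atom()
  @type value :: String.t() | nil

  @callback values([param_name]) :: [value]
end

defmodule SystemEnvSourceSketch do
  @behaviour SourceSketch

  # Returns one value per requested name, in the same order; nil when unset.
  # A database-backed adapter could answer the same call with a single query.
  @impl SourceSketch
  def values(param_names) do
    Enum.map(param_names, fn name ->
      name |> Atom.to_string() |> String.upcase() |> System.get_env()
    end)
  end
end
```

Because the whole list of names is passed in at once, the adapter decides how to batch the lookup, and the generic code never needs to know how many round trips happen.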

We might add some internal caching logic too, but I want to defer that until the need arises. The benefit of internal caching is that we could keep functions as getters (so compile-time guarantees), and still have fast access. OTOH, caching brings a lot of complexity, so I’m a bit cautious about it.


My current opinion on this matter is that I avoid doing anything in releases.exs if I can help it. The reason is that this is free-form code which runs only in production, which means that it’s not easily (if at all) testable on CI. I’m not confident writing such code, and so I like to keep as much of the config as possible outside of it. Providing config during app boot has a few shortcomings, but on the plus side it brings production and dev/test much closer to each other, and so I strive to have as much config as I can in the app start.


Thank you for your detailed response! :smiley:

I mean: What happens if you want to use something twice in your application (or even twice within a single process), configured differently? Use cases of these are for instance:

  • Any library that depends on configuration but you might want to use two or more times in your application.
  • Single tests wanting to run in a slightly different environment from other tests (for instance to test how the app responds being configured in certain ways).
  • Abstract datatypes where we want to configure what concrete implementation of that datatype is being used for some reason (like performance vs. memory tradeoffs) in one place of our code differently from other place(s).

Then I believe you need two config values. E.g. if you want to run two Phoenix endpoints, say main site and admin, then you need e.g. public_http_port and admin_http_port, right?

I’d first attack this by having the code under test accept config as function parameters, and then provide different parameters from the corresponding tests.
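
Something along these lines, as a minimal sketch (the `RateLimiterSketch` module and its `:max_requests` option are hypothetical and only there to show the shape):

```elixir
defmodule RateLimiterSketch do
  @moduledoc "Hypothetical module that receives its config as an argument."

  @doc "Returns :ok while `count` is within the configured limit."
  def check(count, opts) when is_integer(count) and is_list(opts) do
    # The limit comes from the caller, not from global app config.
    limit = Keyword.fetch!(opts, :max_requests)
    if count <= limit, do: :ok, else: {:error, :rate_limited}
  end
end
```

The production caller passes the value it read from the config during startup, and each test passes whatever limit it wants to exercise, e.g. `RateLimiterSketch.check(5, max_requests: 3)`.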

This is something I’m not considering for the foreseeable future. In my view, config provisioning should be about fetching atomic values (connection params, auth tokens, pool sizes, logger levels & such). Building arbitrary complex data representation is IMO usually best left to the app code. Perhaps I’m wrong, but I definitely want to start lightweight and expand from that. That said, I’d like to hear more about concrete use cases you have in mind.

I should mention that our provider internally uses ecto changesets for type conversion, so custom complex types and arbitrary variations might be possible via ecto types. TBH I didn’t even explore this, we just use changesets because they are convenient for converting types and reporting errors :slight_smile:


These are exactly my feelings. Our individual solutions might look slightly different. But this is the core problem.


Checking a .env file into git is a bad security practice. The .env should always be in the .gitignore file. Not doing so can leak sensitive information in plain text into the version control system. Even if you are in control of that system, and it’s private, you should not leak the .env file outside the server running your code.

Instead you should use the .env.local with the sane defaults for development, but I would prefer to be explicit in its name and call it .env.dev.

Then the README for the project could have the instructions to copy the .env.dev file to .env, or if you have a setup script for development you add this step there.

That is the way to go when dealing with environment values for production.

Default values may lead to production running with credentials such as user: admin and password: admin, just to give the most obvious example.

Yeah, we definitely don’t want to set these to default in prod (and we don’t!). However, there are occasional examples where a prod default might make sense. For example, we want to allow connection pool to be configurable, b/c it might help an operator patch a production which is falling apart, but we still want to provide some “sensible” default. Another example could be logger level (use info by default, make it possible to reconfigure without needing to redeploy the system). But yeah, I agree that in most cases defaults are best avoided (e.g. db name, credentials, api tokens & such).
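
Roughly, the split could look like the sketch below. The variable names (`DB_POOL_SIZE`, `LOG_LEVEL`, `DB_PASSWORD`) and the `ProdDefaultsSketch` module are made up for the example, and reading straight from the OS env is an assumption rather than a description of the library.

```elixir
defmodule ProdDefaultsSketch do
  @moduledoc "Hypothetical sketch; the variable names are invented."

  @levels %{"debug" => :debug, "info" => :info, "warning" => :warning, "error" => :error}

  def load do
    %{
      # Operational knobs: a default is reasonable, an operator can override it.
      pool_size: "DB_POOL_SIZE" |> System.get_env("10") |> String.to_integer(),
      log_level: Map.fetch!(@levels, System.get_env("LOG_LEVEL", "info")),
      # Secrets: no default, so a missing value fails at boot instead of
      # silently running with a placeholder credential.
      db_password: System.fetch_env!("DB_PASSWORD")
    }
  end
end
```

With this shape, forgetting `DB_PASSWORD` stops the app from starting, while forgetting `DB_POOL_SIZE` just means the default of 10 is used.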

Vapor looks really great but then how do you deal with differing config based on “mode”?

Is there an example “hello phoenix” repo out there using Vapor configuring Ecto and Phoenix for dev, test, and prod? I did a github public code search for uses of :vapor in the wild and couldn’t easily find examples.

Thank you!
The solutions you mention definitely make sense, and would be what I would do first as well as long as I am the owner of both parts of the code. They break down if we are writing an application that uses a library, so we have two authors, each in charge of different parts of the code.
If you want to use the library twice, or configure it differently during testing, but the library owner has only specified a way to do ‘global configuration’, then you, the application writer, are out of luck.

Some examples would be, for instance, a couple of libraries I wrote earlier that expose data structures like functional stacks/queues/vectors/priority queues. Which one of the concrete implementations to use for these abstract datatypes depends on whether you want to optimize for read efficiency, write efficiency, memory efficiency or e.g. require real-time efficiency over an amortized one. Those decisions are usually things we want to leave for the application user, even if we use the datatypes in an intermediate library, because in the end we decide at the application level whether we run the code on an embedded device or a giant VPS, which is where these trade-offs start to matter.
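
A minimal sketch of what a per-call-site choice could look like (all module names here are hypothetical and much simpler than the real libraries; the `:queue_impl` option is invented for the example):

```elixir
defmodule QueueImplSketch do
  # Hypothetical behaviour shared by the concrete implementations.
  @callback new() :: term()
  @callback push(term(), term()) :: term()
  @callback to_list(term()) :: [term()]
end

defmodule ListQueueSketch do
  @behaviour QueueImplSketch

  @impl true
  def new, do: []
  @impl true
  def push(queue, item), do: queue ++ [item]
  @impl true
  def to_list(queue), do: queue
end

defmodule ErlQueueSketch do
  @behaviour QueueImplSketch

  @impl true
  def new, do: :queue.new()
  @impl true
  def push(queue, item), do: :queue.in(item, queue)
  @impl true
  def to_list(queue), do: :queue.to_list(queue)
end

defmodule IntermediateLibSketch do
  # The caller picks the implementation; the library only supplies a default.
  def collect(items, opts \\ []) do
    impl = Keyword.get(opts, :queue_impl, ListQueueSketch)

    items
    |> Enum.reduce(impl.new(), fn item, queue -> impl.push(queue, item) end)
    |> impl.to_list()
  end
end
```

Because the intermediate library takes the implementation as an option instead of reading one global setting, the application can use `ErlQueueSketch` in one place and the default in another.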

I actually think that what @keathley means is influenced by the fact that when you configure certain tools/frameworks/libraries your application is using, these configurations are actually static between environments. Furthermore, only hostnames and passwords are security-sensitive, things like e.g. pool sizes are not.
I expect that their setup uses a .env for the non-sensitive settings, and another set of ‘real’ environment variables that contains the sensitive fields amends these in production, with the .env.local file providing a variant of those sensitive settings (and other overrides) for development.

This is exactly what I mentioned to be a bad security practice. It should be inverted.

A .env file should never be committed into your version control, because it’s normally a place where you set sensitive data. Just take a look at the dotenv libraries across many programming languages, and you will see this pattern of using .env as a private file, i.e. one not shared with others through git repos.

Please be open to the ideas that:

  • Most people aren’t cavalier with security concerns when made aware of them, or especially if they were already aware of them before your remarks
  • Other people’s experiences and values may be different than yours and that’s okay

Combining the two leads us to the reminder that you probably don’t need to scold a thread full of other users for a practice they’ve probably given a great deal of thought to individually as well as received feedback from other peers within their organizations about. They probably didn’t enact the technique unilaterally and against internal objections.


And what about the users, like newbies, who come here and take these examples as good practice?