Library for runtime application configuration—interested?

christhekeele · May 16, 2023, 4:58am

TL;DR:

I’m planning on building a library, inspired by Vapor, to make configuring Elixir applications more straightforward, in an approachable but highly flexible way that scales to non-trivial use cases, even supporting no-redeploy modification of runtime configuration values in distributed, hot-code-reloaded systems. Interested?

Summary

I know there are quite a few options out there, and a lot of recent developments in this space the last few years, but I’ve still found building a good runtime configuration story in Elixir for large 12-factor apps to be a bit of a pain.

The problem is partially permutative: over time, an application can grow to want multiple sources of runtime configuration; loaded differently at compile-time, boot-time, or runtime; differently in different build environments and targets; with support for different configuration file types; and with different approaches for overriding values during development and testing. This gets harder and harder to reason about without good developer tooling.

Add in the ability to modify these values at runtime, in distributed systems that support hot-code reloading, and the problem becomes nearly intractable. I’d like to tract it.

Backstory

I’ve been porting a personal Elixir solution from project to project over the last 5 years, starting from the excellent Vapor library, pretty much since the day it was released. As my pet approach has evolved, I’m pretty happy with it, but would love to polish it—and I’m tired of copy-pasting my own code again and again.

Some of this approach was stolen from my Ruby on Rails days, where I was equally dissatisfied with the situation in that ecosystem, and developed a similar personal non-open-sourced solution that worked well with Rails’ boot system and Ruby’s dynamism, around which I built the Inquisitive gem for even more Ruby syntax-sugary ways of interacting with runtime configuration, first deployed in production applications around a decade ago.

Elixir, as a less-dynamic-than-Ruby, compiled language with a (historically driven by Erlang release mechanisms) mostly build-time configuration story, has a harder-to-engineer story around runtime configuration. There have been a lot of improvements to this in the first decade of Elixir, but there are still pain points.

I’m planning on codifying my Elixir approach to this problem in a package anyways for personal convenience, but I’m curious if there’s wider interest in the community, and would like to solicit ideas for a feature roadmap that might gel with what I’m building!

Synopsis

What I’m developing is essentially a declarative way to define your runtime configuration, and integrate it into your project at any point in your application’s development lifecycle.

Vapor’s example shows you how to throw it into your Application.start/2; but I unerringly find myself wrapping that in complex conditionals and OTP conveniences as the development, deployment, and override configuration-sophistication needs of my projects increase; always re-evolving the implementation towards the same result.

I figured it might be beneficial to encode my approach as a library, and make it easy for folks other than myself to re-use. Here’s what I have in mind:

Core Features

These are aspects of this system I’ve actually built before, and would love to stop re-inventing:

Support multiple approaches to defining configuration and sources:
- A straightforward config.exs-driven approach.
- An inline-Supervisor-tree-driven approach (including your main OTP Application supervisor, as the Vapor docs guide you towards).
- Perhaps a module-driven DSL approach.
Config env and target aware filters in configuration plans:

To make it easier to describe a complicated permutation of sources for configuration values in different build environments and targets.

For example: loading from .env.#{Config.config_env()}-type files, but never when deployed to :prod, where the app should rely exclusively on environment variables. Or, looking into a .gitignored .env.local configuration file for ultimate overrides, but never doing so outside of a MIX_ENV=dev or MIX_TARGET=local situation.

Specifically, instead of repeating configuration in different config/#{Config.config_env()}.exs files, allowing a single source of truth entry in your main config.exs file, with filters attached (similar to your mix.exs deps() :only and :targets filters). This makes it much easier to reason about where your runtime configuration comes from in your build-time configuration.

Or, describing a single list of configuration providers in your Application.start/2 callback, instead of incrementally building a list with many providers = if Config.config_env() == desired_env, do: modify_providers_for_this_permutation(providers) calls
A handful of trivial out-of-the-box mappers for common config coercions:

Ex: modeling string-only env vars as booleans, ints, or floats at runtime.
A validation system for ensuring values are within required parameters:

Ex: ensuring that your database pool size is always greater than 1 in production.
Very specific error messages when required configuration values are missing or cannot be parsed:

Including a lineage of all config sources that attempted to provide a value.
A Mix task for ensuring configuration is loaded appropriately for other mix tasks:

Necessary when your config loading is done externally to your Application.start call.

For example, if your Ecto Repo uses the init/2 callback to configure itself dynamically at runtime, mix ecto... will not work without a little help in accessing config not loaded in your main application callback, if it is provided by libraries such as these.
A Mix task for easy introspection of the current configuration given the current env/target, including lineage of overrides from different configuration sources.
Logger output at Application startup about configuration values:

To make it trivial to understand in your logs the way in which your application was configured at launch, including override lineage.
Secret-awareness to prevent sensitive things from being logged or displayed in Mix tasks.
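
To make a few of the core features above concrete, here is a minimal sketch of env/target filters, trivial mappers, validation, and source-naming error messages in plain Elixir. It is not the proposed library's API: `ConfigSketch`, its function names, and the shape of a "source" (a keyword list with `:name`, optional `:only`/`:targets`, and in-memory `:values`) are all assumptions made up for illustration.

```elixir
defmodule ConfigSketch do
  @moduledoc "Hypothetical sketch; none of these names come from Vapor or any published library."

  # Keep only the sources whose :only / :targets filters match the current build.
  # A source without a filter is always active.
  def active_sources(sources, env, target) do
    Enum.filter(sources, fn source ->
      env in Keyword.get(source, :only, [env]) and
        target in Keyword.get(source, :targets, [target])
    end)
  end

  # Trivial mappers for "stringly-typed" values such as environment variables.
  def to_boolean(value) when value in ["true", "1"], do: {:ok, true}
  def to_boolean(value) when value in ["false", "0"], do: {:ok, false}
  def to_boolean(other), do: {:error, "expected a boolean, got #{inspect(other)}"}

  def to_integer(value) do
    case Integer.parse(value) do
      {int, ""} -> {:ok, int}
      _ -> {:error, "expected an integer, got #{inspect(value)}"}
    end
  end

  # Resolve one key. Later sources override earlier ones, and error messages
  # name the sources involved.
  def fetch(sources, key, mapper, validator \\ fn _value -> true end) do
    tried = Enum.map(sources, &Keyword.fetch!(&1, :name))

    winner =
      sources
      |> Enum.reverse()
      |> Enum.find(fn source -> Map.has_key?(Keyword.fetch!(source, :values), key) end)

    with {:found, [_ | _]} <- {:found, winner},
         {:ok, value} <- mapper.(Map.fetch!(winner[:values], key)),
         {:valid, true} <- {:valid, validator.(value)} do
      {:ok, value, winner[:name]}
    else
      {:found, nil} -> {:error, "#{key} is missing; tried #{inspect(tried)}"}
      {:error, reason} -> {:error, "#{key} from #{inspect(winner[:name])}: #{reason}"}
      {:valid, _} -> {:error, "#{key} from #{inspect(winner[:name])} failed validation"}
    end
  end
end

# One list of sources, filtered per build instead of rebuilt with conditionals.
sources = [
  [name: :dotenv_file, only: [:dev, :test], values: %{:DATABASE_POOL_SIZE => "2"}],
  [name: :system_env, values: %{:DATABASE_POOL_SIZE => "10"}]
]

prod_sources = ConfigSketch.active_sources(sources, :prod, :host)

ConfigSketch.fetch(prod_sources, :DATABASE_POOL_SIZE, &ConfigSketch.to_integer/1, &(&1 > 1))
# => {:ok, 10, :system_env}
```

A real implementation would read files and environment variables rather than in-memory maps, and would track the full override lineage rather than only the winning source.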

Aspirational Features

Features for this system I’ve never implemented before, but believe I can build it to support, with enough motivation:

An extensible system for declaring configuration value parsers (Vapor “mapping” functions):
- Working around restrictions in referencing anonymous functions in a config.exs, to support all usage modes.
  
  Generally by the time I find this need, I’ve moved configuration over into my Application.start/2 callback where they are already available, but an out-of-the-box solution that supports config.exs configuration as a first-class citizen must accommodate this.
Test helpers, to make overwriting runtime config during a test (and other mocking of configuration values) trivial without fully losing parallelization of said tests:

Regardless of configuration value provenance, like an environment variable that is hard to modify mid-test-suite, this would let you play nicely with the virtuous properties of ExUnit.
Per-process caching of commonly fetched config values in the process dictionary for hot paths and tight loops (since the initial library plan currently throws everything into :ets for retrieval at runtime each time it is referenced, and would only grow less performant with distributed-friendly alternative implementations of the configuration backend).
Swappable backends over :ets to extend the configuration value storage mechanism to more distributed-friendly environments, once I am convinced we can optimize this scenario for hot paths and updates. For example, any Ecto-supported adapter, or persistent_term for scenarios where configuration is rarely intended to be changed.
Sane support for changing runtime configuration values at runtime, even with distributed backends, so that it is viable to do so via a remote connection to a production system:
- With a Pub-Sub system for configuration value consumers to be notified when this occurs.
- And supervision tree helpers subscribed to that to make restarting when certain values change trivial, ex:
```
 [
   {Library.Configuration.Dependency, values: [:SECRET_KEY_BASE, :SECRET_SALT]},
   MyApp.Endpoint
 ] |> Supervisor.start_link(strategy: :rest_for_one, name: MyApp.WebSupervisor)
```
  or even
```
[
  {
    Library.Configuration.Watcher,
    values: [:SECRET_KEY_BASE, :SECRET_SALT],
    children: [MyApp.Endpoint],
    name: MyApp.WebSupervisor
  },
  Other.Things
] |> MyApp.Supervisor.start_link(strategy: :one_for_one)
```
  letting you literally connect to a running production application and rotate your secret keys, at runtime, with zero downtime outside of your Supervisors restarting things.
  
  Or more generally, modifying any configuration in a production system at runtime that you’ve decided to make runtime configuration, with OTP supervision tree resiliency guarantees about the consequences.
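
As a rough illustration of how the first supervision-tree helper could behave, here is a sketch of a "dependency" process that subscribes to change notifications and shuts itself down when a watched value changes, so that a :rest_for_one supervisor restarts it and everything started after it. The module `ConfigSketch.Dependency`, the registry name `ConfigSketch.PubSub`, and `broadcast_change/1` are assumptions for this sketch; only `Registry`, `GenServer`, `Agent`, and `Supervisor` are real.

```elixir
defmodule ConfigSketch.Dependency do
  @moduledoc "Hypothetical stand-in for the Library.Configuration.Dependency child."
  use GenServer

  # Assumed name of a duplicate-keyed Registry used as the pub-sub bus.
  @registry ConfigSketch.PubSub

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  # Called by whatever changes a value; notifies every subscriber of that key.
  def broadcast_change(key) do
    Registry.dispatch(@registry, key, fn subscribers ->
      for {pid, _value} <- subscribers, do: send(pid, {:config_changed, key})
    end)
  end

  @impl true
  def init(opts) do
    keys = Keyword.fetch!(opts, :values)
    Enum.each(keys, fn key -> {:ok, _owner} = Registry.register(@registry, key, nil) end)
    {:ok, keys}
  end

  # Stopping here makes the supervisor restart this (permanent) child; under
  # :rest_for_one, every child started after it is restarted as well.
  @impl true
  def handle_info({:config_changed, key}, keys) do
    {:stop, {:shutdown, {:config_changed, key}}, keys}
  end
end

children = [
  {Registry, keys: :duplicate, name: ConfigSketch.PubSub},
  {ConfigSketch.Dependency, values: [:SECRET_KEY_BASE, :SECRET_SALT]},
  # MyApp.Endpoint would go here; a named Agent stands in for it in this sketch.
  %{id: :endpoint, start: {Agent, :start_link, [fn -> :running end, [name: :endpoint_stand_in]]}}
]

{:ok, _supervisor} = Supervisor.start_link(children, strategy: :rest_for_one)

# Rotating a secret would end with a broadcast, which restarts the stand-in endpoint.
ConfigSketch.Dependency.broadcast_change(:SECRET_SALT)
```

The registry is listed first so that it survives the restart. This sketch only covers a single node; propagating the notification across a cluster is the harder, still-open part.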

Call for feedback, criticism, and ideas

Does any of this excite you, or feel like it might solve a pain point in the projects you work on? Let me know!

Or, do you maintain a complicated and large 12-factor app, and this still seems over-engineered and unrealistically overblown—would you loathe working in a system configured this way?

Finally, this is all conceived from my own personal experience, needs, and observing those of others here on this forum. Do you have any other insights from your experience you think would be instructive during the initial development of such a library?

Thanks for reading! Let me know your thoughts!

D4no0 · May 16, 2023, 6:26am

I have been working on a cluster of projects that were part of a big solution. At some point we wanted to make an admin interface that would be able to dynamically configure each individual service, so I was thinking about making such a library too, with a focus on fetching configuration from a database. I didn’t have enough time at that moment, though, and you also have to take into consideration how to correctly handle configs that change at runtime.

christhekeele · May 16, 2023, 6:33am

This is one reason why I want to initially reach for a stdlib-oriented control plane for the configuration storage solution (:ets), so that it’s as universal as possible to build UI solutions for!

I’m not yet convinced that my initial naive :ets-driven approach will suffice for multi-node distributions, though I have some ideas in that regard, which is why such modification features are marked as aspirational.

D4no0 · May 16, 2023, 6:36am

Sounds very promising. My only 5 cents is that I think it would be wise to keep this configuration type as separate as possible from the classic Elixir configuration, so as not to introduce more complexity into the runtime and compile-time configuration bucket.

christhekeele · May 16, 2023, 6:43am

Agreed. My preference over time has been to relegate config.exs files exclusively to compile-time config, and to place all runtime config closer to the Application.start/2 callback boot constructs, dodging the nuances between config/runtime.exs and others completely. It’s nice not having to have a runtime.exs file outside of Nerves projects; it makes things much more intuitive to reason about!

But I would like a full-solution library to also accommodate the config.exs approach, for ease of installation and convenience in new projects. That requires some work-arounds for what’s possible with Vapor, making it even more useful to abstract behind a library!

Exadra37 · May 16, 2023, 8:55am

A solution for a distributed database for the configuration may be LiteFS - Distributed SQLite:

LiteFS is a distributed file system that transparently replicates SQLite databases. This lets you run your application like it’s running against a local on-disk SQLite database but behind the scenes the database is replicated to all the nodes in your cluster. This lets you run your database right next to your application on the edge.

christhekeele · May 16, 2023, 9:29am

I’ve been really excited for the renaissance of SQLite in non-mobile, distributed production deployments, for exactly this sort of use-case, and fly.io is really supportive for this type of tech right now!

I’ll admit, I am leery of building this sort of library (initially) around a backend-storage swappable-adapter model (partially because of my experience trying to do so non-trivially with Mnemonix); just because of hot-path performance implications in the domain of configuration value reading. I stopped developing Mnemonix when I drew some flamegraphs around my first real-world applications using it, and read the writing on the wall about how my OTP-driven adapter architecture would throttle meaningful performance within the correct level of abstraction.

However, I agree that such an architecture would open up the doors to many distributed system setups! One more reason why I want to tackle this as a library instead of a repeated copy-paste hack: so I can properly encode the correct level of abstraction for this domain. In my analysis, the requirements of a read-heavy runtime-configuration-reader library with good event-driven cache-busting are far more amenable to optimization than Mnemonix ever could have been.

My long-form aspirations here are to:

Get things working first (via :ets).
Provide a solution for hot paths next (via process-level caching and event-driven cache-busting).
Not mentioned in my initial roadmap, finally return to the codebase with my experience from Mnemonix and Elixir library development since then:
- Specifically to support this anticipated abstraction of the storage level, when I’m confident that hot paths have a way to keep up.

I’m pretty delighted that LiteFS, Ecto.Adapters.SQLite3, and Etso have converged to a point of maturity around the same time, honestly. Between that, and some of @lawik’s recent analysis of distributed PG ↔ SQLite synchronization tools that would enable this library to work in a distributed fashion for feature flags—well, it’s just an exciting time to be an Elixir developer, and that’s a large part of what’s been making me itch to encode this as a robust library!

christhekeele · May 16, 2023, 9:55am

@Exadra37 I’ve added support for an alternative configuration backend over :ets, explicitly mentioning support for distributed environments, to the post’s Aspirational goals, because you’ve convinced me that this could be a major addition to the project, outside of its Core MVP goals!

Exadra37 · May 16, 2023, 11:23am

Coming from a dynamic language background, the Elixir configuration was a pain to grasp and remember each time I came back to Elixir, and worse when I started to deploy my pet apps. Thus I really welcome a library that can make it easy to work with configuration and not hard to remember when returning to the project after a while away.

As a developer advocate for security, I don’t recommend at all that releases are built with any type of secrets in them, as we usually do now, with the session salt and session encryption key being a good example of some that are not easy or even possible to retrieve only at boot-time. It would be nice if you could solve this problem with your configuration library.

Maybe you want to keep an eye on Castle and/or work with them to be compatible with how configuration works with Hot Code Upgrades:

For example, to be compatible with sys.config:

runtime support for sys.config generation (incl. support for runtime.exs)

dimitarvp · May 16, 2023, 11:47am

Same btw, and every time I had to configure :cowboy SSL I made a mistake. Configurations are not strongly typed nor enforced so any small mistake you only find out in runtime. Really started being a thorn in my butt for some time now.

I am pondering a different (smaller) library that wraps various common configurations in strongly-typed structs with clear rules which key must exist and when (f.ex. if you have one key present then two others are unnecessary, or if you put one optional key in then 3 others become mandatory because all 4 together must configure a certain aspect etc.) – and then they’ll translate these structs to the underlying mish-mash of [keyword] lists and tuples.

Would you have interest in that?

I am not even sure I’ll come back to work for an Elixir company, though I have started getting offers lately.

But if I don’t go all-in with Rust and do remain with Elixir on a part- or full-time job capacity then I very likely might end up writing such a library, just out of frustration.

cmo · May 16, 2023, 12:04pm

If you’re concerned about ETS being slow, have you considered using persistent_term?
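
For reference, a minimal sketch of what storing and reading a value through Erlang's :persistent_term looks like from Elixir. The `{MyApp.Config, key}` tuple is just an assumed namespacing convention, not something any library requires.

```elixir
# Store once at boot. Updating or erasing an existing term later is expensive,
# because it triggers a global garbage collection across all processes.
:persistent_term.put({MyApp.Config, :DATABASE_POOL_SIZE}, 10)

# Reads are constant-time and do not copy the term onto the caller's heap.
pool_size = :persistent_term.get({MyApp.Config, :DATABASE_POOL_SIZE})

# get/1 raises ArgumentError for unknown keys; get/2 returns a default instead.
fallback = :persistent_term.get({MyApp.Config, :NEVER_STORED}, :not_configured)
```

That read/write asymmetry is why it suits values that are rarely, if ever, changed after boot.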

ausimian · May 16, 2023, 12:26pm

Maybe you want to keep an eye on Castle and/or work with them to be compatible with how configuration works with Hot Code Upgrades

sys.config generation is really the main thing Castle does, both at boot time and (just prior to) hot-upgrade time. This satisfies the Erlang release handler. Today, that generation only supports runtime.exs but more general support for Config Providers will be added shortly. As long as other providers implement the Config.Provider behaviour, Castle will be able to call them.

That isn’t the tricky bit tho - the tricky bit is correctly implementing the config_change callback in your application.
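
For readers unfamiliar with it, config_change/3 is an optional callback of the Application behaviour, invoked after a code upgrade if the application environment has changed. A minimal sketch of its shape follows; `MyApp.Reconfigure` and `MyApp.ConfigListener` are hypothetical names, and deciding what each setting's owner must actually do with the new value is the tricky bit this sketch leaves out.

```elixir
defmodule MyApp.Reconfigure do
  # Hypothetical hook: forward each change to a process that owns the setting.
  def apply_change(key, value) do
    case Process.whereis(MyApp.ConfigListener) do
      nil -> :ignored
      pid -> send(pid, {:config_change, key, value})
    end
  end
end

defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    Supervisor.start_link([], strategy: :one_for_one, name: MyApp.Supervisor)
  end

  # `changed` and `new` are keyword lists of {key, value} pairs;
  # `removed` is a list of keys that no longer exist.
  @impl true
  def config_change(changed, new, removed) do
    Enum.each(changed ++ new, fn {key, value} -> MyApp.Reconfigure.apply_change(key, value) end)
    Enum.each(removed, fn key -> MyApp.Reconfigure.apply_change(key, nil) end)
    :ok
  end
end
```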

christhekeele · May 17, 2023, 6:26pm

I’m not too concerned about ETS being slow! And persistent_term would be a better fit for consumers who intend to not update runtime configuration that much, so it makes sense to support it as an alternative backend. Will add to the candidate list for later feature development.

Exposing per-process caching in the process dictionary is more something I think I’ll be implementing anyways for library internals, so might choose to expose as a feature. In order to develop test helpers that let you run tests concurrently, but override some values for specific tests, I need a mechanism to let individual test processes access the override safely without modifying global configuration, and was thinking about using a read-through cache from process dict to configuration singleton.

If I don’t make that read-through cache only happen in test environments, but make that how the library does all lookups, then it’s trivial to expose what I suspect is the fastest possible way to optimize access to config values, so I might as well expose it if the implementation seems sound and compatible with other internals.
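
A minimal sketch of that read-through idea, assuming a public named :ets table as the configuration singleton. `ConfigSketch.Cache`, the table name, and `with_override/3` are hypothetical, and the sketch deliberately omits cache-busting, so a cached process keeps seeing the old value after a later `put/2`.

```elixir
defmodule ConfigSketch.Cache do
  @moduledoc "Hypothetical sketch of a process-dictionary cache in front of ETS."
  @table :config_sketch_values

  # Idempotent, so calling it more than once is harmless.
  def init_table do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
    end

    :ok
  end

  def put(key, value), do: :ets.insert(@table, {key, value})

  # Read-through lookup: process dictionary first, then the shared table.
  # Raises MatchError if the key was never stored. The :miss sentinel means a
  # value that is literally :miss cannot be cached.
  def lookup(key) do
    case Process.get({__MODULE__, key}, :miss) do
      :miss ->
        [{^key, value}] = :ets.lookup(@table, key)
        Process.put({__MODULE__, key}, value)
        value

      cached ->
        cached
    end
  end

  # Block-only override visible to the calling process alone; the previous
  # process-dictionary entry is restored (or removed) afterwards.
  def with_override(key, value, fun) do
    previous = Process.put({__MODULE__, key}, value)

    try do
      fun.()
    after
      if previous == nil do
        Process.delete({__MODULE__, key})
      else
        Process.put({__MODULE__, key}, previous)
      end
    end
  end
end

ConfigSketch.Cache.init_table()
ConfigSketch.Cache.put(:DATABASE_POOL_SIZE, 10)

ConfigSketch.Cache.with_override(:DATABASE_POOL_SIZE, 1, fn ->
  # Only this process sees 1 here; concurrent tests still read 10.
  ConfigSketch.Cache.lookup(:DATABASE_POOL_SIZE)
end)
```

Because the override lives in the caller's own process dictionary, concurrently running tests never observe it, which is what keeps the helper compatible with async tests.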

christhekeele · May 17, 2023, 6:33pm

While I used Ecto database configuration as my initial example, I have used this tech specifically to have my session salts and encryption keys be runtime-only! They’re why I began looking into sane mechanisms for making updating such runtime configuration easy, as a zero-deployment solution to rotate the session encryption key in the event of a breach, and as a way to modify the session salt to force site-wide logout for everyone.

In fact this is a much nicer example, so I’ve updated to use it accordingly.

So yes, this library should solve this problem, as it already has before! IIRC it took a little more finagling than one’d like, but as this blog post about using vapor describes, both Ecto.Repo and Phoenix.Endpoint support init/2 callbacks now so either can be used with runtime config!

hst337 · May 17, 2023, 7:09pm

That’s a great description, but in my opinion you’re trying to solve a problem which is not actually that important for the ecosystem. During my engineering experience, I’ve encountered completely different problems with runtime configuration, like

Distributed application configuration
Atomic reconfiguration (the configuration is changed in multiple places “at once”)
Configuration which has to perform some action to configure the different states

Generally speaking, I find every runtime configuration-as-a-key-value-store approach really hard to maintain, because configuring an application at runtime is not a problem of changing a value, but really is a problem of propagating the changed value, and most of the time this propagation must be as consistent and as atomic as possible.

OTP’s Application env is only applicable during initialization of the program, and it provides no meaningful answers or tools for runtime reconfiguration. Vapor actually solves a problem with different configuration stores and provides a DSL where plain Elixir could’ve been used without any implications or drawbacks.

So, if I was up to writing a runtime configuration library, I would have started with thinking about approaches to these problems.

To get an idea of what I am talking about, here’s an example:

Short example: we read the value from somewhere during the initialization and store it in the persistent_term, for faster read access in runtime. How do we change this value in runtime? Application.put_env is not working. We need hooks for reconfiguration or something like this.

Long example: we have a pool of workers around TCP connections which are always connected to the server. They reconnect every time the server drops the connection, send empty ACKs, etc.

The server’s address is read from configuration (via application env, system env or vapor, it doesn’t matter) and then this address is stored in the state of each of these worker processes.

Problem: I want to reconnect to another address with the same state workers have right now. This can be done due to security reasons, or as a connection to fallback server in case of main server failing or whatever.

Existing configuration solutions provide no easy way to do this. What I actually need to do is perform a transaction which would suspend the workers, stop their connections, swap addresses in their states, initiate new connections, and resume the workers. If something fails, I would have to roll back the actions and report to the user that the reconfiguration has failed.

Hard, right? Now imagine this pool is distributed.

So, if I was up to writing a configuration library, I would start with adopting existing places where developers usually put the values they’ve read from configuration like persistent_term, ets, GenServer’s state. Next thing I would do, I would think about some configuration hook system, or something like this to have actions running when the values in the store are changed. And the last thing, I would provide some interfaces for tracking the transactional reconfigurations
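
To illustrate the transactional shape of the long example on a single node, here is a minimal sketch that runs reconfiguration steps in order and undoes the completed ones, latest first, when a step fails. `ReconfigureSketch` and the `{name, apply_fun, undo_fun}` step format are assumptions for this sketch; an Agent stands in for a worker's state, and the failing connect step is simulated.

```elixir
defmodule ReconfigureSketch do
  @moduledoc "Hypothetical sketch: apply steps in order, roll back on failure."

  # Each step is {name, apply_fun, undo_fun}; apply_fun returns :ok or {:error, reason}.
  def run(steps) do
    result =
      Enum.reduce_while(steps, {:ok, []}, fn {name, apply_fun, undo_fun}, {:ok, done} ->
        case apply_fun.() do
          :ok -> {:cont, {:ok, [{name, undo_fun} | done]}}
          {:error, reason} -> {:halt, {:error, name, reason, done}}
        end
      end)

    case result do
      {:ok, _done} ->
        :ok

      {:error, name, reason, done} ->
        # `done` is already in reverse order, so the latest step is undone first.
        Enum.each(done, fn {_name, undo_fun} -> undo_fun.() end)
        {:error, {name, reason}}
    end
  end
end

{:ok, worker} = Agent.start_link(fn -> %{address: "10.0.0.1", status: :running} end)

steps = [
  {:suspend,
   fn -> Agent.update(worker, &%{&1 | status: :suspended}) end,
   fn -> Agent.update(worker, &%{&1 | status: :running}) end},
  {:swap_address,
   fn -> Agent.update(worker, &%{&1 | address: "10.0.0.2"}) end,
   fn -> Agent.update(worker, &%{&1 | address: "10.0.0.1"}) end},
  # Simulated failure: the new server refuses the connection.
  {:connect, fn -> {:error, :econnrefused} end, fn -> :ok end}
]

ReconfigureSketch.run(steps)
# => {:error, {:connect, :econnrefused}}, with the worker state restored
```

This says nothing about a failure during the rollback itself, or about coordinating the steps across nodes, which is where the distributed case gets hard.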

hst337 · May 17, 2023, 7:12pm

That’s the problem I am talking about. Configuration values stored in pdict make them almost impossible to reconfigure in runtime with existing tools

christhekeele · May 18, 2023, 4:54am

It seems our heads are in the same place:

hst337:

That’s a great description, but in my opinion you’re trying to solve problem which is not actually that important for the ecosystem.

Configuring applications at runtime is not a problem of changing a value, but really is a problem of propagating the changed value, and most of the time this propagation must be as consistent and as atomic as possible.

Application.put_env is not working. We need hooks for reconfiguration or something like this.

Hard, right? Now imagine this pool is distributed.

So, if I was up to writing a configuration library, I would start with adopting existing places where developers usually put the values they’ve read from configuration like persistent_term, ets, GenServer’s state. Next thing I would do, I would think about some configuration hook system, or something like this to have actions running when the values in the store are changed. And the last thing, I would provide some interfaces for tracking the transactional reconfigurations

I think you’re gonna love where I want to take this project, then! Admittedly, I’m targeting something smaller initially, but if you read through the aspirational goals, you’ll notice these are the problems I’m reaching towards. I think I’ve found the right level of core abstraction that is extensible to solve these problems, and want to ship that first—we’ll see how it goes from there!

christhekeele · May 18, 2023, 4:58am

Agreed. This technique is required for how I think I’ll be implementing test helpers, in a “block only” format that always tears down its own pdict overrides afterwards, but if I expose it as a library feature for production usage, it’d probably be in something like an Advanced Usage section, with ample caveats, such as: you REALLY should ensure the process in question lives in a receive loop and checks its mailbox for cache-busting events from the library.

christhekeele · May 18, 2023, 5:21am

I think that’d pair well with what I’m working on, but I’m uncertain whether I’d adopt it within the library.

Vapor itself has a notion of individually required values, and a value-mapping feature that, if you use it assertively with functions that raise, lets you get pretty far at weeding out bad input from the system.

The other extreme would be supporting coercion into Ecto-esque schemas for configuration values (embedded schemas too! remember, config values are not necessarily single terms, they can be data structures if loading from e.g. JSON or YAML files), or even something constraint-solver-esque like what you are describing, that models requirements between values.

The sweet spot for me, personally, would be a thin layer on top of what Vapor is doing to let typechecking warn for invalid config-value-lookup usage. Ex, static analysis tools should know if Configuration.lookup(:DATABASE_POOL_SIZE) returns a string or integer, and warn if used in a function known to require something different, since this is one of the large pains of env var/dotenv configuration loaders in collaborative projects: they are “stringly-typed”, and a developer looking up a value may not be certain what coercions have been applied, especially if the value could also come from other filetypes with stronger typing notions.

This is definitely something I want to think about a little later, but definitely something I want to think about more!

christhekeele · May 18, 2023, 5:53am

Incidentally, while I understand and respect this sentiment, I would miss your presence and commentary on these forums!