Advice on how to go about library configuration from Elixir 1.9 onward?

dimitarvp · July 3, 2019, 12:40pm

Hey everybody,
In light of the recent Elixir 1.9 announcement, I’d like the first library I write to not use Mix.Config. (It is an sqlite3 library with an Ecto 3 adapter.) To that end, I’ve come to several alternatives while thinking and sketching code, and I am curious what anybody has to say as a preference or recommendation.

1. Process dictionary. I really like this but it comes with the arrangement that the user of the library will either (a) take special care to only use designated processes to interact with my library, or (b) will use it from anywhere they want but have to clean up the process dictionary keys after they are done with a task. As much as I like the process dictionary, it does really sound like going back to imperative languages where there’s an implied context / state to watch out for.
2. Have a GenServer that centralises the access to the limited resource my library is managing (connections / open handles to a sqlite3 database [file]), and include helper functions that merge config options to the GenServer's state (a map). Clean, beautiful, makes perfect use of the OTP.
3. Make the handle object used to initiate any operation in the library also encompass configuration. This I also like quite a bit since it’s a pure FP approach and doesn’t rely on implied context / state.
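
To make option (3) concrete, here is a minimal sketch of a handle that carries its own configuration. Everything in it is an assumption for illustration: the module name `HandleSketch`, the field names, the default values, and the use of `make_ref/0` as a stand-in for the NIF resource reference.

```elixir
defmodule HandleSketch do
  # Hypothetical struct: `ref` stands in for the NIF resource reference,
  # the other fields are per-handle configuration with assumed defaults.
  defstruct ref: nil, exec_timeout: 5_000, batch_size: 100

  # "Opens" a database and bakes the options into the returned handle.
  def open(_path, opts \\ []) do
    # struct!/2 raises on unknown keys, so typos in option names fail early.
    struct!(%__MODULE__{ref: make_ref()}, opts)
  end

  # "Reconfiguring" returns a new handle; the old value is left unchanged.
  def put_exec_timeout(%__MODULE__{} = handle, millis)
      when is_integer(millis) and millis > 0 do
    %{handle | exec_timeout: millis}
  end
end

handle = HandleSketch.open("demo.db", batch_size: 500)
handle = HandleSketch.put_exec_timeout(handle, 1_000)
```

No process and no global state is involved: whoever holds `handle` also holds its configuration.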

Other ideas are welcome as well.

I’m leaning towards starting with (3) because that will give the users of the library a GenServer-less use case. And then adding (2) because then the users of the library can also just name their sqlite connection and the OTP will take care of dispatching their calls to the appropriate GenServer.

Again, I like (1) a lot but it doesn’t feel right to me.

Opinions? Ideas? Criticisms?

jola · July 3, 2019, 12:52pm

There’s this approach, making the user explicitly pass init options.

I don’t understand what you mean about the 1.9 release though. It didn’t remove configuration, it just replaced Mix.Config with Config, and stopped generating a config folder for new projects (because libraries rarely need their own configuration). Those libraries still want to read configuration from the user though, so eg the phoenix generator will keep creating a config folder.

For compile time configuration, Config is still your best bet.

dimitarvp · July 3, 2019, 12:55pm

No, I don’t want compile-time configuration only – sorry, I was unclear on that point. I’d like the users of the library to be able to modify, say, sqlite command timeouts dynamically, at runtime.

I could use Config of course but IMO it seems clearer to me to just have hardcoded defaults that the user can override at the init step. I am open to suggestions, haven’t written an Elixir library before and can use all feedback and advice.

dimitarvp · July 3, 2019, 12:56pm

I mean this: Library guidelines — Elixir v1.16.0 (from the Elixir 1.9 announcement thread).

jola · July 3, 2019, 12:58pm

Ah, cool, I hadn’t seen that, thanks!

Then the link I posted should be relevant, since it is pretty much what the guidelines refer to here

In case you need to configure a process, the options should be passed when starting that process.

Having an init step like phoenix/ecto have also makes sense.

dimitarvp · July 3, 2019, 1:28pm

So given the 3 options above, what would you go for?

jola · July 3, 2019, 1:33pm

2 or 3 depending on what type of configuration we’re talking about. It’s pretty awkward to carry all options around to pass to every command, but some options only make sense when calling commands. For example, Redix has a timeout when calling redis commands, that option is not global, you pass it to Redix.command. But connection timeout is not per command, so it is passed when you start the process (also not strictly global, to be fair, it’s just per process).
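
The split between per-process and per-call options could look roughly like the sketch below. It does not use Redix itself; `OptionScopes`, its functions, and the default values are hypothetical names chosen only to show where each kind of option is accepted.

```elixir
defmodule OptionScopes do
  # Assumed defaults for options that can differ on every call.
  @call_defaults [timeout: 5_000]

  # Options fixed for the lifetime of the connection are taken once, here.
  def connect(opts \\ []) do
    %{connect_timeout: Keyword.get(opts, :connect_timeout, 10_000)}
  end

  # Per-command options are merged over the defaults at call time.
  def command(conn, statement, opts \\ []) do
    opts = Keyword.merge(@call_defaults, opts)

    {:ok,
     %{
       statement: statement,
       timeout: opts[:timeout],
       connect_timeout: conn.connect_timeout
     }}
  end
end

conn = OptionScopes.connect(connect_timeout: 2_000)
{:ok, fast} = OptionScopes.command(conn, "SELECT 1", timeout: 100)
{:ok, default} = OptionScopes.command(conn, "SELECT 1")
```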

Realistically you might have some of both.

LostKobrakai · July 3, 2019, 1:59pm

I don’t think your options 1/2 really hit the nail for what the guidelines and removal of /config are meant to push people to.

If you have a library and no (configurable) processes are being started => accept config as parameters to functions. This could be supplying fresh config to each call, but also a config = MyApp.Config.validate(input) token to be passed around.

If you have processes being started, which need user configuration, let the user start those processes in their application’s supervision tree and accept config as start arguments. Let your library functions receive a pid/registered name as argument to know which processes to call into (like e.g. GenServer.call). This isn’t limited to a single process either. What the user interacts with could just be a top level supervisor with many children.

If you have done the above things you can look into making configuration more convenient and less manual by additionally harnessing e.g. the application environment, process dictionary or having compile time config provided via macros (like e.g. Ecto.Repo does it). You could compile simpler API functions into some modules (MyModule.func(…) instead of Lib.func(MyModule, …)). This allows users to use those “limiting” configuration options if it fits their usage, but they can still go the more manual way and configure stuff via parameters/start arguments for more complex use cases.
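
A validated config token, as in the first point above, might be sketched like this. The module name `ConfigToken`, its fields, and the error shape are assumptions for illustration, not an existing API.

```elixir
defmodule ConfigToken do
  # Hypothetical options with assumed defaults.
  defstruct exec_timeout: 5_000, batch_size: 100

  # Validates user input once and returns a token to pass to later calls.
  def validate(input) when is_list(input) do
    known = Map.keys(%__MODULE__{}) -- [:__struct__]

    case Keyword.keys(input) -- known do
      [] -> {:ok, struct(__MODULE__, input)}
      unknown -> {:error, {:unknown_options, unknown}}
    end
  end

  # Library functions then take the token instead of raw options.
  def describe(%__MODULE__{} = config, statement) do
    "#{statement} (timeout #{config.exec_timeout} ms, batch #{config.batch_size})"
  end
end

{:ok, config} = ConfigToken.validate(batch_size: 10)
summary = ConfigToken.describe(config, "SELECT 1")
```

Validation happens once, so later calls can trust the token and skip re-checking the options.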

dimitarvp · July 3, 2019, 2:42pm

Yep, that’s my option 3. The more I think of it the more it seems that I have to start from there because technically new processes or any OTP sauce isn’t necessary for using the library. In the end it’s just a NIF handle (bound to a Reference) that you will pass around.

Agreed, but that doesn’t cover the runtime configuration. So I plan to have something like Xqlite.Server.put_exec_timeout(pid, millis) that does a GenServer cast/call to a wrapper which keeps all configuration in its state.
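
A rough sketch of such a wrapper is below. It is named `ServerSketch` to make clear it is not the actual library code; the defaults, the `config/1` helper, and `make_ref/0` standing in for the NIF handle are all assumptions.

```elixir
defmodule ServerSketch do
  use GenServer

  # Assumed defaults; initial overrides arrive as start arguments.
  @defaults %{exec_timeout: 5_000, batch_size: 100}

  def start_link(opts) do
    {name, config} = Keyword.pop(opts, :name)
    server_opts = if name, do: [name: name], else: []
    GenServer.start_link(__MODULE__, config, server_opts)
  end

  # Runtime-updatable configuration: goes through the owning process.
  def put_exec_timeout(server, millis) when is_integer(millis) and millis > 0 do
    GenServer.call(server, {:put, :exec_timeout, millis})
  end

  def config(server), do: GenServer.call(server, :config)

  @impl true
  def init(config) do
    # make_ref/0 stands in for opening the database and getting a NIF handle.
    {:ok, %{handle: make_ref(), config: Map.merge(@defaults, Map.new(config))}}
  end

  @impl true
  def handle_call({:put, key, value}, _from, state) do
    {:reply, :ok, put_in(state.config[key], value)}
  end

  def handle_call(:config, _from, state), do: {:reply, state.config, state}
end

{:ok, pid} = ServerSketch.start_link(exec_timeout: 1_000)
:ok = ServerSketch.put_exec_timeout(pid, 250)
```

Because `use GenServer` defines `child_spec/1`, a user could also list `{ServerSketch, name: :db_a, exec_timeout: 1_000}` among the children of their own supervisor.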

And since the initial configuration can be supplied via OTP means, I think Elixir’s configuration stack is superfluous in this case, since both boot-time and runtime configuration are now covered.

If we worked together and I made that argument, would you agree with it?

This is what I am leaning towards as the final step of the configuration task for the library: (a) have a data structure capturing everything necessary to work with a single handle in the library – including handle-specific configuration like statement timeouts or batch sizes for records – and (b) have a GenServer wrapper for people who want to use the library with OTP, and then proceed to (c) make convenience helpers to save some keystrokes or unite calls in one function instead of two or more.

LostKobrakai · July 3, 2019, 3:32pm

Arguments passed to a process at startup are already runtime config. It’s done when starting the process and not at compile time. put_exec_timeout is rather “runtime updatable config”, which is imho another step up. Unless you have critical state to keep around I’d try to stay with a simple restart when changes need to be applied.

I did leave this out in my above text, but I’m not really a fan of using a process that only stores config. If I want to keep config around I’d much rather put it in the application env and have it optimized for read concurrency than put it in a process and have access to it serialized. My advice for configuring processes on startup is much rather meant to apply if you need processes to do the stuff your library provides.
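
For comparison, keeping config in the application environment needs no process of your own. In this sketch `:my_sqlite_lib` is a hypothetical application name and the keys and defaults are made up:

```elixir
# Written once, e.g. at boot or when the user changes a setting.
Application.put_env(:my_sqlite_lib, :exec_timeout, 2_000)

# Read from any process; the third argument is the fallback default.
timeout = Application.get_env(:my_sqlite_lib, :exec_timeout, 5_000)
batch_size = Application.get_env(:my_sqlite_lib, :batch_size, 100)
```

The trade-off is that this is one global value per key, not one per connection.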

tristan · July 3, 2019, 3:49pm

For runtime configuration of your library you simply document the configuration options that the user will put in their releases.exs.

I can’t say I’ve seen this much in Elixir so I may be off base here but in Erlang we put defaults in the .app/.app.src file under the env key. You can do the same with the application configuration in your mix.exs.
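
In Mix terms, that could look like the following `mix.exs` sketch. The application name, module name, version, and the two env keys are placeholders; the relevant part is the `:env` key returned from `application/0`.

```elixir
# mix.exs of a hypothetical library
defmodule MySqliteLib.MixProject do
  use Mix.Project

  def project do
    [app: :my_sqlite_lib, version: "0.1.0", elixir: "~> 1.9"]
  end

  def application do
    [
      extra_applications: [:logger],
      # Defaults for the application environment, like `env` in .app.src.
      env: [exec_timeout: 5_000, batch_size: 100]
    ]
  end
end
```

A user would then override individual keys in their own config, for example with `config :my_sqlite_lib, exec_timeout: 10_000`.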

But this is all assuming a use case where it makes sense to have configuration read in from the environment. It depends on the application whether that makes sense or whether it is better to have the user pass in a set of configuration values when calling a function in your application.

Also your (2) and (3) aren’t mutually exclusive. If you have an application that must start servers you likely want to give the user a way to start them in their own supervision tree and have them pass in configuration to the start_link of whatever they are starting from your application.

(1) should never be used :). Not that the pdict should never be used, just almost never, but for configuration I think it is safe to say just “never”.

wojtekmach · July 3, 2019, 5:01pm

Definitely pass the data around. If you use global configuration (e.g. application env) it means you have less flexibility, you only have one configuration and so you can’t configure this connection with X and that connection with Y.

As others have mentioned you likely have some start_link/init/connect/new/whatever function, so accept configuration there and pass it around to further calls. And if passing options is annoying to the users of the library they can (and often should) wrap the library with their own module.
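
Such a wrapper can be very small. In this sketch `LibSketch` is a stand-in for a library that takes options on every call, and `MyApp.ReportsDb` is a hypothetical module in the user's application; neither is a real API.

```elixir
# Stand-in for a library whose functions take explicit options.
defmodule LibSketch do
  def query(path, sql, opts) do
    {:ok, {path, sql, Keyword.fetch!(opts, :timeout)}}
  end
end

# The application's own wrapper pins the path and options in one place.
defmodule MyApp.ReportsDb do
  @path "reports.db"
  @opts [timeout: 2_000]

  def query(sql, overrides \\ []) do
    LibSketch.query(@path, sql, Keyword.merge(@opts, overrides))
  end
end

{:ok, pinned} = MyApp.ReportsDb.query("SELECT 1")
{:ok, overridden} = MyApp.ReportsDb.query("SELECT 1", timeout: 50)
```

A second database with different settings is then just a second wrapper module.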

dimitarvp · July 3, 2019, 5:19pm

Well, since this handle is a Reference to a NIF resource, I’m wary of restarting. I prefer having a runtime-updatable configuration, as you called it, that just takes effect for all work done under the NIF handle after the config is modified. As per my two above examples: the execution timeout or the batch size are quite okay options to modify as you go. They are not some mythical gospels so big and powerful that the GenServer must absolutely be restarted to cope with their change. If I am missing something here, do let me know though!

The library definitely does not need GenServers per se. It’s just that I view them as a very sensible default way to organize your work when e.g. your application uses 15 sqlite databases. Each GenServer will hold (a) configuration and (b) the NIF resource Reference. In other words, the entire vanilla data structure that you otherwise have to pass around.

If we were working together on such a PR, would you press your point that optional OTP primitives in these cases shouldn’t be a part of a library? I am curious. I view automatic organisation of sqlite database handles via GenServers as a pretty handy mechanism, it’s really minimal and is strictly opt-in. (I agree that just carrying a structure around is the best default vanilla approach and that’s what I am going to do first. The OTP stuff is going to be a small, hopefully useful, extra.)

dimitarvp · July 3, 2019, 5:25pm

Interesting, haven’t looked at how Erlang does it, thanks! Learned something.

As for the application configuration, that’s what I am trying to establish here – can I do away with it altogether? I view all options related to an opened sqlite database as specific for the instance, not global. And they’ll have sensible defaults of course.

I am quite inexperienced in that area. Why is process dictionary almost never applicable in your eyes? As for configuration, yeah, I got to the same conclusion – looks like surprise mechanics nobody wants: you are calling a library function and boom, you now have state in the process dictionary. Definitely wouldn’t like to have that done to me.

Yep, I am going to do just that – make my own GenServer that is able to get initialised with a database and its options plus is pluggable under any Supervisor.

dimitarvp · July 3, 2019, 5:27pm

Absolutely. After reading the rationale linked in the Elixir 1.9 announcement, I found myself nodding at every phrase. And you summarised it quite well, too.

Well, a standard pluggable GenServer is in my eyes good enough for convenience in organising several databases under a Supervisor, be it static or dynamic. Indeed, if that’s not good enough for somebody, they have all the tools in their hands to roll their own solution.

LostKobrakai · July 3, 2019, 5:43pm

I’d say this can be reason enough to not want to restart a process.

LostKobrakai · July 3, 2019, 5:46pm

I’d compare it to ETS. There you also configure it once and get a reference back, which you then pass around to use. But also ETS is never reconfigured and doesn’t need configuration once initialized. I’d certainly try to explore the non-process version.
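
The ETS pattern being referred to, as a short illustration (the table name and contents are arbitrary):

```elixir
# Options are given once at creation; without :named_table the caller
# gets back a reference and passes it to every later call.
table = :ets.new(:demo, [:set, :public])

true = :ets.insert(table, {:answer, 42})
[{:answer, value}] = :ets.lookup(table, :answer)
```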

wojtekmach · July 3, 2019, 5:57pm

Totally agree, if you don’t need a process, don’t create one! As you all may know, Mint is following a process-less architecture and that’s definitely a great example to look at (and blog posts, talks etc associated with it). Shameless plug: In MyXQL even though the public API is so to speak stateful (by way of using db_connection), under the hood there’s a separate module that the db_connection callback module calls into - and that one is process-less and thus much easier to understand, test, debug, and generally more usable in different contexts.

dimitarvp · July 3, 2019, 6:02pm

So you guys wouldn’t include any OTP wiring in a library and would point the users at :ets or Registry if they want to manage many of those native handles?

tristan · July 3, 2019, 6:07pm

Yea, you likely can. I tend to provide both options. You are working with sqlite in this case so maybe similar to my pgo library (erleans/pgo on GitHub: Erlang Postgres client and connection pool) – configuration is specific to a pool, so they can be separated in the config file that way, allowing the option of starting them on boot or passing a config to pgo_pool:start_link in a supervisor.

I do the same with the grpcbox server and client (tsloughter/grpcbox on GitHub: Erlang grpc on chatterbox).

But I have yet to decide if this is good design in Erlang or Elixir, so I am interested in reading this thread. It feels like I shouldn’t provide different ways to start services and should force the user to always start them in their own supervisor…

As for the pdict, one reason to not use it is that it is confusing and harder to read code when state is hidden there. It is much easier to understand what is going on (and to test the code) when state is explicitly passed in and returned from functions.
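
A small sketch of that difference, with hypothetical module names, key, and defaults: the same call gives different results depending on which process runs it, while the explicit version depends only on its arguments.

```elixir
defmodule HiddenState do
  # Configuration lives in the calling process's dictionary.
  def configure(timeout), do: Process.put(:exec_timeout, timeout)
  def run(sql), do: {sql, Process.get(:exec_timeout, 5_000)}
end

defmodule ExplicitState do
  # Configuration is an argument, visible at every call site.
  def run(sql, timeout \\ 5_000), do: {sql, timeout}
end

HiddenState.configure(100)
here = HiddenState.run("SELECT 1")

# A task is a different process with its own dictionary,
# so the setting above silently does not apply there.
elsewhere = Task.await(Task.async(fn -> HiddenState.run("SELECT 1") end))

explicit = ExplicitState.run("SELECT 1", 100)
```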