Preparing for Production

Regarding the included_applications bit:

Using it means you are responsible for managing the full lifecycle of the included application. Additionally, only one application in the tree can include a given application (i.e. apps a and b can’t both have c in their included_applications). This should be self-evident (how could two applications both manage the startup of some other application?), but it definitely can cause confusion. I haven’t tested what happens if a depends on b and has c in its included_applications, while b also depends on c but has it in its applications list. Presumably nothing good happens there, and you’d need to take over both b and c via included_applications, but as I mentioned, I don’t know for sure.
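For reference, a minimal sketch of what including an application looks like in mix.exs. The app name :my_app, the module names, and the :kafka_ex version requirement are placeholders, not taken from any real project:

```elixir
# mix.exs of the top-level application (all names here are hypothetical)
defmodule MyApp.Mixfile do
  use Mix.Project

  def project do
    [app: :my_app, version: "0.1.0", deps: deps()]
  end

  def application do
    [
      mod: {MyApp.Application, []},
      # :kafka_ex gets loaded but is NOT started automatically.
      # :my_app now owns its lifecycle and has to start it itself,
      # and no other app in the system may include :kafka_ex.
      included_applications: [:kafka_ex]
    ]
  end

  defp deps do
    # The version requirement is an assumption for illustration only.
    [{:kafka_ex, "~> 0.6"}]
  end
end
```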

More often than not, all you really want is to set the start type of a given application to :load, start it after the configuration phase, and that’s that. You can do this pretty easily with Distillery, but AFAIK it’s not currently possible via mix.exs (or perhaps that’s what extra_applications is for in 1.4+, I’d need to double check). Again though, if you have a dependency which depends on that app being started, you would likely need to set its start type to :load as well, but I haven’t tested this (I haven’t actually encountered this problem yet).
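A rough sketch of the :load approach with Distillery, assuming a release named :my_app and a made-up broker address. Treat it as untested, and check the Distillery docs for the exact form of the applications setting in your version:

```elixir
# rel/config.exs (Distillery); the release name :my_app is hypothetical
use Mix.Releases.Config,
  default_release: :default,
  default_environment: :prod

release :my_app do
  set version: current_version(:my_app)
  # Include :kafka_ex in the release, but only load it at boot
  # instead of starting it.
  set applications: [kafka_ex: :load]
end

# Later, somewhere in :my_app's own startup code, once the runtime
# configuration has been fetched (the broker address is a placeholder):
#
#   Application.put_env(:kafka_ex, :brokers, [{"kafka.internal", 9092}])
#   {:ok, _started} = Application.ensure_all_started(:kafka_ex)
```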

In any case, it’s not that there is something wrong with using included_applications, it’s just that the other approach is better and less error-prone.

Included applications are fine, but they should only be used if you are building a “release”. Never use them in a library or in any code that is meant to be shared, as an included application can only be included once.

I don’t think it is bad practice at all to start the connection in a different application. In fact this is the most common approach to separating concerns. Usually you want different applications. That is what they are there for. It also depends on your failure modes. Some libraries provide both options: either run as a standalone application, or provide a gen_server to be included in your supervision tree. The latter is most common for small applications, but at least in the Erlang world running as a separate application is probably more common.

Ah, that’s a nice idea, but I can’t seem to find in docs how exactly to set the app start type?

It depends on the app’s purpose. If I’m splitting a larger system into multiple top-level OTP apps, then sure. But here, we’re discussing a client library. I don’t think that such a library should try to be smart and host my connection processes. It’s my system, so I want to decide when to open the connection, and where to host it in my supervision tree.

But the biggest sin of KafkaEx is IMO the fact that it seems to open this connection during its startup (or at least it requires connection parameters during its startup), and to do that, it requires a global config (app env). That means we can only connect to one server, which seems quite limiting. Even if the library offers an opt-out through some app env, that approach is still clumsy and global. I can still see potential for conflicts, for example if one client app wants to use the default startup connection while another one wants to disable it.

Also, that approach complicates runtime configuration, as we’ve seen in this thread. We’re now discussing included applications or setting start type to :load simply because KafkaEx is overly eager about fetching its options and starting as soon as possible.

There’s a lot of needless clumsiness here IMO. Just give me a start_link function which takes its options as a combo of required and optional arguments. I can use standard OTP mechanisms to organize my code and runtime in arbitrary ways:

  • I can start the connection when/where I want to.
  • I can easily fetch secrets from wherever.
  • I can still use app env to vary dev/test/prod parameters.
  • I can easily separate concerns myself through umbrella apps.

There’s a lot of flexibility here, and the decisions are left to me, the end-user of the library.

Another nice benefit is that such an interface can clearly specify which parameters are mandatory (e.g. host address), and which are optional (e.g. port). And with typespecs it can also specify the type requirements for each parameter.
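A minimal sketch of such an interface. MyKafka.Connection is a made-up module, not part of KafkaEx or any other library, and it doesn’t open a real connection; it only shows a mandatory positional argument, optional keyword arguments, and typespecs:

```elixir
defmodule MyKafka.Connection do
  @moduledoc "Hypothetical connection process; it does not talk to a real broker."
  use GenServer

  @type option :: {:port, :inet.port_number()} | {:name, GenServer.name()}

  @default_port 9092

  @doc "Starts a connection. `host` is mandatory, everything in `opts` is optional."
  @spec start_link(String.t(), [option]) :: GenServer.on_start()
  def start_link(host, opts \\ []) when is_binary(host) and is_list(opts) do
    # :name is a process option, the rest are connection parameters.
    {name_opts, conn_opts} = Keyword.split(opts, [:name])
    port = Keyword.get(conn_opts, :port, @default_port)
    GenServer.start_link(__MODULE__, %{host: host, port: port}, name_opts)
  end

  @doc "Returns the parameters this connection was started with."
  @spec info(GenServer.server()) :: %{host: String.t(), port: :inet.port_number()}
  def info(server), do: GenServer.call(server, :info)

  @impl true
  def init(params) do
    # A real client would open its socket here (or in handle_continue/2).
    {:ok, params}
  end

  @impl true
  def handle_call(:info, _from, params), do: {:reply, params, params}
end
```

With this shape the caller decides where the values come from: MyKafka.Connection.start_link("kafka.internal", port: 9093) works the same whether the host was read from app env, a file, or anything else.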

You have good points, and it is usually about trade-offs. If you have a small application with few dependencies, then it is easy to include other people’s code in your application’s supervision tree. Once your application grows, you don’t really want to pollute your own supervision tree with applications that should run stand-alone. You will end up re-inventing the Erlang VM’s boot and configuration system, and it will get harder and harder to reason about how things are supposed to fail.

I think some of these problems (like your configuration example) stem from mix overlooking Erlang releases during its design, which pushes people to go against best practice instead of using what was already there. The consequence is that it is harder to do things that have been considered best practice for a long time.

I see the Erlang VM more like an operating system. Just like on Linux, you configure and start applications separately. You don’t include HAProxy or nginx in your application and configure it from there. You use their configuration files (sys.config) to configure them, and then let your init system start them. You don’t include MySQL in your app. You configure it and connect to it from your application.

“There’s a lot of flexibility here, and the decisions are left to me, the end-user of the library.”

Yes, but it is not a library. It is an application. Perhaps for something as simple as a client holding a connection to a remote server, you might argue that it shouldn’t be an application but rather a library, and that it should be distributed as a library rather than an OTP application, i.e. it doesn’t come with its own supervision tree.

“That means we can only connect to one server, which seems quite limiting.”

Yes, that is limiting. Lots of applications cater for this though, so this is more a shortcoming of KafkaEx than of it running as a separate app.

In the end both ways are valid. Sometimes you want to connect to a standalone database. Sometimes you want to use sqlite3 as a library and include it in your application. I prefer to have applications run stand-alone because they know how to deal with failures. I depend on that application, and my application gets to decide what to do if the applications it depends on fail.

I have a similar view myself. But I see the top-level OTP app as the one which defines the whole system.

For example, on the OS level, I might start multiple instances of the database. I’d do that by running the command for each instance and providing some arguments (and possibly a distinct configuration file). With systemd I might have separate unit files and describe the relationships between distinct server instances and other services in the system. So I don’t include services in my app, sure. But I do start these services from my system in the way, order, and multiplicity I want. This allows me to e.g. stop some part of my system completely (supporting services and that particular db instance) without disturbing another part of that system.

The equivalent of this in the OTP world is, for me, the supervision tree. The top-level supervisor of my app is the top-level supervisor of my system. If I have some “subservices” foo and bar which require the kafka connection to baz, then I’d like to host them under the same supervision subtree. Stopping the top-level supervisor of that subtree stops a well-defined part of my system, including the associated kafka connection.
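Roughly, such a subtree could be sketched like this. All module names are made up, and plain Agents stand in for the real connection and subservice processes:

```elixir
defmodule MySystem.BazConnection do
  # Stand-in for the kafka connection to baz (hypothetical, no real I/O).
  use Agent
  def start_link(opts), do: Agent.start_link(fn -> Map.new(opts) end, name: __MODULE__)
end

defmodule MySystem.Foo do
  # Stand-in for the "foo" subservice, which needs the connection.
  use Agent
  def start_link(_opts), do: Agent.start_link(fn -> :foo end, name: __MODULE__)
end

defmodule MySystem.Bar do
  # Stand-in for the "bar" subservice, which needs the connection.
  use Agent
  def start_link(_opts), do: Agent.start_link(fn -> :bar end, name: __MODULE__)
end

defmodule MySystem.BazSubtree do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    children = [
      {MySystem.BazConnection, opts},
      {MySystem.Foo, []},
      {MySystem.Bar, []}
    ]

    # :rest_for_one - if the connection crashes, foo and bar (started
    # after it) are restarted as well.
    Supervisor.init(children, strategy: :rest_for_one)
  end
end
```

Stopping MySystem.BazSubtree (for example with Supervisor.stop/1) takes down foo, bar, and the connection together, so nothing lingers.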

In contrast, if the kafka connection sits under the kafka app, the situation is weird. Some parts of my subservice sit in one place (my supervision tree), others in another (the kafka supervision tree). If I want to stop the whole subservice, I need to manually terminate both parts. That’s error-prone, and it’s not going to work properly with let-it-crash. If my subservice stops, say because of too many crashes, the connection lingers on. If I stop my entire OTP app, the connection will still linger.

Another example is based on cowboy. The way we use it with e.g. Plug and Phoenix is that we insert the cowboy subtree into our own supervision tree. That means I have nice control over starting and stopping different servers separately. I could host the main server and the admin server in my app, they could sit in separate parts of the supervision tree, and I’d have fine-grained control over their lifecycles. I like that control.

Not sure what you mean by this?

In any case, sys.config is AFAIK not runtime friendly. The original problem is: we need to fetch some data (e.g. secrets) from various files on the disk (not sys.config). In the cases I had myself, I needed to fetch this from etcd as well as from a file on disk. Fetching these secrets is IMO a part of the system startup procedure, so I’d like to implement this in the code of my system. In order for this to work, supporting services (such as a kafka client) shouldn’t decide to eagerly connect on their own using parameters from some hardcoded place. Instead, they should provide me a way to start them and pass parameters as arguments, just like e.g. various database servers (or other OS services) do.
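A minimal sketch of that startup procedure, under the assumption that the client accepts its parameters as arguments. The module names and the secret file are hypothetical, and an Agent stands in for the client process; fetching from etcd would slot into the same place but isn’t shown:

```elixir
defmodule MySystem.Secrets do
  @doc "Reads a secret from a file on disk and strips the trailing newline."
  @spec fetch!(Path.t()) :: String.t()
  def fetch!(path), do: path |> File.read!() |> String.trim()
end

defmodule MySystem.KafkaConn do
  # Stand-in for a client process that takes its parameters as arguments.
  use Agent
  def start_link(opts), do: Agent.start_link(fn -> Map.new(opts) end, name: __MODULE__)
  def params, do: Agent.get(__MODULE__, & &1)
end

defmodule MySystem.Supervisor do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts, name: __MODULE__)

  @impl true
  def init(opts) do
    # This runs while the system is starting, so the secret is read at
    # runtime rather than being baked into sys.config at build time.
    password = MySystem.Secrets.fetch!(Keyword.fetch!(opts, :secret_path))
    host = Keyword.get(opts, :host, "localhost")

    children = [{MySystem.KafkaConn, [host: host, password: password]}]
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```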

I didn’t mean in the strict OTP sense, but in a more conceptual sense. It’s a library which we use to talk to kafka. That’s the equivalent of e.g. an HTTP client or a database driver. These are certainly not standalone apps IMO. Whether or not these things need some singleton process is less relevant to me. What does matter to me is that I want to host a process (or subtree) for each connection in my own subtree. This will allow me to properly release resources (e.g. open connections) when some parts of my system terminate.

Hi, I’m one of the maintainers of KafkaEx. There’s a lot of good feedback here. I agree that the way worker connections are supervised by default could be better - it’s a piece of the design I inherited with the project. If I understand correctly, that decision was originally made to make it “easy” to start an application and connect to Kafka. I use KafkaEx in production and have had to deal with these complications myself as well.

Unfortunately, we are now in a situation where there are lots of ideas how to make the API better but there are people already using it, so we need to be careful not to break it for them. I’ve considered pushing for an API re-design with a major version change (we are still < 1.0), but we also have several key Kafka features that still need to be implemented that are more urgently needed. We welcome any help we can get!

We do provide the flexibility to disable the default worker and to start workers under your own supervision tree. As was mentioned, however, this is a bit cumbersome and confusion-prone. Anyone who has trouble with KafkaEx is welcome to hop on to the Elixir-lang slack and ask in the #kafkaex channel - we try very hard to be available.

Hey, thanks for chiming in!

I hope I wasn’t too critical of KafkaEx, but if I was, I’m sorry - it wasn’t my intention. While I don’t use it myself, I’m sure it’s a great and useful library. It just had the bad luck of appearing as an example of what I personally consider a limiting design decision (eager connection during OTP app startup + depending on app env). Not sure if it’s any consolation, but it’s an approach I’ve seen fairly frequently, and not only in Elixir libs. So I guess what I’m trying to say is: while I don’t think that such an approach is good (at least not in most cases), I’m not criticising the entire library, just one small part of it.

That being said, the approach currently advertised in the project’s readme doesn’t work with specific runtime needs, such as fetching secrets from arbitrary sources (which is IMO the core topic we’re discussing in this thread). I think that a small thing that could help a lot here is to explain in the readme how to disable the default worker. A section titled “custom runtime configuration”, or similar, with a sketch of the solution, could help a lot.

Going further, there are possibly ways of supporting and promoting a cleaner approach to connecting without breaking the existing clients. For example, changing the startup code to avoid doing anything eagerly if a server connection is not configured, and changing the readme to promote that kind of practice. I’m not familiar enough with the code to say how much effort this would take (or whether it even makes sense), and I totally respect that you have other priorities in terms of features. A readme explanation of :disable_default_worker would be nice though :)
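For what it’s worth, here’s roughly how I understand the opt-out to look. The worker name and broker address are placeholders, and the create_worker call is from my recollection of the KafkaEx docs, so double check it against the version you’re using:

```elixir
# config/config.exs
use Mix.Config

config :kafka_ex,
  # Don't start the default worker (and its connection) when :kafka_ex boots.
  disable_default_worker: true

# Later, from your own code, once the runtime parameters are known
# (worker name and broker address below are made up):
#
#   {:ok, _pid} = KafkaEx.create_worker(:my_worker, uris: [{"kafka.internal", 9092}])
```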

Thank you for your work on this library, and for contributing to the Elixir ecosystem!

What I have here, https://github.com/GT8Online/weave, is now working for me in my production system.

It’s still very early, but thanks for all your help.