General discussion/questions about choosing between an internal and an external message stream

heathen · July 1, 2017, 3:01pm

Hi!

I’m working on a small project of my own that involves a number of IoT-like devices. These devices have limited autonomy and no direct user control, so they receive control commands from the back end services and send back results and logs/events.

I want to keep the coupling between modules as loose as possible from the very start. I would also like to go with a kind of CQRS/ES: the project is digital and I don’t expect any need for batch data processing, as all data will arrive in [near] real time.

So let’s focus on the ES part for now, where ES stands for “event streaming” and “event storage” at the same time.

At the beginning of the project, the back end will consist only of Elixir apps/parts/modules. I’m thinking of Channels as the main way to interact with any kind of external client (devices as well as users’ control panel clients). In the future, though, it may be necessary to include modules (services) written in other languages (for example, for heavy computation, media conversion tasks and so on) and to let external systems, such as big data analysis applications, consume real-time events.

The first thing that comes to mind for internal interaction between the different back end services/modules is to use something like Phoenix PubSub. In that case I would have to implement at least a robust, fault-tolerant and easily extensible “storage” part myself, as well as interfaces to external modules/systems when they arrive.

I’m not a big fan of over-complexity, so I don’t like to use parts that can be avoided. On the other hand, I’m a really lazy guy and don’t think it’s a good idea to re-implement things that already exist. So another thought is to use Apache Kafka as the event streaming and storage platform, and to use a Kafka interface in every independent back end module/application (as producer and subscriber/consumer). This approach would give me fault-tolerant storage for the event streams as well as a central, fast and robust event pipeline.
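To make the comparison a bit more concrete, here is a rough sketch of the contract I would want from either option: publishing stores the event in an ordered log and fans it out to live subscribers, and a late subscriber can replay from a given offset. It is written in Python only to keep it language-neutral. All names (`EventLog`, `publish`, `subscribe`, `replay`, the `device.events` topic) are hypothetical and are not the API of Phoenix PubSub or Kafka, and an in-memory version like this obviously has none of the durability or fault tolerance I’m asking about.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterator, List, Tuple

# A handler receives the event's offset in its topic and the event itself.
Handler = Callable[[int, Any], None]


class EventLog:
    """In-memory stand-in for the "streaming + storage" contract (hypothetical)."""

    def __init__(self) -> None:
        self._log: Dict[str, List[Any]] = defaultdict(list)       # topic -> ordered events
        self._subs: Dict[str, List[Handler]] = defaultdict(list)  # topic -> live handlers

    def publish(self, topic: str, event: Any) -> int:
        """Store the event first, then fan it out; return the event's offset."""
        events = self._log[topic]
        events.append(event)
        offset = len(events) - 1
        for handler in list(self._subs[topic]):
            handler(offset, event)
        return offset

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler for events published from now on."""
        self._subs[topic].append(handler)

    def replay(self, topic: str, from_offset: int = 0) -> Iterator[Tuple[int, Any]]:
        """Yield stored events starting at from_offset, in publish order."""
        events = self._log[topic]
        for offset in range(from_offset, len(events)):
            yield offset, events[offset]


# Usage: one live subscriber, plus a late reader that catches up via replay.
log = EventLog()
seen = []
log.subscribe("device.events", lambda offset, event: seen.append((offset, event)))
log.publish("device.events", {"device": "d1", "status": "ok"})
log.publish("device.events", {"device": "d1", "status": "done"})
late = list(log.replay("device.events", from_offset=1))
```

With Phoenix PubSub I would get the fan-out part and have to build the log and replay parts myself; with Kafka, as far as I understand, both parts come with the platform.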

Of course, it’s possible to start with one option and evolve to the other, but as I said, I’m lazy, and apart from having to re-implement the same thing later, I expect migration problems.

Of course, I’m not expecting direct answers like “you must do this or that”, especially when they come without any explanation. I would like to discuss the options: I’m pretty sure I’m overlooking many important things and would be grateful if you pointed them out.

Thanks!