Hi all, I’m new to Elixir, not yet super experienced in distributed systems in general, and only just recently started learning. I’m pretty interested and really like what I’ve seen of Elixir so far!
This might be a stupid question, and apologies in advance if it is the wrong place to ask: I absolutely love the premise of Elixir (reducing a lot of the complexity most of our industry just takes for granted) and have read people online say it basically renders things such as Kafka unnecessary. I don’t have any experience with Kafka itself, but I have previously worked with Akka, and read enough to get the feeling that in that community, distributed actors and Kafka are seen more as apples and oranges. So I guess I’m wondering how it is different with Elixir. I know Akka is heavily inspired by Erlang, but I don’t know Elixir and Erlang well enough to fully know where the similarities end and the differences start.
The very basic stuff I’ve seen so far: it seems Elixir/Erlang lets processes on different nodes communicate seamlessly, more or less out of the box, while with Akka it takes more configuring (at least IME, back when I worked with it in 2019, it took some config know-how and reading the docs to get that working). GC also works differently: on the BEAM it takes place per process, while with Akka, because it runs on top of the JVM, it happens more globally at the level of the entire node’s JVM (or at least it used to be this way, not sure about newer and/or specialty versions of the JVM).
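For example, this is the kind of thing I mean by “out of the box” (a rough sketch; the node names and the `:printer` name are made up, and both nodes would need to be started with the same cookie, e.g. `iex --name a@127.0.0.1 --cookie secret`):

```elixir
# On node b@127.0.0.1: spawn a process and register it under a name.
pid =
  spawn(fn ->
    receive do
      msg -> IO.inspect(msg, label: "received on #{Node.self()}")
    end
  end)

Process.register(pid, :printer)

# On node a@127.0.0.1: connect to the other node and send a plain message
# to the registered process over the wire.
Node.connect(:"b@127.0.0.1")
send({:printer, :"b@127.0.0.1"}, {:hello, Node.self()})
```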
I could be mistaken, but I get the feeling one reason you’d probably still want to use Kafka even with Akka is that, besides having actors communicate across nodes in a cluster and fault tolerance, you may also want stronger guarantees that your message is going to get delivered, by taking advantage of Kafka being a persistent log (I can see this being useful for a financial transaction, where asking the user to try again could be stressful because they’d have to wonder if they’ll get charged twice or something).
So in a nutshell: how would one eliminate the need for this external piece of infrastructure by using Elixir?
Elixir and Erlang don’t eliminate the need for Kafka; what happens is that some use cases that call for Kafka in other stacks can be handled without Kafka here.
Use cases related to pub/sub and data ingestion can easily be handled without Kafka, just by using Phoenix PubSub or Broadway. https://hexdocs.pm/phoenix_pubsub https://hexdocs.pm/broadway
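For example, a minimal Phoenix PubSub sketch (the `MyApp.PubSub` name and the "orders" topic are just placeholders):

```elixir
# The pubsub process goes in your supervision tree:
children = [
  {Phoenix.PubSub, name: MyApp.PubSub}
]

# Any process can subscribe to a topic...
Phoenix.PubSub.subscribe(MyApp.PubSub, "orders")

# ...and any process, on any connected node, can broadcast to it.
Phoenix.PubSub.broadcast(MyApp.PubSub, "orders", {:order_placed, %{id: 123}})

# Subscribers receive the broadcast as a message in their mailbox:
receive do
  {:order_placed, order} -> IO.inspect(order, label: "new order")
end
```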
Anything that relies heavily on Kafka’s persistent storage isn’t the same thing, though. There are also a lot of connectors for Kafka that make some solutions possible out of the box which you would need to implement manually in an Elixir/Erlang system.
I’d say that you can avoid the complexities of running Kafka and use the tools available in Elixir/Erlang to architect your solutions differently. It’s a matter of trade-offs, and Kafka is a big compromise given the required infrastructure and the need for specialized knowledge around it. It’s not about Kafka, but for a good example of approaching problems differently to simplify your architecture, you can check out this talk
Hmm, I did read a bit about ETS, but it was more in the context of providing an alternative to memcached or Redis for the most common use case, which is caching.
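From what I understood of the docs, a basic cache would look something like this (the table name and key are just examples I made up):

```elixir
# Create a named, public ETS table once (e.g. at application start).
:ets.new(:my_cache, [:set, :public, :named_table])

# Writes and reads are plain function calls, no network hop involved.
:ets.insert(:my_cache, {"user:42", %{name: "Ada"}})

case :ets.lookup(:my_cache, "user:42") do
  [{_key, value}] -> value
  [] -> :miss
end
```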
Amazing! I had heard of Phoenix before, and was quite impressed by some of the stuff that can be done with LiveView, but I had never heard of Broadway. I’ll definitely look more into it.
I’m wondering if you have any examples of such things?
Oh yeah, this is one thing I’ve had in mind a lot lately. I’m currently working in a team where, I hate to say it, the infrastructure peeps are not the most knowledgeable (and neither am I!), so I’d love to work in such a way that I have to rely on the least amount of components possible and still be confident it can scale without having to spend a fortune (which is the solution they’ve used at previous jobs, where they ran their entire backend on Python and Django… they just threw money at the problem!). Basically I want to make our lives as easy as possible… TYSM for the video, I will definitely watch it as well.
True, my point is that if you want to go beyond GenServer mailboxes – which is the thing that can be a mini-Kafka for many scenarios – then people opt for either ETS for in-memory storage, or DETS / Mnesia / actual DBs for non-volatile storage.
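As a rough sketch of the “mini-Kafka” idea (the `EventBuffer` name is made up; everything lives in memory, so it is gone after a restart, which is exactly where DETS / Mnesia / a real DB come in):

```elixir
defmodule EventBuffer do
  @moduledoc "In-memory event buffer backed by a single GenServer mailbox."
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, [], name: __MODULE__)

  # Fire-and-forget publish; the event is appended to the process state.
  def publish(event), do: GenServer.cast(__MODULE__, {:publish, event})

  # Hand back all buffered events in publish order and reset the buffer.
  def drain, do: GenServer.call(__MODULE__, :drain)

  @impl true
  def init(events), do: {:ok, events}

  @impl true
  def handle_cast({:publish, event}, events), do: {:noreply, [event | events]}

  @impl true
  def handle_call(:drain, _from, events), do: {:reply, Enum.reverse(events), []}
end
```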
I do agree with @cevado here (and I don’t like Kafka myself), but the truth is that our business requirements don’t allow us to forget events or messages, so we need something battle-tested.
All of what he showed you, and what you can find yourself afterwards, is true and it exists and is super useful, but you will still need persistence at some point. That’s what I was getting at.
Feature-wise, the Elixir tools give you almost anything you might need. What Kafka and the like (Redpanda, NATS, etc.) give you is raw throughput; we’re talking 150k+ messages per second.
Just making things clear:
Oban is a background job tool. It uses the database to manage jobs, but to keep it performant and in a healthy state the job data needs to be ephemeral: at some point you are going to delete the data of jobs that have already executed (with success or failure). There’s a small sketch of this at the end of this post.
Kafka, on the other hand, is an event store, built to deal with massive amounts of events.
It’s also good to remember that ETS is an in-memory key-value store and a GenServer is an abstraction over a worker process.
Given the nature of those tools, a multi-service event sourcing/event store solution would never be possible with only Oban, ETS, and GenServers. But I would personally avoid this type of thinking, at least when it comes to engineering solutions; it’s better to start with a problem and think of multiple ways to get to what needs to be done.
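To make the Oban point above concrete, a small sketch (the app and module names are placeholders; the Pruner plugin is what eventually deletes the data of already-executed jobs):

```elixir
# config/config.exs
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10],
  # delete completed/discarded jobs older than one day
  plugins: [{Oban.Plugins.Pruner, max_age: 60 * 60 * 24}]

# A worker whose rows are meant to be short-lived.
defmodule MyApp.Workers.SendReceipt do
  use Oban.Worker, queue: :default, max_attempts: 3

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"order_id" => order_id}}) do
    IO.puts("sending receipt for order #{order_id}")
    :ok
  end
end

# Enqueue a job: it is persisted in the database until executed, then pruned.
%{order_id: 42} |> MyApp.Workers.SendReceipt.new() |> Oban.insert()
```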