Best way to persist events in Kafka

Hey everyone!

I’m hoping to get some insight into how people handle the following.
The flow is simple, and I’m sure plenty of people already do it: I need to expose an endpoint, say POST /events, where other services can send events (authentication/authorization will also happen there). Those events should be persisted into Kafka and then consumed by Elixir/Broadway for data manipulation and other processing.

The part I’m curious about is how to get those events into Kafka. I was hoping to use Broadway to manage that too and make use of all the features it provides, but to do so I would have to implement some kind of intermediate storage to hold the incoming HTTP events, and then read them with a custom producer (I couldn’t find any other way).
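A minimal sketch of that intermediate storage, written in Python purely for illustration (the real thing would be an Elixir GenStage producer plugged into Broadway): the HTTP handler pushes events into a bounded in-memory buffer, and the producer releases them only when downstream asks for more, which mirrors the shape of GenStage's handle_demand/2 callback. The names EventBuffer, push, handle_demand and BufferFull are hypothetical, and an in-memory buffer loses whatever it holds if the node crashes, so it trades durability for simplicity.

```python
from collections import deque


class BufferFull(Exception):
    """Raised when the buffer is at capacity; the HTTP handler can turn this into an error response."""


class EventBuffer:
    """Hypothetical bounded buffer between an HTTP handler and a demand-driven producer."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self.events: deque = deque()
        self.pending_demand = 0  # events requested downstream but not yet delivered

    def push(self, event: dict) -> list:
        """Called by the HTTP handler; returns any events that can be dispatched right away."""
        if len(self.events) >= self.max_size:
            raise BufferFull(f"buffer already holds {self.max_size} events")
        self.events.append(event)
        return self._dispatch()

    def handle_demand(self, demand: int) -> list:
        """Called when downstream asks for `demand` more events (same idea as GenStage demand)."""
        self.pending_demand += demand
        return self._dispatch()

    def _dispatch(self) -> list:
        # Hand out buffered events only while there is unmet demand (back-pressure).
        out = []
        while self.pending_demand > 0 and self.events:
            out.append(self.events.popleft())
            self.pending_demand -= 1
        return out
```

Raising BufferFull lets the endpoint reject requests (for example with HTTP 503) instead of growing memory without bound; that back-pressure policy is an assumption, not something the post specifies.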

I’m trying to minimize the complexity and the amount of technology between the HTTP request and Kafka, without losing control over and “supervision” of those events.

I’d appreciate any input you can provide!

Thanks a lot!


That’s a really good question, but it depends heavily on a few factors:

  • Do you want data types to be strict for each field?
  • Do you want optional fields or is every field mandatory?
  • Do you want zero-copy? This means there are no serialize/deserialize stages: a received byte buffer can be iterated over and read directly, without first converting it into an in-memory structure.
  • Do you want future extensibility without old clients breaking?
  • Do you mind having a schema known beforehand?
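To make the zero-copy point concrete, here is a small Python comparison using a made-up fixed binary layout (big-endian 8-byte timestamp, 4-byte event type, then the payload): one field is read in place at its known offset, and the payload is exposed as a memoryview over the original buffer, whereas the JSON version has to parse the entire document before any field is available. The layout and function names are hypothetical; formats like FlatBuffers get the same effect from a schema-generated layout with offset tables rather than a fixed header.

```python
import json
import struct

# Hypothetical fixed layout: big-endian u64 timestamp, u32 event type, then payload bytes.
HEADER = struct.Struct(">QI")  # 12 bytes


def encode(timestamp: int, event_type: int, payload: bytes) -> bytes:
    return HEADER.pack(timestamp, event_type) + payload


def read_event_type(buf: bytes) -> int:
    # Zero-copy style: read one field in place at offset 8; nothing else is decoded.
    (event_type,) = struct.unpack_from(">I", buf, 8)
    return event_type


def payload_view(buf: bytes) -> memoryview:
    # A memoryview slice references the original buffer instead of copying the payload.
    return memoryview(buf)[HEADER.size:]


def read_event_type_json(raw: bytes) -> int:
    # Ser/deser style: the whole document becomes a dict before one field can be read.
    return json.loads(raw)["type"]
```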

I’ve been studying data formats lately and, to be fair, I can only recommend Google’s FlatBuffers, Ethereum’s RLP (only a byte-array encoding, though, with no concept of types) or maybe MessagePack (although that has the ser/deser stage).
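To show what “only a byte-array encoding, no concept of types” means for RLP, here is a minimal Python encoder sketch following the RLP rules: an item is either a byte string or a list of items, and anything else (integers, text, field names) has to be converted to bytes by the application first. The function names are hypothetical and not taken from any Ethereum library.

```python
def _length_prefix(length: int, offset: int) -> bytes:
    # Short form: one byte (offset + length) for payloads of 0..55 bytes.
    if length <= 55:
        return bytes([offset + length])
    # Long form: (offset + 55 + size of the length field), then the big-endian length.
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Encode a byte string or an arbitrarily nested list of byte strings."""
    if isinstance(item, (bytes, bytearray)):
        # A single byte below 0x80 is its own encoding.
        if len(item) == 1 and item[0] < 0x80:
            return bytes(item)
        return _length_prefix(len(item), 0x80) + bytes(item)
    if isinstance(item, list):
        payload = b"".join(rlp_encode(x) for x in item)
        return _length_prefix(len(payload), 0xC0) + payload
    raise TypeError("RLP only encodes bytes and lists; convert other types first")
```

For example, rlp_encode([b"cat", b"dog"]) produces the bytes c8 83 63 61 74 83 64 6f 67: a list prefix, then two length-prefixed strings, with nothing recording what the bytes mean.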

Additionally, a heavily simplified ASN.1 format, serialized/deserialized as BER or DER, can be very viable and easy to implement. But honestly, don’t go down the ASN.1 rabbit hole. You’ve been warned.
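For a sense of what “very simplified” could mean: BER/DER is tag-length-value. This Python sketch DER-encodes only INTEGER, OCTET STRING and SEQUENCE (universal tags 0x02, 0x04, 0x30) with definite lengths; the function names are hypothetical, and everything beyond this (schemas, other types, BER’s indefinite lengths) is where the rabbit hole starts. It is not a general ASN.1 implementation.

```python
def der_length(n: int) -> bytes:
    # Short form for lengths < 128; otherwise 0x80 | byte count, then the big-endian length.
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + der_length(len(value)) + value


def der_integer(i: int) -> bytes:
    # DER requires the minimal two's-complement form; ~i gives the right bit count for negatives.
    magnitude = i if i >= 0 else ~i
    size = magnitude.bit_length() // 8 + 1  # +1 leaves room for the sign bit
    return tlv(0x02, i.to_bytes(size, "big", signed=True))


def der_octet_string(b: bytes) -> bytes:
    return tlv(0x04, b)


def der_sequence(*elements: bytes) -> bytes:
    return tlv(0x30, b"".join(elements))
```

For instance, der_sequence(der_integer(1), der_octet_string(b"x")) yields 30 06 02 01 01 04 01 78.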

Looking through Wikipedia’s “Comparison of data-serialization formats” page turns up only one more good candidate, Binn, but I haven’t evaluated it yet. The page is worth a look; it’s not a bad comparison, though I’d presume it misses some other good formats.

Given the loosely defined requirements in your post, I wouldn’t go for FlatBuffers, even though I think it’s a good fit for many scenarios. For your case, JSON or BSON might be the least painful option.
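If JSON is the choice, the questions from the list above can still be answered at the endpoint. A Python sketch of validating an incoming event before it is produced to Kafka: required fields with strict types, optional fields with defaults, and unknown fields ignored so newer senders don’t break older consumers. The schema and field names (id, type, occurred_at, payload, source) are hypothetical, made up for illustration.

```python
import copy
import json

# Hypothetical event schema: field -> (expected type, required?, default when absent)
SCHEMA = {
    "id": (str, True, None),
    "type": (str, True, None),
    "occurred_at": (int, True, None),  # e.g. Unix seconds
    "payload": (dict, False, {}),
    "source": (str, False, "unknown"),
}


class InvalidEvent(ValueError):
    """Raised for events that should be rejected before they reach Kafka."""


def parse_event(raw: bytes) -> dict:
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError, also a ValueError
    if not isinstance(data, dict):
        raise InvalidEvent("event must be a JSON object")
    event = {}
    for name, (expected, required, default) in SCHEMA.items():
        if name not in data:
            if required:
                raise InvalidEvent(f"missing required field {name!r}")
            event[name] = copy.deepcopy(default)  # copy so callers can't mutate the shared default
            continue
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise InvalidEvent(f"field {name!r} must be of type {expected.__name__}")
        event[name] = value
    # Fields not in SCHEMA are dropped, not rejected, which keeps older consumers compatible.
    return event
```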

Or simply Erlang’s External Term Format (ETF)? (Check out :erlang.binary_to_term and :erlang.term_to_binary.)
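ETF is what :erlang.term_to_binary produces, so a Broadway pipeline can decode it with :erlang.binary_to_term; for bytes that come from other services, passing the [:safe] option is worth it, since it refuses to create new atoms (atoms are not garbage collected). The catch is that non-BEAM senders then need an ETF encoder. A Python sketch covering only a small subset of the format (small integers, 32-bit integers, binaries, small tuples), based on the tag values in the Erlang external term format documentation; the function names are hypothetical and this is not a complete encoder.

```python
import struct

VERSION = 131           # every term_to_binary result starts with this byte
SMALL_INTEGER_EXT = 97  # unsigned 8-bit integer
INTEGER_EXT = 98        # signed 32-bit big-endian integer
SMALL_TUPLE_EXT = 104   # tuple with arity < 256
BINARY_EXT = 109        # 32-bit length followed by raw bytes (an Elixir binary/String)


def _encode(term) -> bytes:
    if isinstance(term, bool):
        raise TypeError("booleans are atoms in ETF; not handled in this sketch")
    if isinstance(term, int):
        if 0 <= term <= 255:
            return bytes([SMALL_INTEGER_EXT, term])
        if -(2**31) <= term < 2**31:
            return struct.pack(">Bi", INTEGER_EXT, term)
        raise ValueError("big integers are not handled in this sketch")
    if isinstance(term, (bytes, bytearray)):
        return struct.pack(">BI", BINARY_EXT, len(term)) + bytes(term)
    if isinstance(term, str):
        # Elixir strings are UTF-8 binaries.
        return _encode(term.encode("utf-8"))
    if isinstance(term, tuple):
        if len(term) > 255:
            raise ValueError("large tuples are not handled in this sketch")
        return bytes([SMALL_TUPLE_EXT, len(term)]) + b"".join(_encode(t) for t in term)
    raise TypeError(f"unsupported type: {type(term).__name__}")


def term_to_binary(term) -> bytes:
    return bytes([VERSION]) + _encode(term)
```

For example, term_to_binary((1, "a")) gives the bytes 131, 104, 2, 97, 1, 109, 0, 0, 0, 1, 97, which is what :erlang.term_to_binary({1, "a"}) returns in iex.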