Cainophile - Change Data Capture for Postgres using Elixir

I recently announced my work on Cainophile, and it was suggested I post it here for further dissemination and feedback. Cainophile is a library to assist you in building change data capture systems with Postgres and Elixir. I also have released a separate library for decoding the logical Postgres replication log directly in case that is helpful. You can find full details on the announcement.

Announcment
GitHub

10 Likes

Looks really useful and is something I’ve been wanting to do with Postgres in Elixir for a while.

Does using :temporary as the replication slot mean that only new changes will be received by the subscriber and it won’t receive any historical changes when restarted? For a use case I have in mind this is the behaviour I want whereby the subscriber only receives the changes made while its running.

Glad to hear that it might be useful! Honestly I have not attempted a full sync with it, I’m not sure how that would work but presumably if you kept all of you WAL logs around and postgres has access to them, you could request the initial WAL position and go from there. As far as temporary, it really just refers to if you want a persistent slot or not. Persistent slots keep track of what has been acknowledged by the subscriber, and survive across connections. For production use cases, persistent slots are probably the best option, as you can be sure that you’re not missing out on changes due to network issues/subscriber failure. However, if you have a replication slot hanging around and not being acknowledged, I believe it will eventually fill up and cause problems.

Temporary slots just create an ephemeral replication slot that goes away when the subscriber disconnects. This is useful for testing or quick trials as it doesn’t keep garbage around. If you want to try this on a production database for example (after staging!), you could start with temporary to minimize the impact.

As far as what you’re trying to do, it sounds like temporary or named slots would work. What you’ll want to do is specify {0,0} for the wal position (it defaults to this), which means “let postgres decide what to send me”. In the case of temporary slots, you will just receive what is new with {0,0}, as there will be no previously acknowledged wal position that Postgres will try to send. With a named slot, it seems that postgres will start sending you info from where you left off.

Thanks for the reply. Useful to know that {0, 0} for temporary means to receive only new data.

Hey @bbhoss, I’m not sure if you’re still working on this, but I’ve been looking into this library with interest lately. Two things stand out to me:

  1. I’d like to find the right way to allow my application code to decide when to confirm a message. Right now, when a transaction commits, Cainophile sends the message off to any subscribers, and then confirms. If those subscribers however crash or are unable to do anything useful with that value, it is lost. User space confirmation would be key to avoid data loss.

  2. I see you had to implement postgres binary value decoding. I’m surprised there wasn’t something in epgsql or postgrex that you could reuse there. In particular I’m concerned about extensibility over time with custom types. Is there something about the values sent by Postgres that are a different binary format from what epgsql and postgrex deal with today?

This is something I’m wondering about as well. It would be nice to be able to decode directly to erlang/elixir types (especially integers, booleans).

I slapped together a library that borrows the Supbase realtime / Cainophile code. The main change I made was add a process to accept the changes so that you can act on the data right in Elixir (and send off wherever you like).

One of the todos is to address the type decoding if that’s even possible (and it would be nice to have an adapter approach so that you can specify the type you want). Another is to swap out epgsql now that Postgrex has native replication.