Sequin - Elixir + Postgres for a feature-rich message queue

acco · July 19, 2024, 8:15pm

Hey everyone,

Excited to share a new open-source project built with Elixir. Sequin combines Elixir and Postgres into a feature-rich message queue.

We (the maintainers) were searching for the goldilocks message stream/queue. Kafka’s great when you need tremendous scale, but the trade-off is that it has few features and is hard to debug/configure/operate. SQS also has limited features, and we don’t like the dev/test story. RabbitMQ was closest to what we were seeking, but was the most unfamiliar to the team, and looked like it would be a black box.

Databases and messaging systems are at the heart of most applications. But having both adds complexity. We already know and love Postgres – can’t it do both?

Teams pass on using Postgres for streaming/queuing use cases for two primary reasons:

- They underestimate Postgres’ ability to scale to their requirements/needs
- Building a stream/queue on Postgres is DIY/open-ended

Getting DIY right requires some diligence to build a performant system that doesn’t drop messages (MVCC can work against you). We also wanted a ton of features from other streams/queues that weren’t available off-the-shelf, namely:

- message routing with subjects
- a CLI with observability
- webhook and WAL ingestion (more on that in a bit)

So, we built something that leans into Postgres’ strengths. We love the stream-based model, where messages persist in one or more streams. Each of those streams has its own retention policy (like Kafka). You can then create consumers that each process a filtered subset of messages with exactly-once processing.

We didn’t like the idea of a Postgres extension. We wanted something we could use with any existing hosted Postgres database. And the Elixir layer gives us a lot of performance benefits. We use Elixir to do things Postgres is bad at, like caching and keeping counters.

Plus, who doesn’t want to build in Elixir as much as they can?

–

Because it’s all Just Postgres™, observability comes out-of-the-box. But we love a good CLI, so we’ve already built out a lot there to make debugging/observing easy (sequin observe is like htop for your messages).

We think killer use-cases for a Postgres-based stream are (1) processing Postgres WAL data and (2) ingesting webhooks.

For (1), we built on the shoulders of giants (h/t Postgrex, Realtime, Cainophile). It’s really neat to be able to process the WAL as a message queue!

For (2), we’re planning a way to expose HTTP endpoints so you can go from API → Sequin endpoint → Postgres → your app.

–

Under the hood, Sequin uses a very simple and familiar schema for the stream. We tried a lot of fancy stuff, but the simplest route turned out to be the most performant.

Messages flow into the messages table (partitioned by the stream_id). In the same transaction, messages are fanned out to each consumer that’s filtering for that message (consumer_messages).

In terms of performance, this means Sequin can ingest messages about as fast as Postgres can insert them. On the read side, consuming messages involves “claiming” rows in consumer_messages by marking them as delivered, then later deleting those rows on ack.
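To make the fan-out and claim/ack flow above concrete, here is a minimal in-memory sketch in Elixir. It is not Sequin's code or schema: the `QueueSketch` module, its field names, and the filter functions are all assumptions made for illustration, and a plain struct stands in for the `messages` and `consumer_messages` tables.

```elixir
defmodule QueueSketch do
  # In-memory stand-in for the two tables described above. All names here are
  # hypothetical; this is not Sequin's schema or code.
  defstruct messages: [], consumer_messages: [], next_seq: 1

  def new, do: %__MODULE__{}

  # "Insert" a message and fan it out to every consumer whose filter matches,
  # in one step (standing in for a single database transaction).
  # `consumers` is a list of {name, filter_fun} tuples.
  def insert(%__MODULE__{} = state, consumers, subject, data) do
    seq = state.next_seq
    message = %{seq: seq, subject: subject, data: data}

    fanned_out =
      for {name, filter} <- consumers, filter.(subject) do
        %{consumer: name, seq: seq, state: :available}
      end

    %{
      state
      | messages: state.messages ++ [message],
        consumer_messages: state.consumer_messages ++ fanned_out,
        next_seq: seq + 1
    }
  end

  # Claim up to `limit` available rows for a consumer by marking them delivered.
  def claim(%__MODULE__{} = state, consumer, limit) do
    claimed_seqs =
      state.consumer_messages
      |> Enum.filter(&(&1.consumer == consumer and &1.state == :available))
      |> Enum.take(limit)
      |> Enum.map(& &1.seq)

    rows =
      Enum.map(state.consumer_messages, fn row ->
        if row.consumer == consumer and row.seq in claimed_seqs do
          %{row | state: :delivered}
        else
          row
        end
      end)

    {claimed_seqs, %{state | consumer_messages: rows}}
  end

  # Ack deletes the consumer's row; the message itself stays in `messages`.
  def ack(%__MODULE__{} = state, consumer, seq) do
    rows =
      Enum.reject(state.consumer_messages, &(&1.consumer == consumer and &1.seq == seq))

    %{state | consumer_messages: rows}
  end
end
```

In a real Postgres-backed version, the claim step would also have to be safe under concurrent workers (for example with row locking), which this single-process sketch does not model.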

We benched on a db.m5.xlarge (4-core, 16GB RAM, $260/mo) RDS instance and the system was happy at 5k messages/sec, bursting up to 10k messages/sec. Your laptop is beefier than this, and bigger machines can obviously do more.

–

We still have a lot to build. It’s pre-1.0. And we’re curious if the model of “combine a stateless docker container (Elixir) with your existing Postgres db” will resonate with people.

You can see an example of using Sequin with Broadway here:

https://github.com/sequinstream/sequin/tree/main/examples/elixir_broadway

We’re looking forward to feedback and are happy to shape the roadmap according to your real-world needs! Leave a comment or send me a DM if there’s anything you’d like to see.

v0idpwn · July 19, 2024, 10:45pm

Cool project!

I think it would be beneficial to allow users to write messages directly through the regular PostgreSQL API instead of relying on an external service. This approach would enable users to leverage transactional guarantees. In my opinion, these guarantees are one of the biggest advantages of using your database over external infrastructure. I reckon it’s also one of the biggest reasons why people use Oban.

I’ve been contributing to and using pgmq (tembo-io/pgmq on GitHub), a lightweight message queue like AWS SQS and RSMQ but on Postgres. It also provides a Postgres-based message queue, and its whole API is exposed through a Postgres extension. While using it, I do feel like the message queue and the database are the same thing. But my first impression of Sequin is that the storage backend could be anything else and it wouldn’t make a difference for an application developer.

acco · July 22, 2024, 4:56pm

> I think it would be beneficial to allow users to write messages directly through the regular PostgreSQL API instead of relying on an external service.

Hey @v0idpwn – we thought about this behavior a lot! One of our requirements is that we didn’t want it to be a Postgres extension. And there are a ton of features we were able to add by having a service layer on top of Postgres.

But we do like the idea of transactional guarantees with your message queue. So we’re going to be adding Postgres functions you can call to do just that. We think just a few functions that Sequin can initialize in any Postgres database will do the trick.

acco · July 26, 2024, 8:42pm

Hey all –

Excited to share that we released an Elixir client for Sequin on Hex!

Sequin’s HTTP interface is straightforward enough, but the SDK is a nice wrapper for those who prefer one. It’s a good fit if you don’t need a full-blown Broadway producer/pipeline. And you can use the SDK in tests to create/delete streams and consumers.

Here’s an example:

# Define your stream and consumer
stream = "your-stream-name"
consumer = "your-consumer-name"

# Send a message
case Sequin.send_message(stream, "test.1", "Hello, Sequin!") do
  {:ok, %{published: 1}} ->
    IO.puts("Message sent successfully")

  {:error, error} ->
    IO.puts("Error sending message: #{Exception.message(error)}")
end

# Receive a message
with {:ok, %{message: message, ack_id: ack_id}} <- Sequin.receive_message(stream, consumer),
     :ok <- YourApp.process_message(message),
     :ok <- Sequin.ack_message(stream, consumer, ack_id) do
  IO.puts("Received and acked message: #{inspect(message)}")
else
  {:ok, nil} ->
    IO.puts("No messages available")

  {:error, error} ->
    IO.puts("Error: #{Exception.message(error)}")
end

Let me know if you have any thoughts or questions.

josevalim · July 27, 2024, 7:20pm

Hi @acco! This is a really neat project!

I am wondering how you would be able to implement that without an extension. As far as I know, you can’t guarantee in PostgreSQL that a value in a transaction will be inserted in that order in the WAL (without setting transaction level to serializable), so for example, you can’t insert a row with COUNTER=13 and guarantee that will appear in the WAL before COUNTER=14? A transaction may hold things off such that COUNTER=14 appears in the WAL first.

acco · July 29, 2024, 7:47pm

Hey Jose, thanks!

I think you’re referring to the fact that things like sequences can commit out-of-order, is that right? I actually wrote about this last week!

For a given row, the WAL will return updates in order. But indeed, across rows, sequence values may arrive out of order. We think that strict ordering by row is what people want when they want ordering, but if anyone has use cases where they need strict ordering across rows/keys, let me know.
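The cross-row ordering issue discussed above can be illustrated with a small in-memory simulation. This is only a sketch, not Sequin code or real Postgres behavior in detail: `CommitOrderSketch`, the `committed_at` logical timestamps, and the naive polling reader are all assumptions made for illustration. One transaction takes sequence value 13 but commits late, another takes 14 and commits first, and a reader that tracks "highest value seen so far" skips 13.

```elixir
defmodule CommitOrderSketch do
  # Rows visible to a reader at logical time `now`: only committed transactions.
  def visible(txs, now), do: Enum.filter(txs, &(&1.committed_at <= now))

  # Naive cursor poll: return visible sequence values greater than the cursor.
  def poll(txs, now, cursor) do
    txs
    |> visible(now)
    |> Enum.filter(&(&1.seq > cursor))
    |> Enum.map(& &1.seq)
    |> Enum.sort()
  end

  # Two concurrent transactions. Sequence values are handed out at insert time,
  # but the transaction holding 13 commits after the one holding 14.
  def example do
    [
      %{seq: 13, committed_at: 2},
      %{seq: 14, committed_at: 1}
    ]
  end
end

txs = CommitOrderSketch.example()

# At time 1 only the transaction holding 14 has committed.
first = CommitOrderSketch.poll(txs, 1, 0)
cursor = Enum.max(first)

# At time 2 the row with 13 is visible, but the cursor has already moved past it.
second = CommitOrderSketch.poll(txs, 2, cursor)
IO.inspect({first, second}, label: "first poll, second poll")
```

Here the second poll comes back empty, so the row with value 13 is never read by this reader. That is the kind of dropped message a DIY queue on Postgres has to guard against.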

The “transactional guarantees” bit in my post was about ensuring that inserts into messages only happen if some other database operation you’re performing also succeeds (both commit together).

josevalim · July 29, 2024, 8:13pm

That was an excellent read, thanks for sharing!

acco · September 27, 2024, 8:18pm

Hey everyone,

Wanted to share all the updates we’ve made to Sequin.

We made a significant overhaul to Sequin in v0.4. Before, you’d push messages to Sequin streams via HTTP. Streams were persisted in Sequin-managed Postgres tables.

As of v0.4, Sequin instead streams your existing Postgres tables.

You can use Sequin to add streaming capabilities to existing data, like you might with a Debezium + Kafka pipeline. Or you can create new tables that you’ll use specifically to power streaming use cases as a pure Kafka alternative.

After connecting a Postgres table to Sequin, you can set up consumers for that table. A consumer can start at any “position” in the table: the beginning, at a specific row, or at the end. If it’s a push consumer, Sequin will push changes to your application via HTTP POST. If it’s a pull consumer, Sequin will provision a new HTTP endpoint that your application’s workers will use to pull messages.

You can have 1 or 1000 workers subscribe to a consumer. And you get guarantees like exactly-once processing of all rows and changes.

So, a table in Sequin is like a topic in Kafka. A consumer is like a Kafka consumer group. And instead of offsets, consumers traverse tables using a sorted column on your table.
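Traversing a table by a sorted column instead of an offset can be sketched roughly as follows. This is an in-memory illustration under assumptions, not Sequin's implementation: `CursorSketch`, `next_page/4`, and the `:updated_at` column in the usage note are hypothetical, and the sketch assumes values in the sort column are unique (a real system would need a tiebreaker).

```elixir
defmodule CursorSketch do
  # Return the next page of rows after `cursor`, ordered by `sort_key`,
  # along with the cursor to use for the following page.
  # A `nil` cursor means "start at the beginning of the table".
  def next_page(rows, sort_key, cursor, limit) do
    page =
      rows
      |> Enum.filter(&(is_nil(cursor) or Map.fetch!(&1, sort_key) > cursor))
      |> Enum.sort_by(&Map.fetch!(&1, sort_key))
      |> Enum.take(limit)

    # Advance the cursor to the last row returned, or keep it if the page is empty.
    new_cursor =
      case List.last(page) do
        nil -> cursor
        row -> Map.fetch!(row, sort_key)
      end

    {page, new_cursor}
  end
end
```

With rows sorted on a hypothetical `:updated_at` column, starting "at the beginning" corresponds to a `nil` cursor, starting "at a specific row" to that row's sort value, and starting "at the end" to the current maximum value.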

Like Kafka, Sequin also delivers changes FIFO. So if a Postgres row changes multiple times in quick succession, each change will only be delivered after the prior change was acknowledged, avoiding race conditions.
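The per-row FIFO rule described above can be sketched as a small selection function: a change is deliverable only if no earlier change for the same row is still waiting or unacknowledged. This is a hedged illustration, not Sequin's delivery logic; `FifoSketch`, `deliverable/2`, and the shape of the change maps are assumptions.

```elixir
defmodule FifoSketch do
  # `changes` is an ordered list of pending changes, each %{id: ..., row: ...}.
  # `in_flight` is a MapSet of row keys with a delivered-but-unacked change.
  # Returns the changes that may be delivered now: at most one per row, and
  # none for rows that are already in flight.
  def deliverable(changes, in_flight) do
    {ready, _blocked} =
      Enum.reduce(changes, {[], in_flight}, fn change, {ready, blocked} ->
        if MapSet.member?(blocked, change.row) do
          # An earlier change for this row has not been acked yet; hold this one.
          {ready, blocked}
        else
          {[change | ready], MapSet.put(blocked, change.row)}
        end
      end)

    Enum.reverse(ready)
  end
end
```

Given two pending changes to row "a" and one to row "b", only the first change to "a" and the change to "b" come back. The second change to "a" becomes deliverable once the first is acked and "a" leaves the in-flight set.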

So you can think of Sequin as an Elixir app that sits on top of your Postgres database and does 3 things:

1. Safely paginates and detects changes to your Postgres tables.
2. Manages consumer state (FIFO, fan-out, delivery, etc).
3. Offers a control and observability plane on top of all this.

Our bet is that for many (most?) teams and use cases, a Postgres table is a better storage layer than Kafka. You’re already operating Postgres. And data in Postgres is far easier to manage: you can migrate tables to change your data model and easily read it at rest with SQL. Your data is typically already in Postgres – why have a long-term copy in another system just to stream it?

Postgres also allows us to more easily build a ton of useful features, like observability tooling, replays, SQL-based filters on consumers, etc.

While you write directly to your Postgres tables with insert/update/delete, the consumer interface is via HTTP (push or pull). There are a lot of moving parts with message delivery, and we lean on Elixir/OTP to make this smooth.

You can follow along on GitHub (sequinstream/sequin, an open source message stream built on Postgres). We open source everything.

We have an example with an Elixir+Broadway consumer, which is a great experience:

We also just launched our hosted option, which is a good way to kick the tires.

Finally, I recorded a video yesterday that shows how all this comes together:

Much more to come – let me know if you have any thoughts or questions! We’re adding features all the time, so if we’re missing something, we’d love to hear it.