PartitionedBuffer - Performant write buffering based on ETS double-buffering

Hi all, I’m quite pleased to be open-sourcing a project that @cabol and I have been working on, PartitionedBuffer. It’s a performant and flexible write buffer that we’ve found useful in a number of scenarios.

Currently, we’re shipping it with two buffer implementations, PartitionedBuffer.Queue and PartitionedBuffer.Map. Queue is useful for buffering every write that comes in (think: buffering inserts to ClickHouse, which wants large blocks of inserts rather than individual writes). Map is useful for scenarios where data changes very frequently but you want to debounce writes to a downstream service (think: update some value once per unit of time, last write wins).
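For the curious, here’s a minimal sketch of the ETS double-buffering idea from the title. This is not PartitionedBuffer’s actual code and the module/function names are made up; it just illustrates the technique: writers insert into the currently “active” table, and the flusher atomically swaps tables before draining, so writes never block on a flush.

```elixir
# Illustrative only: a toy double buffer, not PartitionedBuffer's implementation.
defmodule DoubleBuffer do
  # Two ETS tables plus an atomic counter whose parity selects the active table.
  def new do
    a = :ets.new(:buf_a, [:duplicate_bag, :public, write_concurrency: true])
    b = :ets.new(:buf_b, [:duplicate_bag, :public, write_concurrency: true])
    index = :atomics.new(1, [])
    {index, {a, b}}
  end

  # Writers always hit the active table; cheap and concurrent.
  def push({index, tabs}, item) do
    :ets.insert(active(index, tabs), {:item, item})
  end

  # Swap the active table first, then drain the previously active one,
  # so new writes land in the fresh table while we flush.
  def flush({index, tabs}) do
    old = active(index, tabs)
    :atomics.add(index, 1, 1)
    items = for {:item, item} <- :ets.tab2list(old), do: item
    :ets.delete_all_objects(old)
    items
  end

  defp active(index, tabs) do
    elem(tabs, rem(:atomics.get(index, 1), 2))
  end
end
```

In a real implementation the flusher would run on a timer per partition and hand the drained batch to a sink (e.g. a bulk insert), but the swap-then-drain core is the part that keeps writers fast.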

I’m happy to answer any questions in this thread. Enjoy, I hope it makes your life a little easier!

Hex: PartitionedBuffer v0.3.0
GitHub:


Is this fire-and-forget from the standpoint of the calling process? Or is there a way to wait until it ACKs?

Tangentially, with respect to ClickHouse: have you looked at server-side buffering with its native async inserts?


Yes. So, imagine it as a replacement for Task.Supervisor.async_nolink for fire-and-forget tasks. If you’re creating many fire-and-forget tasks, you will most definitely notice the overhead of creating and GC-ing the processes at a certain level of load, and if you get a flood of traffic, you’d spike the number of processes, which could lead to OOM kills, etc. PartitionedBuffer solves that problem.
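For context, the pattern being replaced looks roughly like this: one short-lived supervised process per write. It’s fine at low volume, but process creation and GC overhead add up under load, and a traffic spike becomes a process-count spike.

```elixir
# One process per fire-and-forget write via Task.Supervisor.
{:ok, sup} = Task.Supervisor.start_link()

# Spawns a task that is supervised but not linked to the caller,
# so a crash in the task doesn't take the caller down.
Task.Supervisor.async_nolink(sup, fn ->
  # e.g. perform one downstream write here
  :ok
end)
```

A buffer amortizes this: many writes share one long-lived flusher instead of each paying for a process.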

PartitionedBuffer isn’t designed for backpressure per se, so if you have a downstream service that’s getting overloaded, it won’t save you from that. We do rebuffer on transient failures, but we’re lucky that our service has pretty predictable scaling and we have a lot of monitoring in place. Obviously, if you’ve got a catastrophic failure, this won’t save you; you’ll have to deal with it yourself :wink:

We have looked into async inserts for ClickHouse, but there are several gotchas there, and we find it easier to solve problems on our end. We have excellent primitives in the BEAM to deal with it, so that’s what we do.

We also use the Map buffer quite a bit to debounce hot keys and to handle tasks that need to run, e.g., once per user on some schedule, when the users to process arrive irregularly. We had a system that ran a GenServer per user, but a large influx of users we hadn’t seen before could crash the service because so many processes got spawned. We were regularly running 10-20 servers to handle this workload; now we happily run 4 servers at ~100% CPU and they’re completely stable.
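The last-write-wins idea behind a Map-style buffer can be sketched with a plain ETS set (again, hypothetical names, not PartitionedBuffer’s API): each key holds only its latest value, and a periodic flush drains the table in one pass.

```elixir
# Illustrative only: last-write-wins debouncing on a single ETS set table.
defmodule Debounce do
  def new, do: :ets.new(:debounce, [:set, :public, write_concurrency: true])

  # In a :set table, inserting overwrites any pending value for the key,
  # so no matter how hot the key is, only the latest value is flushed.
  def put(tab, key, value), do: :ets.insert(tab, {key, value})

  # Drain everything pending. Note: a write racing between tab2list and
  # delete_all_objects could be lost here, which is exactly why a real
  # implementation double-buffers with two tables instead.
  def flush(tab) do
    pending = :ets.tab2list(tab)
    :ets.delete_all_objects(tab)
    pending
  end
end
```

A flusher process would call `flush/1` once per interval and push the batch downstream, giving you at most one write per key per interval.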
