Hi all, I’m quite pleased to be open-sourcing a project that @cabol and I have been working on, PartitionedBuffer. It’s a performant and flexible write buffer that we’ve found useful in a number of scenarios.
Currently, we’re shipping it with two buffer implementations, PartitionedBuffer.Queue and PartitionedBuffer.Map. Queue is useful for buffering any write that comes in (think: I need to buffer writes to, e.g., ClickHouse, because ClickHouse wants large blocks of inserts, not individual writes). Map is useful for scenarios where you have data that changes very frequently but you want to debounce writes to a downstream service (think: I need to update some value at most once per interval, last write wins).
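To make the difference concrete, here’s roughly what usage looks like. This is a simplified sketch — the option names, `push` call shapes, and flush-callback signature here are illustrative, not the exact API, so check the README:

```elixir
# Illustrative sketch only: option names, push arities, and the flush
# callback shape are simplifications; see the README for the real API.

# Queue buffer: accumulate individual writes, flush downstream in batches.
{:ok, _pid} =
  PartitionedBuffer.Queue.start_link(
    name: :ch_inserts,
    partitions: 8,
    flush_interval: :timer.seconds(5),
    flush: fn batch -> MyApp.ClickHouse.insert_all(batch) end
  )

PartitionedBuffer.Queue.push(:ch_inserts, %{event: "click", at: System.system_time(:second)})

# Map buffer: keyed, last write wins, so rapid updates to one key collapse
# into a single downstream write per flush interval.
{:ok, _pid} =
  PartitionedBuffer.Map.start_link(
    name: :user_state,
    flush_interval: :timer.seconds(10),
    flush: fn kvs -> MyApp.Downstream.upsert_all(kvs) end
  )

PartitionedBuffer.Map.push(:user_state, "user-123", %{last_seen: System.system_time(:second)})
```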
I’m happy to answer any questions in this thread. Enjoy, I hope it makes your life a little easier!
Yes. So, imagine it as a replacement for Task.Supervisor.async_nolink for fire-and-forget tasks. If you’re creating many fire-and-forget tasks, you will most definitely notice the overhead of creating and GC-ing the processes at a certain level of load, and if you get a flood of traffic, you’d spike the number of processes, which could lead to OOM kills, etc. PartitionedBuffer solves that problem.
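To illustrate the swap (the buffer call shape below is illustrative, not the exact API):

```elixir
# Before: one short-lived process per write. Each call spawns a process that
# must be scheduled and GC'd; a traffic spike means a process-count spike.
Task.Supervisor.async_nolink(MyApp.TaskSupervisor, fn ->
  MyApp.Downstream.write(%{event: "click"})
end)

# After: the write lands in a partition owned by a fixed set of long-lived
# flusher processes, so process count stays flat regardless of traffic.
# (Illustrative call shape.)
PartitionedBuffer.Queue.push(:events, %{event: "click"})
```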
PartitionedBuffer isn’t designed for backpressure per se, so if you have a downstream service that’s getting overloaded, it won’t save you from that. We do rebuffer on transient failures, but we’re lucky that our service has pretty predictable scaling and we have a lot of monitoring in place. Obviously, if you’ve got a catastrophic failure, this won’t save you; you’ll have to deal with it yourself.
We have looked into async writes for ClickHouse, but there are several gotchas there, and we find it easier to solve problems on our end. We have excellent primitives in the BEAM to deal with it, so that’s what we do.
We also use the Map buffer quite a bit, to debounce hot keys and to handle tasks that need to run, e.g., once per user on some schedule, where the users to process arrive irregularly. We had a system that ran a GenServer per user, but a large influx of users we hadn’t seen before could crash the service because so many processes got spawned. We were regularly running 10–20 servers to handle this workload; now we can happily run 4 servers at ~100% CPU and they’re completely stable.
Yes! We actually don’t use this library for writing to ClickHouse in prod (we use it for buffering writes to Kafka, then we stream from Kafka into ClickHouse, with some ETL in between); it was just a convenient example.
I’ll resist the urge to dive into a whole ClickHouse digression, because we have been doing a lot with it lately.
As far as the buffer is concerned, are you processing data where data loss in the case of a crash is acceptable, or is there an ACK mechanism where a caller can get confirmation that its data was written out and then flush the value on its end?
We don’t have a built-in mechanism where, e.g., a caller process could register to be notified when a flush occurs, because we usually have our flush interval on the order of several seconds (depending on the process), but you could build something like that yourself. The flush callback is just a function that receives a batch of messages (chunked), so you could store the callers’ pids somewhere and send them a message when you’ve flushed. We just haven’t needed it: our system takes data in over websocket and we’re pretty latency-sensitive on responses, since we’re returning content to be displayed inside and on top of our customers’ sites.
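For example, a DIY ack scheme could look something like this — a sketch only, where the flush-callback and push shapes are my illustrative assumptions rather than the library’s actual API:

```elixir
# Sketch of caller acks: buffer each message together with the caller's pid,
# then notify after the flush succeeds. All names here are illustrative.
flush = fn batch ->
  :ok = MyApp.Downstream.write_batch(Enum.map(batch, fn {_from, msg} -> msg end))

  # Tell every caller in this batch that its data made it out.
  for {from, _msg} <- batch do
    send(from, {:flushed, self()})
  end
end

# Caller side: push {self(), msg}, then block (or not) waiting for the ack.
msg = %{event: "click"}
PartitionedBuffer.Queue.push(:acked_events, {self(), msg})

receive do
  {:flushed, _flusher} -> :ok
after
  10_000 -> {:error, :flush_timeout}
end
```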
We monitor our systems pretty tightly and we have a lot of headroom RAM-wise on our instances (we run r7g.8xl in prod for our biggest workload). The only instances we’ve seen the BEAM crash are OOM conditions, and we simply don’t let that happen. We have proactive systems in place that restart the BEAM process if some precondition is met (RAM usage, run queue, etc.) to protect against this, and we trap exits to make sure we finish processing work when the process terminates. We have verified this works in prod (and we sometimes deploy several times daily, so it gets exercised frequently).
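The trap-exits part is the standard GenServer drain-on-shutdown pattern, something like this (module and field names are illustrative):

```elixir
defmodule MyApp.Flusher do
  use GenServer

  # Trapping exits ensures terminate/2 runs when the supervisor shuts this
  # process down, giving it a chance to drain the buffer before the VM stops.
  def init(state) do
    Process.flag(:trap_exit, true)
    {:ok, state}
  end

  def terminate(_reason, %{pending: pending}) do
    # Drain whatever is still buffered; names here are illustrative.
    MyApp.Downstream.write_batch(pending)
    :ok
  end
end
```

Pair this with a generous `:shutdown` timeout in the child spec so the final drain has time to complete before the supervisor brutally kills the process.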
That said, what we’re putting in there isn’t, like, payments data; it’s behavioral analytics data (think clickstream-type data). But we do trust that this system works.