Ex_data_sketch - production-grade streaming data sketching algorithms for Elixir

ExDataSketch provides probabilistic data structures for approximate counting, frequency estimation, quantile computation, heavy-hitter detection, membership testing with deletion, and set reconciliation on streaming data. All sketch state is stored as Elixir-owned binaries, enabling straightforward serialization, distribution, and persistence.

ex_data_sketch v0.8.0 is out.

What’s new:

  • Deterministic hashing. Every sketch now goes through a validated, byte-stable hash layer. HLL, ULL, Theta, and CMS accept hash_strategy: :murmur3 for Apache DataSketches interop; in v0.7.x the option was silently overridden to :xxhash3. XXHash3 remains the default and fastest path (~30M items/sec at p=14 on the Rust NIF).

  • Binary stability & corruption detection. Serialized sketches now carry a CRC32C trailer and an embedded hash metadata block (EXSK v2). Bit-flip corruption that previously produced silently wrong estimates is now caught at deserialization, which returns a structured DeserializationError. v0.8.0 reads v1 frames, but v0.7.x cannot read v2, so stage your rollout accordingly.

  • Murmur3 hot path. Eight new Rust NIFs extend in-Rust hashing to Murmur3, and the Murmur3 path is within 8% of XXHash3 throughput. No more falling off the fast path when you select :murmur3.

  • Precompiled NIFs for Windows. x86_64 and ARM64 MSVC targets join the matrix. 16 artifacts total (8 targets × 2 NIF versions). No Rust toolchain needed on any supported platform.

  • Property-locked guarantees. 14 StreamData properties lock HLL/ULL monotonicity and error bounds, KLL/REQ rank consistency, CMS overestimation-only, and Bloom/XorFilter/Cuckoo no-false-negative. A 200-mutation fuzz suite verifies that binary v2 corruption never silently propagates.
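
To make the corruption-detection item concrete, here is a minimal sketch of a checksum trailer in Elixir. The module name TrailerSketch, the functions seal/1 and open/1, the 4-byte little-endian trailer, and the error atoms are all assumptions made for illustration; the actual EXSK v2 layout and the DeserializationError struct are not reproduced here. CRC32C is computed bit by bit with the reflected Castagnoli polynomial, which is slow but needs no dependencies.

```elixir
defmodule TrailerSketch do
  # Hypothetical illustration only: this is NOT the EXSK v2 frame format.
  import Bitwise

  # Reflected form of the CRC-32C (Castagnoli) polynomial.
  @poly 0x82F63B78

  # Bitwise CRC-32C: initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF.
  def crc32c(data) when is_binary(data) do
    crc =
      for <<byte <- data>>, reduce: 0xFFFFFFFF do
        acc -> shift_byte(bxor(acc, byte))
      end

    bxor(crc, 0xFFFFFFFF)
  end

  # Process the eight bits that were just XORed into the low byte.
  defp shift_byte(crc) do
    Enum.reduce(1..8, crc, fn _, c ->
      if band(c, 1) == 1, do: bxor(c >>> 1, @poly), else: c >>> 1
    end)
  end

  # Append the checksum of the payload as a 4-byte trailer.
  def seal(payload) when is_binary(payload) do
    <<payload::binary, crc32c(payload)::little-32>>
  end

  # Split off the trailer and refuse the frame if the checksum disagrees.
  def open(frame) when byte_size(frame) >= 4 do
    size = byte_size(frame) - 4
    <<payload::binary-size(size), stored::little-32>> = frame

    if crc32c(payload) == stored do
      {:ok, payload}
    else
      {:error, :checksum_mismatch}
    end
  end

  def open(_frame), do: {:error, :truncated}
end
```

With a trailer like this, a frame with a single flipped bit comes back as {:error, :checksum_mismatch} instead of decoding into a sketch that yields wrong estimates.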

Breaking changes (2):

  1. EXSK v2 is one-way. v0.7.x readers can’t decode v2 frames, so upgrade every reader to v0.8.0 before any producer starts writing v2.
  2. hash_strategy: :murmur3 is no longer silently overridden to :xxhash3. Sketches that specify Murmur3 now actually use it, so their estimates are correct but differ from those v0.7.x produced for the same input.

One-liner upgrade:

{:ex_data_sketch, "~> 0.8.0"}
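
For context, that tuple belongs in the deps list of mix.exs. The surrounding function below is the standard Mix project layout, not anything specific to this library, and the rest of the project file is assumed to exist already.

```elixir
# mix.exs (fragment)
defp deps do
  [
    # "~> 0.8.0" accepts >= 0.8.0 and < 0.9.0, i.e. any 0.8.x release
    {:ex_data_sketch, "~> 0.8.0"}
  ]
end
```

After editing the file, running mix deps.update ex_data_sketch fetches the new release.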

Most users need no code changes. The full migration guide is on HexDocs.

Hex | Docs
