HLL - Redis compatible HyperLogLog implementation in Elixir

HyperLogLog is one of my favorite probabilistic algorithms. It’s a super space-efficient way for cardinality estimation. I recently used it at work via Presto and Spark and it works surprisingly well. Then I decided to make it from scratch to have better understanding how it actually works and chose lovely Elixir to implement it.

The rationale to make it Redis compatible is mostly because it’s an easy way to ensure that I have the correct implementation by checking that given same inputs, Redis and this module would yield the same output.

The Redis-compatible module used the same hash function (port it to Elixir), same estimation algorithm and same binary format. Therefore, you can use it to interact with Redis (v5) as well.

The repo is available at https://github.com/gyson/hll

P.S. During development, I found that binary pattern matching in Elixir/Erlang is pretty sweet for binary encoding and decoding !

7 Likes

Any particular reason for using Erlang records instead of Elixir structures?

1 Like

It’s minor, using tuple/records would have slightly better performance (about 3~4% in my simple benchmarks) for this use case.

3 Likes

Records are used in the new MyXQL adapter for much the same reason (although the focus is on memory usage): http://blog.plataformatec.com.br/2018/12/building-a-new-mysql-adapter-for-ecto-part-ii-encoding-decoding/

3 Likes