Talán - probabilistic data structures
Talán is a Hungarian adverb meaning: maybe, perhaps, probably.
This library currently has 3 probablistic data structures:
- A bloom filter (inspired by the blex library but with a few more features.
- A counting bloom filter
- A probabilistic linear counter for cardinality estimation
All the above are built on :atomics
for concurrent accessibility and speed.
GitHub: https://github.com/preciz/talan
Hex: https://hex.pm/packages/talan
Talan.Stream.uniq/2
The library also contains a module called Talan.Stream
which has a uniq/2
function.
With this you can build probabilistically uniq streams.
list = ["a", "b", "c", "a", "b", "c"]
bloom_filter = Talan.BloomFilter.new(100_000, false_positive_probability: 0.001)
Talan.Stream.uniq(list, bloom_filter) |> Enum.to_list
["a", "b", "c"]
The stream never returns duplicate elements but it sometimes detects false positive duplicates depending on the bloom filter it uses. False positives are faulty duplicate detections that get rejected.
Talan.BloomFilter
- Fixed size Bloom filter
- Concurrent reads & writes
- Custom & default hash functions
- Merge multiple Bloom filters into one
- Intersect multiple Bloom filters into one
- Estimate number of unique elements
- Estimate current false positive probability
Examples
iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.put("Barna")
iex> b |> Talan.BloomFilter.member?("Barna")
true
iex> b |> Talan.BloomFilter.member?("Kovacs")
false
Talan.CountingBloomFilter
This is based on the BloomFilter module and adds a custom sized counter for every bit.
Counting bloom filters support probabilistic deletion
of elements but have higher memory consumption because
they need to store a counter of N bits for every bloom filter bit.
Examples
cbf = Talan.CountingBloomFilter.new(10_000)
cbf |> Talan.CountingBloomFilter.put("hat")
cbf |> Talan.CountingBloomFilter.put("hat")
cbf |> Talan.CountingBloomFilter.put("phone")
cbf |> Talan.CountingBloomFilter.count("hat")
2
cbf |> Talan.CountingBloomFilter.count("phone")
1
Talan.Counter
Linear probabilistic counter implementation for cardinality estimation.
Examples
c = Talan.Counter.new(10_000)
c |> Talan.Counter.put(["you", :can, Hash, {"any", "elixir", "term"}])
c |> Talan.Counter.put(["you", :can, Hash, {"any", "elixir", "term"}])
c |> Talan.Counter.cardinality()
1
c |> Talan.Counter.put("more")
c |> Talan.Counter.cardinality()
2
Feedback is welcome.
(I would like to also mention that I’m looking for employment, if you are an employer or your company is hiring I would be happy to talk with you. PM me. Thank you!)