Talan library - probabilistic data structures with concurrent accessibility based on atomics

Talán - probabilistic data structures

Talán is a Hungarian adverb meaning: maybe, perhaps, probably.

This library currently has 3 probablistic data structures:

  • A bloom filter (inspired by the blex library but with a few more features.
  • A counting bloom filter
  • A probabilistic linear counter for cardinality estimation

All the above are built on :atomics for concurrent accessibility and speed.

GitHub: https://github.com/preciz/talan
Hex: https://hex.pm/packages/talan

Talan.Stream.uniq/2

The library also contains a module called Talan.Stream which has a uniq/2 function.
With this you can build probabilistically uniq streams.

list = ["a", "b", "c", "a", "b", "c"]

bloom_filter = Talan.BloomFilter.new(100_000, false_positive_probability: 0.001)

Talan.Stream.uniq(list, bloom_filter) |> Enum.to_list
["a", "b", "c"]

The stream never returns duplicate elements but it sometimes detects false positive duplicates depending on the bloom filter it uses. False positives are faulty duplicate detections that get rejected.

Talan.BloomFilter

  • Fixed size Bloom filter
  • Concurrent reads & writes
  • Custom & default hash functions
  • Merge multiple Bloom filters into one
  • Intersect multiple Bloom filters into one
  • Estimate number of unique elements
  • Estimate current false positive probability

Examples

iex> b = Talan.BloomFilter.new(1000)
iex> b |> Talan.BloomFilter.put("Barna")
iex> b |> Talan.BloomFilter.member?("Barna")
true
iex> b |> Talan.BloomFilter.member?("Kovacs")
false

Talan.CountingBloomFilter

This is based on the BloomFilter module and adds a custom sized counter for every bit.

Counting bloom filters support probabilistic deletion
of elements but have higher memory consumption because
they need to store a counter of N bits for every bloom filter bit.

Examples

cbf = Talan.CountingBloomFilter.new(10_000)
cbf |> Talan.CountingBloomFilter.put("hat")
cbf |> Talan.CountingBloomFilter.put("hat")
cbf |> Talan.CountingBloomFilter.put("phone")
cbf |> Talan.CountingBloomFilter.count("hat")
2
cbf |> Talan.CountingBloomFilter.count("phone")
1

Talan.Counter

Linear probabilistic counter implementation for cardinality estimation.

Examples

c = Talan.Counter.new(10_000)
c |> Talan.Counter.put(["you", :can, Hash, {"any", "elixir", "term"}])
c |> Talan.Counter.put(["you", :can, Hash, {"any", "elixir", "term"}])
c |> Talan.Counter.cardinality()
1
c |> Talan.Counter.put("more")
c |> Talan.Counter.cardinality()
2

Feedback is welcome. :wave:

(I would like to also mention that I’m looking for employment, if you are an employer or your company is hiring I would be happy to talk with you. PM me. Thank you!)

11 Likes