wmnnd
Stable Partitioning Algorithm With No Side Effects
Hey folks,
I need to implement some sort of partitioning of data records. A certain part of records should be processed in one way, another part in another way. So I thought it would be nice to implement a stable partitioning algorithm that splits datasets into a given number of partitions with given proportions.
So, given three partitions and their relative thresholds
[a: 0, b: 0.33, c: 0.66]
I want to be able to take a term, pipe it into a partitioning function and receive the name of the partition:
partition("foo", partitions: partitions) # => :a
partition("bar", partitions: partitions) # => :c
partition("foobar", partitions: partitions) # => :b
My idea is to use a simple hash function to generate a uniformly distributed stable pseudo-random number for any term and then pick the suitable partition.
Here’s what I’ve come up. I’m really curious to see what you think of the approach or if you have ideas for better/faster/more elegant solutions.
defmodule Partitioner do
@spec partition(term :: []) :: integer()
def partition(term, opts \\ []) do
term
|> normalize(opts)
|> do_partition(opts)
end
defp normalize(term, opts) do
salt = Keyword.get(opts, :salt, nil)
{term, salt}
|> :erlang.term_to_binary()
|> do_hash()
|> :binary.first()
|> do_normalize()
end
defp do_hash(binary), do: :crypto.hash(:sha, binary)
defp do_normalize(int), do: int / 255
defp do_partition(float, opts) do
opts
|> Keyword.get(:partitions, a: 0, b: 0.5)
|> Enum.max_by(&match_partition(float, &1))
|> elem(0)
end
defp match_partition(float, {_, threshold}) when float > threshold, do: threshold
defp match_partition(_, _), do: 0
end
Marked As Solved
benwilson512
Sounds like a job for erlang — OTP 29.0.2 (erts 17.0.2)
iex(10)> partitions = [:a, :b, :c]
[:a, :b, :c]
iex(11)> ["foo", "bar", "foobar"] |> Enum.map(&:erlang.phash2(&1, 2)) |> Enum.map(&Enum.at(partitions, &1))
[:a, :a, :b]
Also Liked
wmnnd
That’s awesome, I was half-expecting to find a function like this in the Erlang standard library ![]()









