# Weight-based random sampling

I’m new to Elixir, and relatively new to functional programming. I’ve just written this little algorithm to take a weighted sample from a list of tuples:

```elixir
# items = [apples: 10, oranges: 30, bananas: 60]
defp weight_based_choice(items) do
  max = sum_weights(items)

  state = %{
    current: 0,
    choice: nil,
    random_value: Enum.random(1..max)
  }

  Enum.reduce(items, state, fn item, state ->
    weight = elem(item, 1)
    state = %{state | current: state[:current] + weight}

    choice =
      if state[:random_value] <= state[:current] do
        elem(item, 0)
      else
        nil
      end

    if state[:choice] == nil && choice != nil do
      %{state | choice: choice}
    else
      state
    end
  end)[:choice]
end

defp sum_weights(items) do
  Enum.reduce(items, 0, fn item, acc ->
    weight = elem(item, 1)
    acc + weight
  end)
end
```

Are there more idiomatic ways to do this?


Welcome to the forum!

Not necessarily idiomatic, just a different way of doing things:

```elixir
defmodule Demo do
  def weight_based_choice(items) do
    value =
      items
      |> sum_weights(0)
      |> :rand.uniform()

    select_item(items, value, 0)
  end

  defp select_item([{name, _}], _, _) do
    name
  end

  defp select_item([{name, weight} | tail], value, acc) do
    acc = acc + weight

    # >= rather than >, so that e.g. value == 10 still selects :apples
    if acc >= value do
      name
    else
      select_item(tail, value, acc)
    end
  end

  defp sum_weights([], sum),
    do: sum

  defp sum_weights([{_, weight} | tail], sum),
    do: sum_weights(tail, sum + weight)
end

items = [apples: 10, oranges: 30, bananas: 60]

IO.inspect(Demo.weight_based_choice(items))
```

Can’t we write `select_item/3` roughly like this, making it effectively a `select_item/2`?

```elixir
defp select_item([{name, weight} | _], idx) when idx <= weight, do: name
defp select_item([{_, weight} | tail], idx), do: select_item(tail, idx - weight)
```

While also making use of `Enum.reduce/3` in `sum_weights/2` we can make it a `sum_weights/1`:

```elixir
defp sum_weights(items) do
  Enum.reduce(items, 0, fn {_, weight}, sum -> weight + sum end)
end
```

Then we leave `weight_based_choice` nearly untouched (only adjusting calls to the functions we changed):

```elixir
defmodule Demo do
  def weight_based_choice(items) do
    value =
      items
      |> sum_weights()
      |> :rand.uniform()

    select_item(items, value)
  end

  defp select_item([{name, weight} | _], idx) when idx <= weight, do: name
  defp select_item([{_, weight} | tail], idx), do: select_item(tail, idx - weight)

  defp sum_weights(items) do
    Enum.reduce(items, 0, fn {_, weight}, sum -> weight + sum end)
  end
end

items = [apples: 10, oranges: 30, bananas: 60]

1..10 |> Enum.map(fn _ -> Demo.weight_based_choice(items) end) |> IO.inspect()
#=> [:bananas, :bananas, :bananas, :bananas, :bananas, :bananas, :bananas, :bananas, :oranges, :bananas]
```

This is probably one of the most idiomatic approaches. We could even implement `select_item/2` in terms of `Enum.reduce_while/3`, but that usually doesn’t add much in terms of readability, and not many people are used to that function…

```elixir
defp select_item(items, idx) do
  Enum.reduce_while(items, idx, fn
    {_, weight}, acc when weight < acc -> {:cont, acc - weight}
    {name, _weight}, _acc -> {:halt, name}
  end)
end
```

PS: Perhaps we need to adjust `<`/`<=`/`>`/`>=` in the guards… I’m writing this mostly freehand…


So

```elixir
defmodule Demo do
  def weight_based_choice(items) do
    value =
      items
      |> Enum.reduce(0, &acc_weight/2)
      |> :rand.uniform()

    select_item(items, value)
  end

  # base cases
  defp select_item([{name, _}], _),
    do: name

  defp select_item([{name, weight} | _], index) when weight >= index,
    do: name

  # recursive case
  defp select_item([{_, weight} | tail], index),
    do: select_item(tail, index - weight)

  defp acc_weight({_, weight}, total),
    do: total + weight
end

items = [apples: 10, oranges: 30, bananas: 60]

items
|> Demo.weight_based_choice()
|> inspect()
|> IO.puts()
```

or

```elixir
defmodule Demo do
  def weight_based_choice(items) do
    marker =
      items
      |> Enum.reduce(0, &acc_weight/2)
      |> :rand.uniform()

    Enum.reduce_while(items, marker, &item_below_marker/2)
  end

  defp item_below_marker({_name, weight}, acc) when weight < acc,
    do: {:cont, acc - weight}

  defp item_below_marker({name, _weight}, _acc),
    do: {:halt, name}

  defp acc_weight({_, weight}, total),
    do: total + weight
end

items = [apples: 10, oranges: 30, bananas: 60]

items
|> Demo.weight_based_choice()
|> inspect()
|> IO.puts()
```

> While also making use of `Enum.reduce/3`

People tend to have familiarity with `reduce` from other languages - when learning, I feel that resorting to `Enum` immediately is like heading straight for the power tools even for the smallest job. To a certain degree, choosing reduce when a simple recursive function will do feels a bit like “premature pessimization”, especially as recursion is how iteration is accomplished in Elixir/Erlang - but I know I’m in the minority there.

That’s a fun little problem. Here’s my take, from playing around in `iex`:

```elixir
iex(109)> items = [apples: 10, oranges: 30, bananas: 60]
[apples: 10, oranges: 30, bananas: 60]
iex(110)> accumulated_weights = Enum.scan(items, fn {k, w}, {_, w_sum} -> {k, w + w_sum} end)
[apples: 10, oranges: 40, bananas: 100]
iex(111)> {_, max} = List.last(accumulated_weights)
{:bananas, 100}
iex(112)> random_value = Enum.random(1..max)
17
iex(113)> sample = Enum.reduce_while(accumulated_weights, random_value, fn {k, w}, r -> if r <= w, do: {:halt, k}, else: {:cont, r} end)
:oranges
```

I believe this is the first time I’ve used `Enum.scan`. It lets us get that `accumulated_weights` format, which is handy for reuse if we will be getting more random samples from the same list of inputs.

We get our `max` by pattern matching on the last item of `accumulated_weights`.

`Enum.reduce_while` then halts iteration once it has found the proper match. It might make some static-typing people twitch, because while continuing the `random_value` is kept as the accumulator, but when halting, the key becomes the accumulator that is returned from the overall iteration.

If I were to really use this in production, I’d likely wrap it in its own struct module like this:

```elixir
defmodule WeightedList do
  defstruct [:acc_list, :max]

  def new(kw_list) do
    # Might want to verify it is sorted lowest to highest weight
    acc_list = Enum.scan(kw_list, fn {k, w}, {_, w_sum} -> {k, w + w_sum} end)
    {_, max} = List.last(acc_list)
    %__MODULE__{acc_list: acc_list, max: max}
  end

  def sample(%__MODULE__{acc_list: acc_list, max: max}) do
    rand_value = Enum.random(1..max)

    Enum.find_value(acc_list, fn
      {k, w} when rand_value <= w -> k
      _ -> false
    end)
  end
end
```

Which could then be used in calling code like

```elixir
wl = WeightedList.new(apples: 10, oranges: 30, bananas: 60)
# now let's get 10 random samples
samples = for _ <- 1..10, do: WeightedList.sample(wl)
```

This represents a pattern I’ve found myself favoring lately: create an optimized data structure for the sorts of functions you will want to use on it. The calling code then has a two-step process: 1) create the struct, 2) call the function. This may seem unnecessary, but it lets you unit test step 1 easily, since there are no side effects (or randomness).
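As a sketch of that testability point (my own illustration, not from the post above): the accumulation step is a pure transformation, so it can be checked with plain match assertions, with no seeding of randomness required:

```elixir
# The deterministic half of the pattern: building the accumulated-weights
# list is a pure function of the input, so its result can be asserted
# directly in a plain script.
items = [apples: 10, oranges: 30, bananas: 60]

acc_list = Enum.scan(items, fn {k, w}, {_, w_sum} -> {k, w + w_sum} end)
{_, max} = List.last(acc_list)

# These matches raise MatchError if the computation is ever wrong:
[apples: 10, oranges: 40, bananas: 100] = acc_list
100 = max
IO.puts("accumulation checks pass")
```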

Plus, having a struct allows you to participate in Protocols. I could imagine `WeightedList` implementing `Inspect` and perhaps even `Collectable` if it’s an important enough part of your system.

Edited to use `Enum.find_value` instead of `Enum.reduce_while` which I find clearer for this situation


I see it quite the opposite: `reduce`/`fold`, or however you might call it, is a well-understood pattern in FP; using it should come from muscle memory.

Directly writing out the recursion, though, is in my opinion not as readable as a clearly named `reduce`. To do the reverse mapping, I have to understand the full recursive function to see whether it reduces the list to a single value, maps a function over it, or even flattens things out along the way…

With `Enum.x` I clearly see the intent.

Also, as remote calls are more expensive than local ones on the BEAM, writing out the recursion is actually the optimisation…

Of course, one should be able to implement some functions in `Enum` on their own (for lists at least). `map`, `reduce`, `reverse`, `concat`, those are my personal favorites…


FYI

> Of course, one should be able to implement some functions in `Enum` on their own (for lists at least). `map`, `reduce`, `reverse`, `concat`, those are my personal favorites…

When learning, it can be helpful to a certain degree to make `Enum` off-limits until recursion (body and tail) is well understood - otherwise `Enum` can become a crutch.

Yes, I second this. During learning you are totally correct.

But the question was about idiomatic code, not what I would write when learning about recursion. And using `Enum` and `Stream` is by far the most idiomatic way of dealing with lists and other “iterables”.


I side with you. I see @peerreynders’s point but I’ve done my exercises and re-implemented most of `Enum`.

For such problems, start with idiomatic Elixir code and only go for hand-rolled algorithms if performance is of utmost importance.

Thank you so much everyone! This is way more than I expected. It’s a lot to digest, and I’ve learned so much. I like @gregvaughn’s suggestion the most, not because of the struct idea, but because it seems to me to realise the thought process of my original approach, just without all the noobish bloat. The struct idea is fascinating as well, of course.

A quick question about differentiating functions by arity, like this:

This of course seems to be quite idiomatic, but I’m struggling to reach for it myself. I just feel that if a function does something different it should have a different name. Perhaps naively, I think it’s better to make the condition explicit in, say, a case statement, and then call out to differently named functions, so that the intent of the code becomes more intuitive. Thoughts?


I’d say both select an item. So they’re doing the same job. Just some inputs might recurse a few more times than others to get to an answer.

As @LostKobrakai already pointed out, both do the same thing. They select the item from the list of items.

The multi-headed function you see there has nothing to do with “differentiating functions by arity”; in fact, nothing is differentiated there: both clauses are the same function, as both have the same name and the same arity.

What I do use there, though, is “pattern matching”, something very important to learn and understand when writing Elixir.

To be honest, those two function clauses are semantically equivalent to the following single-clause function:

```elixir
defp select_item(items, idx) do
  case items do
    [{name, weight} | _] when idx <= weight -> name
    [{_, weight} | tail] -> select_item(tail, idx - weight)
  end
end
```

But it is idiomatic to move such a `case`, which matches directly on an argument and spans the full function body, into a multi-clause function.

Also, in general, pattern matching in the function head or in a `case` is usually preferred over using `if` with predicate and/or getter functions.
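To illustrate that preference (my own sketch, with made-up anonymous functions): the same check written with getters and `if`, versus with a pattern and a guard:

```elixir
# if/getter style: the shape of the data stays implicit, extracted with elem/2.
pick_if = fn item ->
  if elem(item, 1) > 0 do
    elem(item, 0)
  else
    nil
  end
end

# Pattern-matching style: the expected shape {name, weight} and the
# condition both live visibly in the clause heads.
pick_match = fn
  {name, weight} when weight > 0 -> name
  {_name, _weight} -> nil
end

IO.inspect(pick_if.({:apples, 10}))    #=> :apples
IO.inspect(pick_match.({:apples, 10})) #=> :apples
IO.inspect(pick_match.({:kiwis, 0}))   #=> nil
```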


Not only semantically. AFAIK the compiler takes functions with multiple heads and compiles them into a single function with a `case` expression inside.

I’m not sure whether Elixir itself does this already, but yes, at least for the Core Erlang representation it is necessary, as Core Erlang only has single-clause functions, IIRC.

Pattern matching‡ is the conditional construct - the `case` expression is just one way of utilizing it.

‡ Not to be confused with destructuring:
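A minimal illustration of that distinction (my own sketch): destructuring only pulls values out, while a pattern also decides whether the match succeeds at all:

```elixir
# Pure destructuring: binds a and b, cannot fail for a two-element tuple.
{a, b} = {1, 2}

# Pattern matching as a conditional: the literal :ok acts as an assertion.
# {:ok, value} = {:error, :oops} would raise MatchError instead of binding.
{:ok, value} = {:ok, 42}

IO.inspect({a, b, value}) #=> {1, 2, 42}
```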


That really helped clarify it for me, thank you