Weight-based random sampling

I’m new to Elixir, and relatively new to functional programming. I’ve just written this little algorithm to take a weighted sample from a list of tuples:

# items = [apples: 10, oranges: 30, bananas: 60]
defp weight_based_choice(items) do
  max = sum_weights(items)

  state = %{
    current: 0,
    choice: nil,
    random_value: Enum.random(1..max)
  }

  Enum.reduce(items, state, fn item, state ->
    weight = elem(item, 1)
    state = %{state | current: state[:current] + weight}

    choice =
      if state[:random_value] <= state[:current] do
        elem(item, 0)
      else
        nil
      end

    if state[:choice] == nil && choice != nil do
      %{state | choice: choice}
    else
      state
    end
  end)[:choice]
end

defp sum_weights(items) do
  Enum.reduce(items, 0, fn item, acc ->
    weight = elem(item, 1)
    acc + weight
  end)
end

Are there more idiomatic ways to do this?


Welcome to the forum!

Not necessarily idiomatic, just a different way of doing things:

defmodule Demo do
  def weight_based_choice(items) do
    value =
      items
      |> sum_weights(0)
      |> :rand.uniform()

    select_item(items, value, 0)
  end

  defp select_item([{name, _}], _, _) do
    name
  end

  defp select_item([{name, weight} | tail], value, acc) do
    acc = acc + weight

    if acc >= value do
      name
    else
      select_item(tail, value, acc)
    end
  end

  defp sum_weights([], sum),
    do: sum

  defp sum_weights([{_, weight} | tail], sum),
    do: sum_weights(tail, sum + weight)
end

items = [apples: 10, oranges: 30, bananas: 60]

IO.inspect(Demo.weight_based_choice(items))

Can’t we write select_item/3 roughly like this, making it effectively a select_item/2?

defp select_item([{name, weight} | _], idx) when idx <= weight, do: name
defp select_item([{_, weight} | tail], idx), do: select_item(tail, idx - weight)

While also making use of Enum.reduce/3 in sum_weights/2 we can make it a sum_weights/1:

defp sum_weights(items) do
  Enum.reduce(items, 0, fn {_, weight}, sum -> weight + sum end)
end

Then we leave weight_based_choice nearly untouched (only adjusting calls to the functions we changed):

defmodule Demo do
  def weight_based_choice(items) do
    value =
      items
      |> sum_weights()
      |> :rand.uniform()

    select_item(items, value)
  end

  defp select_item([{name, weight} | _], idx) when idx <= weight, do: name
  defp select_item([{_, weight} | tail], idx), do: select_item(tail, idx - weight)

  defp sum_weights(items) do
    Enum.reduce(items, 0, fn {_, weight}, sum -> weight + sum end)
  end
end

1..10 |> Enum.map(fn _ -> Demo.weight_based_choice(items) end) |> IO.inspect()
#=> [:bananas, :bananas, :bananas, :bananas, :bananas, :bananas, :bananas, :bananas, :oranges, :bananas]

This is probably one of the most idiomatic approaches. We could even implement select_item/2 in terms of Enum.reduce_while/3, but that usually doesn’t add much in terms of readability, and not many people are familiar with that function…

defp select_item(items, idx) do
  Enum.reduce_while(items, idx, fn
    {_, weight}, acc when weight < acc -> {:cont, acc - weight}
    {name, _weight}, _acc -> {:halt, name}
  end)
end

PS: Perhaps we need to adjust </<=/>/>= in the guards… I’m writing this mostly freehand…


So:

defmodule Demo do
  def weight_based_choice(items) do
    value =
      items
      |> Enum.reduce(0, &acc_weight/2)
      |> :rand.uniform()

    select_item(items, value)
  end

  # base cases
  defp select_item([{name, _}], _),
    do: name

  defp select_item([{name, weight} | _], index) when weight >= index,
    do: name

  # recursive case
  defp select_item([{_, weight} | tail], index),
    do: select_item(tail, index - weight)

  defp acc_weight({_, weight}, total),
    do: total + weight
end

items = [apples: 10, oranges: 30, bananas: 60]

items
|> Demo.weight_based_choice()
|> inspect()
|> IO.puts()

or

defmodule Demo do
  def weight_based_choice(items) do
    marker =
      items
      |> Enum.reduce(0, &acc_weight/2)
      |> :rand.uniform()

    Enum.reduce_while(items, marker, &item_below_marker/2)
  end

  defp item_below_marker({_name, weight}, acc) when weight < acc,
    do: {:cont, acc - weight}

  defp item_below_marker({name, _weight}, _acc),
    do: {:halt, name}

  defp acc_weight({_, weight}, total),
    do: total + weight
end

items = [apples: 10, oranges: 30, bananas: 60]

items
|> Demo.weight_based_choice()
|> inspect()
|> IO.puts()

While also making use of Enum.reduce/3

People tend to have familiarity with reduce from other languages - when learning, I feel that resorting to Enum immediately is like heading straight for the power tools even for the smallest job. To a certain degree, choosing reduce when a simple recursive function will do feels a bit like “premature pessimization”, especially as recursion is how iteration is accomplished in Elixir/Erlang - but I know I’m in the minority there.

That’s a fun little problem. Here’s my take, from playing around in iex:

iex(109)> items = [apples: 10, oranges: 30, bananas: 60]
[apples: 10, oranges: 30, bananas: 60]
iex(110)> accumulated_weights = Enum.scan(items, fn {k, w}, {_, w_sum} -> {k, w + w_sum} end)
[apples: 10, oranges: 40, bananas: 100]
iex(111)> {_, max} = List.last(accumulated_weights)
{:bananas, 100}
iex(112)> random_value = Enum.random(1..max)
17
iex(113)> sample = Enum.reduce_while(accumulated_weights, random_value, fn {k, w}, r -> if r <= w, do: {:halt, k}, else: {:cont, r} end)
:oranges

I believe this is the first time I’ve used Enum.scan. It lets us build that accumulated_weights format, which is handy for reuse if we’ll be taking more random samples from the same list of inputs.

We get our max by pattern matching on the last item of accumulated_weights.

Enum.reduce_while then halts iteration when it has found the proper match. It might make some static-type people twitch because, when continuing, the random_value is kept as the accumulator, but when halting, the keyword becomes the accumulator that is returned from the overall iteration.

If I were to really use this in production, I’d likely wrap it in its own struct module like this:

defmodule WeightedList do
  defstruct [:acc_list, :max]

  def new(kw_list) do
    # Might want to verify it is sorted lowest to highest weight
    acc_list = Enum.scan(kw_list, fn {k, w}, {_, w_sum} -> {k, w + w_sum} end)
    {_, max} = List.last(acc_list)
    %__MODULE__{acc_list: acc_list, max: max}
  end

  def sample(%__MODULE__{acc_list: acc_list, max: max}) do
    rand_value = Enum.random(1..max)
    Enum.find_value(acc_list, fn
      {k, w} when rand_value <= w -> k
      _ -> false
    end)
  end
end

Which could then be used in calling code like

wl = WeightedList.new(apples: 10, oranges: 30, bananas: 60)
# now let's get 10 random samples
samples = for _ <- 1..10, do: WeightedList.sample(wl)

This represents a pattern I’ve found myself favoring lately: create an optimized data structure for the sorts of functions you will want to use on it. The calling code then has a two-step process: 1) create the struct, 2) call the function. This may seem unnecessary, but you can unit test step 1 easily since it has no side effects (or randomness).
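
For instance, step 1 could be covered by a deterministic test along these lines (assuming the WeightedList module above; the test module and description are made up for illustration):

```elixir
defmodule WeightedListTest do
  use ExUnit.Case, async: true

  test "new/1 accumulates weights and records the max" do
    wl = WeightedList.new(apples: 10, oranges: 30, bananas: 60)

    # The accumulated list carries running totals, not the raw weights.
    assert wl.acc_list == [apples: 10, oranges: 40, bananas: 100]
    assert wl.max == 100
  end
end
```

Only sample/1 touches randomness, and that stays out of this test entirely.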

Plus, having a struct allows you to participate in Protocols. I could imagine WeightedList implementing Inspect and perhaps even Collectable if it’s an important enough part of your system.

Edited to use Enum.find_value instead of Enum.reduce_while, which I find clearer for this situation.


I see it quite the opposite way. reduce/fold, or whatever you might call it, is a well-understood pattern in FP; using it should happen from muscle memory.

Directly writing out the recursion, though, is in my opinion not as readable as a clearly named reduce. To do the reverse mapping, I have to understand the full recursive function to see whether it reduces a list to a single value, maps a function over the list, or even flattens things out along the way…

With Enum.x I clearly see the intent.

Also, as remote calls are more expensive than local ones on the BEAM, writing out the recursion is actually the optimisation…

Of course, one should be able to implement some functions in Enum on their own (for lists at least). map, reduce, reverse, concat, those are my personal favorites…
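
As a freehand sketch of what those list versions might look like (the MyEnum module name is made up; this is not how Enum actually implements them):

```elixir
defmodule MyEnum do
  # Body-recursive map: builds the result as the calls unwind.
  def map([], _fun), do: []
  def map([head | tail], fun), do: [fun.(head) | map(tail, fun)]

  # Tail-recursive reduce: the accumulator carries the running result.
  def reduce([], acc, _fun), do: acc
  def reduce([head | tail], acc, fun), do: reduce(tail, fun.(head, acc), fun)

  # Reversing is just reduce with cons as the combining function.
  def reverse(list), do: reduce(list, [], fn head, acc -> [head | acc] end)

  # Concat walks the first list and prepends onto the second.
  def concat([], right), do: right
  def concat([head | tail], right), do: [head | concat(tail, right)]
end

MyEnum.map([1, 2, 3], &(&1 * 2))   #=> [2, 4, 6]
MyEnum.reduce([1, 2, 3], 0, &+/2)  #=> 6
MyEnum.reverse([1, 2, 3])          #=> [3, 2, 1]
MyEnum.concat([1, 2], [3, 4])      #=> [1, 2, 3, 4]
```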


FYI :smile:

Of course, one should be able to implement some functions in Enum on their own (for lists at least). map, reduce, reverse, concat, those are my personal favorites…

When learning, it can to a certain degree be helpful to make Enum off-limits until recursion (body and tail) is well understood - otherwise Enum can become a crutch.

Yes, I agree with this. While learning, you are totally correct.

But the question was about idiomatic code, not what I would write while learning about recursion. And using Enum and Stream is by far the most idiomatic way of dealing with lists and other “iterables”.


I side with you. I see @peerreynders’s point, but I’ve done my exercises and re-implemented most of Enum.

For such problems, start with idiomatic Elixir code and only go for hand-rolled algorithms if performance is of utmost importance.

Thank you so much everyone! This is way more than I expected. It’s a lot to digest, and I’ve learned so much. I like @gregvaughn’s suggestion the most, not because of the struct idea, but because it seems to realise the thought process of my original approach, just without all the noobish bloat. The struct idea is fascinating as well, of course.

A quick question about differentiating functions by arity, like this:

This of course seems to be quite idiomatic, but I’m struggling to reach for it myself. I just feel that if a function does something different, it should have a different name. Perhaps naively, I think it’s better to make the condition explicit in, say, a case statement, and then call out to differently named functions so that the intent of the code becomes more intuitive. Thoughts?
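
To make that concrete, I imagine the style I’m describing would look roughly like this (the helper names are made up for illustration):

```elixir
defp select_item(items, idx) do
  # Make the condition explicit, then dispatch to named helpers.
  case idx <= current_weight(items) do
    true -> current_name(items)
    false -> skip_current(items, idx)
  end
end

defp current_weight([{_, weight} | _]), do: weight
defp current_name([{name, _} | _]), do: name

defp skip_current([{_, weight} | tail], idx),
  do: select_item(tail, idx - weight)
```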


I’d say both select an item. So they’re doing the same job. Just some inputs might recurse a few more times than others to get to an answer.

As @LostKobrakai already pointed out, both do the same thing. They select the item from the list of items.

The multi-clause function you see there has nothing to do with “differentiating functions by arity”; in fact, nothing is differentiated there. Both clauses are the same function, as they have the same name and the same arity.

What I do use there, though, is “pattern matching”, something very important to learn and understand when writing Elixir.

To be honest, those 2 function clauses are semantically equivalent to the following single-clause function:

defp select_item(items, idx) do
  case items do
    [{name, weight} | _] when idx <= weight -> name
    [{_, weight} | tail] -> select_item(tail, idx - weight)
  end
end

But it is idiomatic to move such a case, which matches directly on an argument and spans the full function body, into a multi-clause function.

Also, in general, pattern matching in the function head or in a case is usually preferred over using if with predicate functions and/or getter functions.
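
As a rough side-by-side (an illustrative sketch, not something you would define twice in one module), here is select_item in both styles:

```elixir
# With if and getter functions: the shape of the data stays implicit.
defp select_item(items, idx) do
  item = hd(items)
  weight = elem(item, 1)

  if idx <= weight do
    elem(item, 0)
  else
    select_item(tl(items), idx - weight)
  end
end

# With pattern matching: the shape is spelled out in the clause heads.
defp select_item([{name, weight} | _], idx) when idx <= weight, do: name
defp select_item([{_, weight} | tail], idx), do: select_item(tail, idx - weight)
```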


Not only semantically. AFAIK the compiler takes functions with many heads and turns them into a single function with a case statement inside.

I’m not sure if Elixir does this already, but yes, at least for the Core Erlang representation it is necessary to do so, as Core Erlang only has single-clause functions, IIRC.


Pattern matching‡ is the conditional construct; the case expression is just one way of utilizing it.

‡ Not to be confused with destructuring.


That really helped clarify it for me, thank you :slight_smile: