Eager evaluation and optimizing computation

I’m learning Haskell at the moment and contrasting its semantics with Elixir as I go.

In Haskell, if I create the following naive & inefficient prime number checker:

factors n = [x | x <- [1..n], n `mod` x == 0]
prime n = factors n == [1, n]

and run it against a large number like 10^9, it returns False almost immediately: thanks to lazy evaluation, the comparison stops as soon as a mismatch occurs.

> prime 1000000000
False

If I do the same in Elixir, the eager evaluation will try to compute all the factors of 10^9 first, which is very slow with this implementation.

factor = &(for x <- 1..&1, rem(&1, x) == 0, do: x)
prime = &(factor.(&1) == [1, &1])
prime.(1000000000)
# hangs

The two evaluation strategies make sense to me, but I somehow still find it surprising that the BEAM runtime can’t optimize away the match against the 2-element list and instead carries the computation past the first two elements of factor.(1000000000), which would already be sufficient.

I guess despite the eager evaluation, I was still expecting some sort of treacherous optimization from the BEAM :slight_smile:

Can someone enlighten me as to why this kind of constrained optimization can’t or doesn’t happen in languages with eager evaluation? Thanks!

Here is the code one more time, but now as more idiomatic Elixir with named functions in a module (rather than anonymous functions in IEx).

defmodule StrictExample do
  def factor(val) do
    for x <- 1..val, rem(val, x) == 0 do
      x
    end
  end

  def prime(val) do
    factor(val) == [1, val]
  end

  def is_large_number_prime() do
    prime(1000000000)
  end
end

It indeed has to do with the order of evaluation.
Haskell, being powered by an STG (the Spineless Tagless G-machine, a graph-reduction machine), moves execution from the ‘goal’ backwards to where the values it needs originate.

In this example, it will indeed try to match [1, n] against the outcome of factors n. It is exactly because of laziness that Haskell is able to stop once two elements have been read from the output list of factors.

However, on the BEAM we do not have this kind of context-switching between producing and inspecting parts of the output. Instead, for an operation like == we first fully evaluate both operands, and only then check whether they are equal.
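
Spelled out step by step (just a sketch of the evaluation order, not actual compiler output), StrictExample.prime/1 behaves as if it were written like this:

def prime(val) do
  left = factor(val)   # the entire list of factors is built first
  right = [1, val]     # then the small two-element list
  left == right        # only now are the two lists compared, element by element
end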

The optimization you expect here is, in eager languages, both of rather limited usefulness for practical applications and rather high-level: to a human it seems obvious, but for a compiler to discover it would require many steps of intermediate reasoning.
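
To make concrete how much rewriting that would involve, here is roughly what a hand-written eager version with an early exit could look like (the module name ManualEarlyExit and the helper second_factor/2 are made up for this sketch):

defmodule ManualEarlyExit do
  # Instead of building the full factor list, only look for the
  # second-smallest factor and compare it to `val` itself.
  def prime(val) when val <= 1, do: false

  def prime(val) do
    second_factor(val, 2) == val
  end

  # Smallest factor of `val` that is greater than 1.
  defp second_factor(val, x) when x > val, do: nil
  defp second_factor(val, x) do
    if rem(val, x) == 0, do: x, else: second_factor(val, x + 1)
  end
end

This is essentially the short-circuit the lazy version below gives you for free; the point is that a compiler would have to invent this whole restructuring on its own.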

To make an explicitly lazy implementation in Elixir, you can do this:

defmodule LazyExample do
  def factor(val) do
    # A lazy enumeration of all factors of `val`
    Stream.filter(1..val, &rem(val, &1) == 0)
  end

  def prime(val) do
    # We force the stream here, but only look at the first two elements.
    Enum.take(factor(val), 2) == [1, val]
  end

  def is_large_number_prime() do
    prime(1000000000)
  end
end

In this implementation, prime(1000000000) will indeed execute very fast.
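
Trying it out in IEx (104729 is just an arbitrary known prime, picked for illustration):

iex> LazyExample.is_large_number_prime()
false
iex> LazyExample.prime(104729)
true

Note that for a number that actually is prime, the stream still has to be walked all the way up to val before the second factor shows up, so only the composite case returns early.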

Thank you so much for the clear explanation!

It makes more sense now, given that such a context-switching strategy doesn’t happen on the BEAM and doesn’t play well with eager evaluation in general.

Haskell’s STG sounds fascinating, will look into it :thinking:

For those like me treading this path for the first time, here’s a good starting point: https://www.microsoft.com/en-us/research/wp-content/uploads/1992/04/spineless-tagless-gmachine.pdf
