How would I start figuring out how to translate highly data-dependent functions into Elixir?

James_E · March 20, 2022, 6:20am

Say I’ve got the following function fragment:

# From page 4 of https://doi.org/10.1063/1.1751381
# > The global decoding has implicitly done too many exchanges and inversions,
# > so some of them need to be undone.
# > A single backwards pass through the higher order n * (order - 1) bits of the
# > integers does the exchanges and inversions needed to transform them into the
# > desired coordinates.
def f(x, order):
	# Modifies x in-place
	ndims = len(x)
	for r in reversed(range(order - 1)):
		_high_bit = 1 << r
		_low_mask = _high_bit - 1
		for i in reversed(range(ndims)):
			if not (x[i] & _high_bit):
				# exchange low bits of x[i] and x[0]
				x[0], x[i] = (x[0] &~ _low_mask) | (x[i] & _low_mask), (x[i] &~ _low_mask) | (x[0] & _low_mask)
			else:
				# invert low bits of x[0]
				x[0] ^= _low_mask

(The above is just a pseudocode instantiation from the paper; a real-world implementation can be found in this Python code.)

In any case, this is two nested for-loops, that go around and hit multiple bits of this list of integers every iteration. I am having an extremely hard time finding a way to beat this into the FP paradigm.

Nothing in the Elixir School beginner docs seemed to contain anything along the lines of “how to think about” how to translate stuff like this into Elixir. My next course of action is to take SICP, but until I finish that (quite lengthy and difficult) book-course, I’m somewhat dead in the water on my attempt to translate the above-linked algorithm into Elixir. The only insight I’ve got so far is that I should probably treat x_0 specially, since only it and x_i are modified each iteration.

Does anyone have recommendations on courses, explainers, lectures, or even just a specifically relevant SICP chapter, for instance, that would get me started on translating such highly-data-dependent algorithms into “functionalese”?

James_E · March 20, 2022, 9:43am

Currently, I’m attempting (*without success—logic errors remain) to use/abuse Stream.scan for this:

defmodule HilbertCurve do
  use Bitwise

  def point(d, order, ndims \\ 2) do
    # https://github.com/galtay/hilbertcurve/blob/v2.0.5/hilbertcurve/hilbertcurve.py#L115-L149
    # side_length = 2 ** order
    # volume = side_length ** ndims
    d
    |> transpose(order, ndims)
    |> gray_decode()
    |> undo_excess_work(order)
  end

  def dist(x, order) do
    # https://github.com/galtay/hilbertcurve/blob/v2.0.5/hilbertcurve/hilbertcurve.py#L201-L241
    x
    |> inverse_undo_excess_work(order)
    |> gray_encode()
    |> untranspose(order)
  end

  def gray_encode(x) do
    # https://en.wikipedia.org/wiki/Gray_code?oldid=1077948578#Constructing_an_n-bit_Gray_code
    Enum.map(Stream.zip(x, [0 | x]), fn {a, b} -> Bitwise.bxor(a, b) end)
  end

  def gray_decode(x) do
    # https://en.wikipedia.org/wiki/Gray_code?oldid=1077948578#Constructing_an_n-bit_Gray_code
    Enum.scan(x, 0, &Bitwise.bxor/2)
  end

  def transpose(i, order, ndims) do
    # https://github.com/galtay/hilbertcurve/blob/v2.0.5/hilbertcurve/hilbertcurve.py#L85-L97
    #TODO: figure out what this function does
    i
    |> to_bits(order * ndims)
    |> Stream.chunk_every(ndims)
    |> Stream.zip()
    |> Stream.map(&Tuple.to_list/1)
    |> Enum.map(&from_bits/1)
  end

  def untranspose(x, order) do
    # https://github.com/galtay/hilbertcurve/blob/v2.0.5/hilbertcurve/hilbertcurve.py#L100-L112
    #TODO: figure out what this function does
    x
    |> Stream.map(fn i -> to_bits(i, order) end)
    |> Stream.zip()
    |> Enum.map(&Tuple.to_list/1)
    |> List.flatten()
    |> from_bits()
  end

  def undo_excess_work(x, order) do
    # https://github.com/galtay/hilbertcurve/blob/v2.0.5/hilbertcurve/hilbertcurve.py#L134-L147
    # ndims = length(x)
    {_, x} = scan_last(
      ((order - 2)..0),
      {hd(x), x},
      fn cur, acc ->
        # The first field of `acc` is the x_0 accumulator
        # The second field of `acc` is the current value of `x`
        r = cur
        {x_0, x} = acc

        {list__x_0, reversed_list__x_i} = scan_unzip(
          Enum.reverse(x),
          {x_0, nil},
          fn cur, acc ->
            # The first field of `acc` is the x_0 accumulator
            # The second field of `acc` is the updated value of `x[i]`
            x_i = cur
            {x_0, _} = acc

            uew_op(r, x_i, x_0)
          end
        )

        x_0 = List.last(list__x_0)
        x = Enum.reverse(reversed_list__x_i)

        {x_0, x} # return
      end
    )

    x # return
  end

  def uew_op(r, x_i, x_0) do
    # https://github.com/galtay/hilbertcurve/blob/v2.0.5/hilbertcurve/hilbertcurve.py#L139-L146
    # Returns updated values of {x_0, x_i}
    bit_r = 1 <<< r
    low_bits = bit_r - 1
    if (x_i &&& bit_r) == 0 do
      swap_bits({x_0, x_i}, low_bits)
    else
      {invert_bits(x_0, low_bits), x_i}
    end
  end

  def invert_bits(i, mask \\ -1) do
    Bitwise.bxor(i, mask)
  end

  def swap_bits({a, b}, mask \\ -1) do
    # https://github.com/galtay/hilbertcurve/blob/v2.0.5/hilbertcurve/hilbertcurve.py#L143-L146
    t = Bitwise.bxor(a, b) &&& mask
    a = Bitwise.bxor(a, t)
    b = Bitwise.bxor(b, t)
    {a, b}
  end

  def inverse_undo_excess_work(_x, _order) do
    # https://github.com/galtay/hilbertcurve/blob/v2.0.5/hilbertcurve/hilbertcurve.py#L215-L226
    :TODO
  end

  # def _to_bits(i, width) do
  #   <<i::size(width)>>
  # end

  def scan_last(iterable, initial_acc, f) do
    Stream.scan(iterable, initial_acc, f)
    |> Enum.take(-1)
    |> hd()
  end

  def scan_unzip(iterable, initial_acc, f) do
    Stream.scan(iterable, initial_acc, f)
    |> Enum.unzip()
  end

  def to_bits(i, width) do
    i
    |> to_bitstring(width)
    |> String.to_charlist()
    |> Enum.map(&(&1 - hd '0'))
  end

  def to_bitstring(i, width \\ 0) do
    Integer.to_string(i, 2)
    |> String.pad_leading(width, "0")
  end

  # def from_bits(bs) when is_bitstring(bs) do
  #   padding_size = 7 - rem(:erlang.bit_size(bs) + 7, 8)
  #   padding = <<0::size(padding_size)>>
  #   :binary.decode_unsigned(<< padding, bs :: bitstring >>)
  # end

  def from_bits(l) when is_list(l) do
    Enum.map(l, &(&1 + hd '0'))
    |> List.to_string()
    |> from_bitstring()
  end

  def from_bitstring(s) do
    Integer.parse(s, 2)
    |> (fn {i, ""} -> i end).()
  end

end

#IO.puts("The following should be 1234567:")
#IO.inspect(HilbertCurve.dist([6, 1, 6, 5, 6, 3, 6], 4))

IO.puts("The following should be [1, 2, 3, 4, 5, 6, 7]:")
IO.inspect(HilbertCurve.point(2000640, 4, 7))

My main worry is that this isn’t “in the spirit of” how functional programming is supposed to be done—I feel like I’m just trying to blindly imitate the for loops rather than getting at what’s actually happening at the math level—and that my wrong perspective in the translation is what’s ultimately leading to my difficulties in getting the logic errors out.

(edit: I do intend to mark the internal functions private once it’s all working, but marking them all as public is extremely handy for debugging)

cmo · March 20, 2022, 11:02am

Usually for loops can be mapped across to Elixir with Enum.reduce or recursion.

quda · March 20, 2022, 11:23am

I don’t think Elixir is the most suitable language for algorithm implementation.
Nothing can beat C and Fortran for such tasks in terms of performance/versatility/speed of execution.
Nowadays, maybe only Julia can match these two.

kokolegorille · March 20, 2022, 11:24am

Hello and welcome,

This code is using mutable data, and arrays… Arrays are not available in standard lib.

This is not possible… but You can do

def f(x, order) do
  new_x = ...
  new_x
end

Anything using x[0] is not available, there are no arrays and accessing an element at a given position is expensive.

As mentionned by @cmo, You can do it with recursion, or Enum.reduce, or list comprehension.

What You need to do is take those mutable variables, and pass them as state, or accumulator.

James_E:

		for i in reversed(range(ndims)):
			if not (x[i] & _high_bit):
				# exchange low bits of x[i] and x[0]
				x[0], x[i] = (x[0] &~ _low_mask) | (x[i] & _low_mask), (x[i] &~ _low_mask) | (x[0] & _low_mask)
			else:
				# invert low bits of x[0]
				x[0] ^= _low_mask

For example, this pseudo code

x
|> Enum.reverse()
|> Enum.reduce([], fn xi, [x0 | rest] -> 
  ...
end)

dimitarvp · March 20, 2022, 11:33am

Answering your title: you would first need to break apart the Python code into smaller sub-functions, starting from the innermost loops. After you have done that a few times your higher-level function will look much clearer. The translation then will be more manageable.

As another commenter said, Elixir is not a good fit for algorithms that heavily utilize pointer/array access and modifications in place, but it’s also true that a lot can be done to translate the algorithms themselves to be FP-friendly.

And while it’s true that the official Elixir tutorial doesn’t prepare you for such tasks, Exercism’s Enum track will definitely open your eyes about how things are done in FP land. It’s a different way to think about moving and transforming data that still delivers 99% of the results of the imperative algorithms.

I don’t do a lot of algorithms in my work but I’ve always needed the occasional one every now and then and so far have never stumbled upon one that I can’t translate adequately to Elixir.

al2o3cr · March 20, 2022, 4:51pm

For a literal translation, Nx could be helpful since it provides constant-time arrays of fixed numerical type.

A good reference (it’s specifically called out in the documentation for Erlang’s :queue module) is “Purely Functional Data Structures” by Chris Okasaki

This diagram from the paper makes me think there might be a solution involving Stream.cycle and Stream.zip plus recursion:

Screen Shot 2022-03-20 at 12.44.54 PM

The paper describes this approach leading to “awkward computer code” - and the bookkeeping for multiple cycles is complicated in C, but I suspect a lot of that vanishes inside of the Stream.zip machinery in Elixir.

A final alternative: write the algorithm from the paper in Rust and link it as an NIF. That’s not “functional” at all, but it’s a very BEAM-flavored approach to Getting Work Done.

James_E · March 22, 2022, 4:56am

Hah, funny you mention that language specifically – this particular algorithm has a Rust implementation already extant… if I don’t crack it soon, I’ll look into just NIF-ing/FFI-ing it.