Help with flattening a map - memory issues

arjan · June 24, 2019, 7:38pm

I made an algorithm to flatten a map into a single keypath => value map, based on @benwilson512’s code from this topic.

However when I run it on a medium-sized map (about 150 keys in total, up to 5 levels in map nesting), the beam blows up due to memory errors.

I must be doing very stupid but I don’t see directly what’s wrong…

defmodule FlatMap do
  def flatten(map, opts \\ [join: true])

  def flatten(map, opts) when is_map(map) do
    map
    |> Map.to_list()
    |> do_flatten([], [])
    |> Enum.map(fn {k, v} ->
      k = Enum.reverse(k)

      case opts[:join] do
        true -> {Enum.join(k, "."), v}
        false -> {k, v}
      end
    end)
    |> Map.new()
  end

  defp do_flatten([], acc, _), do: List.flatten(acc)

  defp do_flatten([{k, v} | rest], acc, path) when is_map(v) do
    v =
      v |> Map.to_list()

    flattened_subtree = do_flatten(v, acc, [k | path])

    do_flatten(rest, [flattened_subtree, acc], path)
  end

  defp do_flatten([{k, v} | rest], acc, path) do
    do_flatten(rest, [{[k | path], v} | acc], path)
  end
end

Obviously this algorithm is not tailcall optimized but for my limits that should not matter that much… or does it?

NobbZ · June 24, 2019, 8:04pm

Tail call does actually matter. You can’t GC what is on the heap, so when you don’t return, the stack won’t shrink.

Said this, I did not actually look at your code if that is the main problem or if I could identify another one.

michalmuskala · June 24, 2019, 8:28pm

Could it be that it’s just a lot of data? My quick, back-of-the-envelope calculation tells me that with 150 keys per level and 5 levels assuming just 1 word for each value you’d be at over 500 GB.

arjan · June 24, 2019, 8:35pm

No it’s actually very little data… I’ll try to come up with a self-contained example.

arjan · June 24, 2019, 9:07pm

I passed acc into both recursion invocations which caused it to blow up exponentionally.

The line that unflattened the children should have read:

flattened_subtree = do_flatten(v, [], [k | path])