How to handle very big enumerables in comprehensions?

heves · October 14, 2022, 8:13pm

For a school assignment, I need to esentially brute-force check every possible solution for a puzzle. This is acceptable, as the task is meant to teach us about prefiltering and optimizing. I have done both of those as best as I could, and now my code will run out of memory for longer puzzles where the number of possible solutions can range from a couple hundred-thousand to a few millions.

# From a list of sublists, generate all possible permutations, using one element from each sublist
def perms_from_lists([]), do: []
def perms_from_lists(list) do
    Enum.reduce(list, fn sublist, acc ->
      for a <- acc, b <- sublist do
        [b | List.wrap(a)]
      end
    end)
    |> Enum.map(&Enum.reverse/1)
end

def solutions(puzzle) do
    # This would generate a lot of solutions, possibly (for longer puzzles) in the millions,
    # depending on how effective is my prefiltering on the given puzzle
    Enum.filter((for x <- perms_from_lists(transform_and_prefilter(puzzle)),
      do: if sol_okay?(puzzle, x), do: x), fn x -> x != nil end)
end

I’m guessing the problem may be that my program wants to store every value given by perms_from_lists() in the memory, but how do I prevent this?

Or maybe perms_from_lists() performs poorly for such large operations?

I have heard about Streams but haven’t used them, could this be possibly solved by them?

cmo · October 14, 2022, 8:30pm

Hi,

I’d suggest benchmarking different methods until you find one that works. You can use Benchee for that.

You might like to avoid putting the entire function on one long line. If you give things names and keep each line relatively short it is easier to read.

Did you by chance read the documentation for Stream? At the top or explains how to swap from Enum to Stream.

Source of this code: How to generate permutations from different sets of elements? - #12 by Eiji

derek-zhou · October 14, 2022, 8:33pm

I believe that for new user of Elixir (or any functional language for that matter), it is better not to learn list comprehension. Just use recursion (both body and tail), There is nothing you can do with list comprehension but not recursion.

dimitarvp · October 14, 2022, 8:48pm

Maybe you should show us the source of these so we can help you in detail.

Especially if the data itself is coming from a file or a DB or a network service, you can just ingest it in chunks – by using the Stream module – and your code will have predictable memory and CPU footprint.

gregvaughn · October 14, 2022, 8:59pm

The thing about for comprehensions is that they are eagerly evaluated, as you discovered. You are generating every possible solution before filtering whether they are acceptable solutions. (I also question whether you really need to reverse too).

Streams are one approach because they will lazily generate a candidate solution, but it will not be trivial to write the stream. Another approach is to figure out a way to call sol_okay within your reduce

trisolaran · October 14, 2022, 9:26pm

You are generating all possible permutations of your lists and checking them for some property, as you said, brute-force the check.

The space complexity of such task is exponential in the number of lists and I don’t see a way around that. Recursion, comprehension or what not won’t make any difference.

I think your options are to either deal with the increasing space requirements by swapping your permutations on disk, or understand the underlying structure of the solutions you’re looking for and try to generate only those. Knowing what transform_and_prefilter and sol_okay? do and the nature of this puzzle would help.

Eiji · October 14, 2022, 9:29pm

Here goes a simplified example:

defmodule Example do
  # in case where empty list is passed as an input
  def sample([]), do: []

  # in case first element is an empty list
  def sample([[] | tail]), do: sample(tail)

  # we are using list 2 times
  # the first would be iterated
  # the second is a copy for `sol_okay?/2` function
  def sample(list) do
    case sample(list, [], list) do
      [] -> {:error, :not_found}
      result -> result
    end
  end

  # in case we already have result skip everything else
  def sample(_list, {:ok, _result} = data, _list_copy), do: data

  # data here is finished element of output list
  # due to prepending the data is reversed
  # which is fixed by `:lists.reverse/1`
  def sample([], data, list_copy) do
    solution = :lists.reverse(data)
    if sol_okay?(list_copy, solution), do: {:ok, solution}, else: []
  end

  # when heads i.e. elements of first list ends
  # all we need to do is to return an empty list
  def sample([[] | _tail], _data, _list_copy), do: []

  # collecting data and recursion part
  def sample([[sub_head | head] | tail], data, list_copy) do
    # nested recursion
    new_data = sample(tail, [sub_head | data], list_copy)

    if new_data == [] do
      # solution not found, we are searching using tail recursion
      sample([head | tail], data, list_copy)
    else
      # solution is already found and there is no need to return it
      new_data
    end
  end

  def sol_okay?(_puzzle, [:b, 2]), do: true
  def sol_okay?(_puzzle, _solution), do: false
end

Why simplified? Well … first of all copying a full list for every processed combination is not the best idea. Perhaps you would like to store them somewhere, for example in :ets table. I just give you a simplified code which could be refactored by changing minimal amount of code and without any problem.

Also sol_okay?/2 function is naive as we have no idea what you want there and again for simplicity I made a simplest possible example i.e. pattern matching. However this function is still really useful. If you debug solution variable in 2nd clause you would find that the code is properly skipping other combinations when valid combination was found.

al2o3cr · October 14, 2022, 10:23pm

This seems like a good application for streams; they trade additional runtime complexity for memory usage. They can also be useful to skip unnecessary calculations by stopping the stream early, but that’s not applicable since you need every solution.

Here’s a version of your code above that uses Enum.filter all-at-once:

def solutions_enum(puzzle) do
  puzzle
  |> perms_from_lists()
  # in here: all the permutations are in memory
  |> Enum.filter(&sol_okay?(puzzle, &1))
end

and the same code but with streams:

def solutions_enum(puzzle) do
  puzzle
  |> stream_perms_from_lists()
  # in here: only one permutation is in memory at a time
  |> Stream.filter(&sol_okay?(puzzle, &1))
  # this accumulates all the results
  |> Enum.to_list()
end

The new stream_perms_from_lists/1 will involve a bunch of functions from Stream. For some inspiration, here’s a similar-but-different problem (generating permutations) solved with streams:

Eiji · October 14, 2022, 11:03pm

Nice, but my solution do almost the same (the only difference is that’s not Stream based) with just pattern-matching and I guess it would be much easier especially for new developers.

I think that Stream would be much more reasonable comparing to my solution if puzzle would be Stream as well from the first create/generate.

heves · October 15, 2022, 12:00am

Thank you all for your kind replies, I very much appreciate your time!

@dimitarvp The “data” here is a bunch of .txt files for puzzles, and the generated solution candidates. Think of it like solving Sudokus brute-force.

@gregvaughn

The thing about for comprehensions is that they are eagerly evaluated

I was suspicious, but I wasn’t sure until now, so thank you for that Makes sense now that it doesn’t run

(I also question whether you really need to reverse too)

I use the position of the elements as the link between tents and trees. It would be avoidable, but I’d try other stuff before rewriting a bunch of code to skip this reverse. If nothing else makes it faster, I’ll try it.
And also thanks for the suggestions, I will try them tomorrow!

@trisolaran

Knowing what transform_and_prefilter and sol_okay? do and the nature of this puzzle would help.

It’s a Tents and Trees puzzle solver.
transform_and_prefilter weeds out invalid tent positions to cut down on the permutations, it traverses a ~ 20x20 matrix of integer tuples once, shouldnt take that long.
sol_okay? checks if solution candidate (given by the direction each tent is compared to their tree, n/s/e/w) is valid. It’s against a few criteria, but returns as soon as one fails, I used throw-catch there. Shouldn’t take that long in itself, but runs for each solution candidate.

@Eiji Thank you, I’m not very good at recursions so it will take me a while to understand how this function works Also I might not said it in the topic, but all possible solutions are needed, so the skipping over part is an bonus functionality

@al2o3cr Thanks a lot, I will try to look into streams a bit more and maybe I can make this work with them.

dimitarvp · October 15, 2022, 12:29am

Great, then you can stream them and not read all of them at once. That way you’d be okay with system resources load.

heves · October 15, 2022, 10:38am

I guess generally your approach could make a significant difference, however in my case, the “bunch of files” means around 20 files, each <500 characters, so I wouldn’t expect any sort of noticable performance problem stemming from reading them all at once. Sure it could be optimized, but for now I think based on the replies in this topic the main problem is how I handle the large number of candidate solutions. (Before posting this topic I did a bit of testing, and the program ate up all 16 GBs of my PC’s memory, while trying to solve a mid-sized puzzle… )

dimitarvp · October 15, 2022, 10:51am

I see. In that case try one of the proposed solutions?

heves · October 16, 2022, 2:48pm

It took me a while to realize this is the right solution. After thinking about how to cut down more on the number of candidate solutions, I recognized your code basically does what I need to do. So thanks a lot, today I learned something about recursions