How could I speed up the code

anuaralfetahe · December 14, 2019, 11:39pm

I have a code for generating a list of possible combinations. It is working fine with less than 11 digits but after 11 it gets very slow sometimes even freezes my computer. I would like to generate a very large collection of combinations. I am new to elixir.
Heres the code

  def calculate_permutations(data) do
    permute(String.codepoints(data), "")
  end

  def permute(list, locked_el) when length(list) > 2 do
    Enum.flat_map(list,
      &(permute(List.delete(list, &1), locked_el <> &1))
    )
  end

  def permute(list, locked_el) do
    [
      locked_el <> Enum.at(list, 0) <> Enum.at(list, 1),
      locked_el <> Enum.at(list, 1) <> Enum.at(list, 0)
    ]
  end

ericgray · December 15, 2019, 8:15am

When working with large datasets the Stream module is faster. Looking at your code you’re using Enum.flat_map which is a greedy operation meaning that it immediately returns the result. You’re also calling List.delete on every item being passed to the function.

So if you have a list containing 11 digits List.delete is called 11 times for every single element. The more elements in a list the longer it takes to enumerate. That’s most likely your bottleneck. You probably need to learn more about recursion and pattern matching.

I cleaned up your code a bit using Stream and pattern matching. Stream is a lazy operation and will return the result at the end.

  def calculate_permutations(data) do
    data
    |> String.codepoints()
    |> permute("")
  end

  def permute(list, locked_el) when length(list) > 2 do
    Stream.flat_map(list, &permute(List.delete(list, &1), locked_el <> &1))
    |> Enum.to_list()
  end

  def permute(list, locked_el) do
    [first, last] = list
    p1 = "#{locked_el}#{first}#{last}"
    p2 = "#{locked_el}#{last}#{first}"
    [p1, p2]
  end

benwilson512 · December 15, 2019, 2:34pm

This is sort of true. Where Stream tends to win is on memory, if you can avoid loading the entire dataset into memory. It can also win on speed if you have several composed calls data |> map(a) |> map(b) |> map(c) and those are Stream.map calls then it’ll do it in a single pass instead of 3. BUT for a single pass, eager iteration is faster, because streams impose some overhead to allow them to gain the other benefits. Thus if you turn the 3 map operations into data |> map(& &1 a |> b |> c) eager is probably faster.

@anuaralfetahe The logic is slow here for a few reasons. The main reason it’s slow is this: Computing permutations is intrinsically slow. Some algorithms, like counting how many items are in a list, get a little bit slower each time you make the list longer. Some algorithms though, like computing permutations, get a LOT slower as you add items. There is a “mathy” or formal way of looking at this kind of “slowness relative to inputs” called Big O notation, and the big O for permutations is pretty bad, the time it takes grows pretty fast.

The second reason it’s slow though does have to do with how you wrote it. I’d check out Most elegant way to generate all permutations? for an alternative implementation.

al2o3cr · December 15, 2019, 3:22pm

Permuting 11 objects will generate 11! = 39,916,800 results

Permuting 12 objects will try to generate 12 times more. I’m not surprised that flat_map gets bogged down; appending to a list of 39,961,800 elements is going to be exceptionally slow.

Also note that lists are built of cons cells, so each element takes at least as much space as two pointers - so 16 bytes on a 64-bit machine. As a result, that list of all 12-object permutations will need at least 7.6GB of memory. (!)

+1 to the recommendations to look at Stream to avoid generating the entire list, but I’d also recommend you consider an alternative algorithm for whatever’s consuming the list of permutations, as even streaming isn’t going to save you if you need to permute 20 elements (list would be 2.4e18 elements, so 60 BILLION times longer than the one for 11 elements).

anuaralfetahe · December 15, 2019, 5:21pm

Thank you for the answers.
I also thought it could the size of the list which is slowing down the computing.
Maybe if i use processes i could speed up the running.

anuaralfetahe · December 16, 2019, 12:31pm

Using Stream module and then convert to the Enum list at the end will actually slow down the whole process.

Going to rewrite the whole code for serving the output piece by piece for large datasets. Using recursions and pattern matching.

al2o3cr · December 17, 2019, 2:35am

(my emphasis) The intent of using Stream is to avoid ever holding the whole list in memory, as that’s the most expensive part. For instance, here’s a naive implementation of permutations with streams:

defmodule StreamPerms do
  def permutations_of([]), do: [[]]
  def permutations_of(ary) do
    Stream.flat_map(ary, fn el ->
      rest = ary -- [el]
      permutations_of(rest)
      |> Stream.map(&[el | &1])
    end)
  end
end

# run with
StreamPerms.permutations_of(Enum.to_list(1..11)) |> Stream.into(File.stream!("scratch_output", [{:delayed_write, 1000000, 2000}])) |> Stream.run()

This generates the 439MB output file in a handful of minutes locally. The VM uses less than 50MB of memory total the whole time.

Changing 11 to 19 would, if you had the time and the space, produce a file roughly the size of ALL data storage in the world circa 2011 (295EB).

HOWEVER

I still don’t understand what your ultimate goal is - permutations are easy to compute on demand, there are just a LOT of them. Storing them on disk or transmitting them over a network is going to take almost as long as (and cost almost as much as) generating them directly at the point of use. “Check all permutations” is one of the worst-case scenarios for time complexity, seen for instance in brute-force solutions to the traveling salesman problem. You likely want a better algorithm for whatever is consuming these permutations.