Iterate over large collection of numbers

Hi, I need to iterate over a list of about 1000000 numbers and eliminate/or mark as false all multiple of 2. It could be a list, a map, whatever. I’ve been struggling a lot to make it run in less than a minute but can’t do it. The only way I could do it was creating a map with key as the number and value a boolean, and then using recursion to jump from 2 on 2 in the keys and set the values to false. The problem with this is that at the end the results are unordered. I decided to use a map as I can have direct access to the element through the key and I don’t have to traverse a whole list. Tried with streams and lists and the result is even slower. Is there a standard and performant way to iterate over a large list if I know the position of the elements I want to change?

Can you paste the code that you are currently using?

I get this on the simple definition of your description:

iex> defmodule Testering do
...>   def testing(numbers) do
...>     Enum.filter(numbers, &(rem(&1, 2) > 0))
...>   end
...> end
iex> :timer.tc(Testering, :testing, [1..1000000])
{80124,
 [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41,
  43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79,
  81, 83, 85, 87, 89, 91, 93, 95, 97, ...]}

So it took 80124 microseconds here, I.E 80.124ms, I.E 0.080124 seconds.

If you have a more representative example then I could try with your data set then?

EDIT:
And here is a version of either changing to a false or doing nothing:

iex> defmodule Testering do
...>   def testing(numbers) do
...>     Enum.map(numbers, fn
...>       i when rem(i, 2) === 0 -> false
...>       i -> i
...>     end)
...>   end
...> end
iex>
nil
iex> :timer.tc(Testering, :testing, [1..1000000])
{132432,
 [1, false, 3, false, 5, false, 7, false, 9, false, 11, false, 13, false, 15,
  false, 17, false, 19, false, 21, false, 23, false, 25, false, 27, false, 29,
  false, 31, false, 33, false, 35, false, 37, false, 39, false, 41, false, 43,
  false, 45, false, 47, false, 49, ...]}

nice, that’s much better, tried and it’s fast, the reason I didn’t try that is that I need to extract all multiples of 2, except 2, then multiple of 3, except 3, and so on. When I add an or clause to Filter it takes ages :frowning:

You are still not showing code. :wink:
What you just described sounds very O(N^N), which is amazingly complex no matter the language, need to see your code to confirm. :slight_smile:

EDIT:

Also:

defmodule Testering do
  def testing(numbers) do
    Enum.filter(numbers, &(rem(&1, 2) > 0 or &1===2))
  end
end

:timer.tc(Testering, :testing, [1..1000000])

Gives 0.235s, which is not much slower?

It’s fine for 2, but when I try to apply the same function but with 3 to the result
Enum.filter(result, &(rem(&1, 3) > 0 or &1===3))
is when it takes ages

I’m trying to implement something like this:

In javascript or java is quite fast, trying to get similar results here. I guess it can be done, I’m just not very familiar with the functions available in Elixir/Erlang.
I don’t want the algorithm, just would like to make sure I know the functions that would be appropriate to get this done.

defmodule Testering do
  def testing(numbers, d) do
    Enum.filter(numbers, &(rem(&1, d) > 0 or &1===d))
  end
end
iex> Enum.map(1..10, &(:timer.tc(Testering, :testing, [1..1000000, &1])|>elem(0)))
[99872, 125716, 116785, 185102, 180786, 162543, 163336, 121551, 113907, 134438]

I’m not seeing it get any slower?

You are still not showing your complete reproducible example so we can optimize it. :wink:

Sounds like the drive of erasthostenes… There are a lot of implementations in a lot of languages.

I believe I’ve on somewhere in my drawers… I’ll take a look when kids are in bed.

There are tons in elixir and erlang online, some even concurrent. The issue though is that it is number processing and not IO processing (which is what the BEAM excels at) so it will never be as fast as, say, Rust for the problem. I’m trying to demonstrate that but it is hard to without seeing the slow code in question. ^.^;

Comparing to rust when we talk about processing speed is kind of unfair, but what makes it really slow is excessive copying of the cells.

1 Like

Yep, a proper sieve would build up the list probably recursively (or via TCO if really long with a reverse at the end if that is wanted) instead of filtering it repeatedly. :slight_smile:

This is the main part of the algorithm, I don’t want to copy everything as I guess it will be too much

 def remove_multiples([head | tail], n) when head <= n do
    [head | remove_multiples(Enum.filter(tail, fn e -> rem(e, head) !== 0 end), n)]
 end

It works for small numbers but for 1000000 it just times out, maybe using Streams could be a good idea?

But that’s not erasthostenes any more then. Erasthostenes explicitely filters a given list from 2 to x iteratively.

Could just cheat and stream this file…

http://primes.utm.edu/lists/small/millions/

Which is efficient for languages with mutable variable, which BEAM languages are not. :wink:

When going from mutable to immutable languages you often need to refine the algorithm you are using for the new environment. :slight_smile:

Yeah that is because you are quite literally iterating over the list N number of times, so 1000000*log(N) or so times (give or take based on the number of primes up to a given value). That is a HUGE number of iterations. You need to reformulate the algorithm to think in the immutable world. :slight_smile:

Actually, after reading a bit about the algorithm, using the square root of N as limit in the guard clause made it. Now I get the results in 1 second for 1000000, while before it was timing out after 1 minute. Checked the results with a generator and I get the same :o

Awesome! :slight_smile:

And for comparison, the usual way (converted to elixir) is generally:

defmodule Testering do
  def primes_up_to(num) when is_integer(num) and num>0 do
    primes_up_to(2, round(:math.sqrt(num)), [], :lists.seq(3, num, 2))
  end
  def primes_up_to(prime, max, primes, ints) when prime > max do
    :lists.reverse([prime | primes], ints)
  end
  def primes_up_to(prime, max, primes, ints) do
    [newPrime | newInts] = for x <- ints, rem(x, prime) !== 0, do: x
    primes_up_to(newPrime, max, [prime | primes], newInts)
  end
end

Where I get ~1.5s for 1000000. :slight_smile:

Now, when you said this

were you speaking about the version you just posted? Is that version more functional like ?

A few years ago, I remember briefly playing with a lazy sieve (i.e. sieve as a stream). I was able to dig up my implementation here. As you can tell, it’s a bit outdated (using HashDict).

I updated it, and gave a quick try. It compute primes <= 1_000_000 in 1.7s on my machine. No idea whether that’s a reasonable speed :slight_smile: Here’s the updated version:

defmodule Primes do
  # Basic implementation of an infinite prime generator, as explained at
  # http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

  def new do
    %{composites: %{}, current: 2}
  end

  def next(%{composites: composites, current: current} = sieve) do
    case Map.get(composites, current) do
      nil ->
        sieve
        |> add_composite(current * current, current)
        |> advance

      factors ->
        Enum.reduce(factors, sieve, &add_composite(&2, current + &1, &1))
        |> remove_current
        |> advance
        |> next
    end
  end

  defp advance(%{current: current} = sieve) do
    %{sieve | current: current + 1}
  end

  defp remove_current(%{composites: composites, current: current} = sieve) do
    %{sieve | composites: Map.delete(composites, current)}
  end

  defp add_composite(%{composites: composites} = sieve, which, factor) do
    %{sieve | composites: Map.update(composites, which, [factor], &[factor | &1])}
  end

  def stream do
    Stream.unfold(new(), fn(acc) -> {acc.current, next(acc)} end)
  end
end

{time, count} =
  :timer.tc(fn ->
    Primes.stream()
    |> Stream.take_while(&(&1 <= 1_000_000))
    |> Enum.count()
  end)

IO.puts "#{count} primes counted in #{time / 1_000_000}s"

1 Like

Not really, it is still a pretty direct translation. The BEAM’s List handling is one of the fastest parts about it so you would not gain a whole lot. However the pattern matching heads is a great boon, unique to the VM, not really used to its full power here but is one of the parts of this speed.

Overall though, math crunching is not what the BEAM is designed for, thus is not fast at. The BEAM is designed for massively concurrent and reliable IO processing, which it utterly excels at. :slight_smile:

Big thanks to everyone! This was an interesting converstation

1 Like