Suitability of Elixir to math-oriented apps

I took a brief look at Julia, and from what I could tell it did not support or leverage multicore computers. How can it be “high performance” if it only runs on a single core? Or did I miss something?

Plenty of algorithms lack straightforward parallel implementations. The state of the art for quite a lot of algorithms is a single-threaded C or C++ implementation.

This isn’t even necessarily a problem, as in production it’s often easy to just run multiple jobs in parallel.

Yes, you did. Julia does not do concurrency (which is performing multiple separate tasks at the same time), but it does allow parallelism (which is performing the same or similar tasks, such as the same calculation on multiple pieces of data, at the same time).

Elixir is great at the former, which makes it perform consistently even under high load. Julia (and some other environments) has the latter, which allows it to have ‘high performance’ when doing number crunching.

I believe using the @parallel macro in combination with a loop is enough to let Julia divide the work across your processors. A short explanation can be found here.
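For context, Elixir’s concurrency primitives can also be used to spread data-parallel work across all cores. Here is a minimal sketch using Task.async_stream/3, where the work function is just a placeholder for some expensive pure calculation:

# `work` stands in for whatever expensive pure calculation you need to
# run on each element; the options fan the calls out across all schedulers.
work = fn n -> n * n + 1 end

results =
  1..1_000_000
  |> Task.async_stream(work, max_concurrency: System.schedulers_online(), ordered: false)
  |> Enum.map(fn {:ok, value} -> value end)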

3 Likes

I heard that Elixir is not good at number crunching, but I didn’t expect it to be this bad! I tried to solve Problem #14 on Project Euler using both Ruby and Elixir. Ruby gave me the correct answer in 1.5s on my MacBook, while Elixir just let me wait and wait and wait until I lost my patience and pressed Ctrl + C.

Here is the Ruby code:

class Collatz
  def initialize               
    @cache = {}             
  end

  def longest(upperbound)      
    1.upto(upperbound).map{|n| [n, sequence_length(n)]}.max_by{|_, len| len}
  end
  
  def sequence_length(n)       
    return 1 if n == 1         
    len = @cache[n]            
    if len
      len
    elsif n.odd?
      @cache[n] = sequence_length(3 * n + 1) + 1
    else
      @cache[n] = sequence_length(n / 2) + 1
    end
  end
end

require 'benchmark'

collatz = Collatz.new
puts Benchmark.measure {
  p collatz.longest(1_000_000)
}

And here is the Elixir code, which is not optimized for tail recursion (I’m still struggling with how to do tail recursion with a cache):

defmodule Collatz do           
                               
  use Agent
  require Integer

  def start_link do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end
  
  def longest(upperbound) do   
    Stream.interval(1)         
    |> Stream.map(fn n -> {n + 1, sequence_length(n + 1)} end) 
    |> Stream.take(upperbound) 
    |> Enum.max_by(fn {_, len} -> len end)
  end

  def sequence_length(1), do: 1

  def sequence_length(n) do
    len = Agent.get(__MODULE__, &Map.get(&1, n))
    
    cond do
      len != nil -> 
        len
      Integer.is_odd(n) ->
        new_len = sequence_length(3 * n + 1) + 1
        Agent.update(__MODULE__, &Map.put(&1, n, new_len))
        new_len
      true ->
        new_len = sequence_length(div(n, 2)) + 1
        Agent.update(__MODULE__, &Map.put(&1, n, new_len))
        new_len
    end
  end
end

Collatz.start_link()

:timer.tc(fn ->
  Collatz.longest(1_000_000)
end)
1 Like

The Elixir code you’ve got isn’t a particularly suitable adaptation, which isn’t uncommon when folks are just getting started. By way of comparison, https://github.com/sorentwo/euler/blob/master/lib/euler_014.exs runs in ~4.7 seconds on my computer and it doesn’t even have any caching.
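For illustration only (this is not the linked solution, just a sketch of what a cache-free, tail-recursive approach can look like):

defmodule CollatzPlain do
  def longest(upperbound) do
    Enum.max_by(1..upperbound, &sequence_length/1)
  end

  # Tail-recursive sequence length with an accumulator, no caching at all.
  def sequence_length(n, count \\ 1)
  def sequence_length(1, count), do: count
  def sequence_length(n, count) when rem(n, 2) == 0, do: sequence_length(div(n, 2), count + 1)
  def sequence_length(n, count), do: sequence_length(3 * n + 1, count + 1)
end

# CollatzPlain.longest(1_000_000) returns the starting number below one
# million with the longest Collatz sequence.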

5 Likes

That’s interesting. After I removed caching, the performance got much better (~3.4s). Then after I introduced tail recursion, the performance got a little better still (~3.2s), so non-tail recursion isn’t the cause of the low performance. When I did the same thing in the Ruby code (well, Ruby doesn’t optimize tail calls, so I didn’t do tail recursion in Ruby), the performance got worse (~8.5s). So the performance of Elixir isn’t that bad. My fault. It seems that the worst part of my former Elixir code was the use of Stream (but why?).

But again, I’m still wondering how to do tail recursion with a cache. I counted how many steps caching saved in Ruby: it saved 130,265,812 steps, and only 2,168,610 steps were actually calculated. When I added caching back to the Elixir code, it didn’t make me wait forever this time, but the performance still got worse (~13s). How can that be if it saves so many steps?

Instead of using an agent for the cache, can’t you pass a map around?
*Edit: I’m on my phone right now, but if you don’t know what I mean I can show you.

1 Like

I understand what you mean. I just need some time to rearrange the code if I need to pass and return a map across all the function calls. This is gonna be error-prone.

UPDATE

OK, I managed to do what you suggested, but the performance (~3.7s) is still worse than with no cache.

Here’s the new Elixir code:

defmodule Collatz do
  require Integer

  def longest(upperbound) do
    1..upperbound
    |> Enum.reduce({0, 0, %{}}, fn n, {m, max_len, cache} ->
      case sequence_length(n, cache) do
        {len, new_cache} when len > max_len ->
          {n, len, new_cache}

        {_len, new_cache} ->
          {m, max_len, new_cache}
      end
    end)
    |> elem(0)
  end

  def sequence_length(1, cache), do: {1, cache}

  def sequence_length(n, cache) do
    case Map.get(cache, n) do
      nil ->
        {len, new_cache} =
          if Integer.is_even(n) do
            sequence_length(div(n, 2), cache)
          else
            sequence_length(3 * n + 1, cache)
          end

        {len + 1, Map.put(new_cache, n, len + 1)}

      len ->
        # Cache hit: the stored value is already the full length for n,
        # so it must not be incremented again.
        {len, cache}
    end
  end
end

Try using an ets table for caching.
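A rough sketch of what an ets-backed cache could look like (the table name and layout here are just illustrative):

defmodule CollatzEts do
  require Integer

  def longest(upperbound) do
    # A private table owned by the calling process; the name is arbitrary.
    table = :ets.new(:collatz_cache, [:set])
    result = Enum.max_by(1..upperbound, fn n -> sequence_length(n, table) end)
    :ets.delete(table)
    result
  end

  defp sequence_length(1, _table), do: 1

  defp sequence_length(n, table) do
    case :ets.lookup(table, n) do
      [{^n, len}] ->
        len

      [] ->
        next = if Integer.is_even(n), do: div(n, 2), else: 3 * n + 1
        len = sequence_length(next, table) + 1
        :ets.insert(table, {n, len})
        len
    end
  end
end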

1 Like

I’m skeptical that any data store external to the process will perform well. ets is great when you need a cache shared between processes, but within a single process a map or the process dictionary should be faster.
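A minimal sketch of the process dictionary variant (the key shape is just illustrative):

defmodule CollatzPdict do
  require Integer

  def longest(upperbound) do
    Enum.max_by(1..upperbound, &sequence_length/1)
  end

  defp sequence_length(1), do: 1

  defp sequence_length(n) do
    # Tuple keys keep the cache entries from clashing with anything else
    # that might live in the process dictionary.
    case Process.get({:collatz, n}) do
      nil ->
        next = if Integer.is_even(n), do: div(n, 2), else: 3 * n + 1
        len = sequence_length(next) + 1
        Process.put({:collatz, n}, len)
        len

      len ->
        len
    end
  end
end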

1 Like

Update: the ets code uses a bulk insert (a list of tuples), giving it the apparent edge over the map code. Replacing the individual Map.put/3 calls with a consolidated :maps.merge/2 call improved matters for the map-as-a-cache case.

For the process dictionary approach see:
Improving a slow solution for Longest Collatz Sequence
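To illustrate the bulk-update point above (throwaway example data, not the benchmarked code):

# ets accepts a list of tuples, so one call can insert many entries at once:
table = :ets.new(:demo_cache, [:set])
new_entries = [{2, 2}, {4, 3}, {8, 4}]
:ets.insert(table, new_entries)

# The map equivalent: merge a batch of entries in one call instead of
# issuing an individual Map.put/3 per entry.
cache = %{1 => 1}
cache = Map.merge(cache, Map.new(new_entries))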

1 Like

Excellent question. When you use a stream, you are typically trading CPU for memory: it is more computationally expensive, but it allows you to work with large or even infinite datasets.
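A small illustration of that trade-off:

# Eager: each Enum call materialises a full intermediate list in memory.
1..1_000_000
|> Enum.map(&(&1 * 3))
|> Enum.filter(&(rem(&1, 2) == 0))
|> Enum.sum()

# Lazy: Stream composes the steps and walks the range element by element,
# using less memory but paying more per-element overhead.
1..1_000_000
|> Stream.map(&(&1 * 3))
|> Stream.filter(&(rem(&1, 2) == 0))
|> Enum.sum()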

10 Likes