Using tasks to speed up an algorithm

Hi! I’m exploring how to using processes and tasks to optimize code, and wanted to ask the how you’d go about this for the Elixir version of Google’s PageRank below.

I tried modifying the rank(pages, i) function to use Task.async and Task.async_stream, but both versions took longer to finish than this one (using elixir’s time functionality).

All help and ideas are appreciated!

pages = [
  %{id: 1, links: [2,3], rank: 0.25},
  %{id: 2, links: [3], rank: 0.25},
  %{id: 3, links: [1], rank: 0.25},
  %{id: 4, links: [3], rank: 0.25}
]

# Iterate over pages 60 times
Pagerank.rank(pages, 60)

defmodule Pagerank do
  # If iterated 60 times, inspect
  def rank(pages, 0), do: IO.inspect(pages)

  # Iterate through pages, creating a new list with updated rank each time
  def rank(pages, i) do
    pages
    |> Enum.map(&(calculate_pagerank(&1, pages)))
    |> rank(i - 1)
  end

  # Calculate a page's rank based on pages that link to it, and update
  def calculate_pagerank(page, pages) do
    d= 0.85
    sum =
      pages
      |> Enum.filter(&(Enum.member?(&1.links, page.id)))
      |> Enum.reduce(0, &((&1.rank/length(&1.links)) + &2))

    Map.put(page, :rank, (1-d) + (d * sum))
  end
end

Using a single task there wouldn’t gain any speed, it would still be serialized. You’d want to spawn a number of tasks equal to the CPU core count and distribute the work among them, but depending on the work being done that may or may not gain anything either, it’s all highly dependent on the type and how much work is being done. For the work being done in the example code then you’d need a pretty large input set to overcome the inherent initial parallelism cost and you’d need to distribute the work properly among multiple tasks.

2 Likes

I see!

So would this be correct, assuming a large enough dataset:

def rank(pages, i) do
  pages
  |> Enum.map(fn page -> Task.async(fn -> calculate_pagerank(page, pages) end) end)
  |> Enum.map(fn task -> Task.await(task) end)
  |> rank(i - 1)
end

Assuming pages is a list of lists/maps/whatever sure, but I’d still find it easier to Enum.split it over the number of cores. :slight_smile:

System.schedulers_online/0

Once the task start up time becomes relatively minimal - Task.async_stream/5

4 Likes