Performance with parallel map and math operations

I have a weird performance question.
I have one list of approx. 10k elements, and I’m building a matrix (just nested lists) by multiplying each element by every element of a duplicate list, just like column vector * row vector. I’m doing this with a parallel map, and it takes approx. 15 secs. If I split the list in two and calculate only one half, the duration is almost half, approx. 8 secs, which makes sense. So I thought, OK, I can parallelize this as well and run both halves in different processes, each of which will in turn use tasks to do the calculations.
To make it easier, I wrote a GenServer, created 2 instances of it, and passed one half to each instance. But the end result takes almost double the time, about 27 secs! How is that possible?

How are you handling the resultant data in the parallel case?

This is what I have in my GenServer:

def handle_cast({:list, [row, column]}, _) do
  multiply = fn (e) -> Enum.zip(row, Stream.cycle([e])) |> Enum.map(fn {x, y} -> x * y end) end

  res = column
    |> Enum.map(&Task.async(fn -> multiply.(&1) end))

  {:noreply, res}
end


And I start the GenServers like this:

{:ok, pid} = GenServer.start_link
{:ok, pid1} = GenServer.start_link

GenServer.calculate(pid, [row, col1])
GenServer.calculate(pid1, [row, col2])

list1 = GenServer.get_list(pid)
list2 = GenServer.get_list(pid1)

Only at the end do I get the 2 lists, but just obtaining them takes almost double the time of doing everything at once.

Have you added functions to the GenServer module itself? You should not do that; define them in a separate callback module instead.
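A minimal sketch of what a separate callback module could look like. The module name `Multiplier` and its `start_link/1`, `calculate/3` and `get_list/1` functions are hypothetical, chosen only to mirror the calls in the question, and the multiplication is done with a plain comprehension rather than tasks to keep the sketch short.

```elixir
defmodule Multiplier do
  # Hypothetical callback module; name and client API are assumptions.
  use GenServer

  # Client API: thin wrappers around the real GenServer functions.
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)
  def calculate(pid, row, column), do: GenServer.cast(pid, {:list, [row, column]})
  def get_list(pid), do: GenServer.call(pid, :get_list, :infinity)

  # Server callbacks
  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:list, [row, column]}, _state) do
    # One output row per element of `column`.
    result = for c <- column, do: Enum.map(row, &(&1 * c))
    {:noreply, result}
  end

  @impl true
  def handle_call(:get_list, _from, state), do: {:reply, state, state}
end

{:ok, pid} = Multiplier.start_link()
Multiplier.calculate(pid, [1, 2, 3], [10, 20])
IO.inspect(Multiplier.get_list(pid))
```

Note that both `calculate/3` and `get_list/1` still move the lists between processes, so this only fixes the module layout, not the copying.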

When you create a GenServer you are starting another process, and when you send data to or from it, as with the handle_cast here, you are copying the data, which is not free if there is a lot of it. How are you running it in the sequential case?

In that case, maybe using a GenServer is not the best idea? Even if I put the functions in another module, I’ll need to store the result in the GenServer state, so if the result is a large dataset, I’ll have to pass it to the GenServer to store in the state and then pass it back when I need it. Is that right, or am I confusing terms?

The other approach, which uses no GenServer and does not partition the list, is like this:

multiply = fn (e) -> Enum.zip(row, Stream.cycle([e])) |> Enum.map(fn {x, y} -> x * y end) end

res = col
  |> Enum.map(&Task.async(fn -> multiply.(&1) end))

So basically the same. The difference is that in this case length(col) == n, whereas I decided to send 2 lists to 2 GenServers with length(col_1) == n/2, thinking that 2 parallel processes (GenServers) would each process half of the list in half the time.

That sounds correct. Concurrency in Elixir works by running multiple concurrent processes, and to share data between processes it needs to be copied (at least most of the time). So doing things in parallel is not guaranteed to improve performance: if the cost of copying is higher than the cost of the work itself, doing the work in the same process is faster.

The cases in which data is not copied are very rare. It is safe to assume 100% copying when sending messages between processes. This is even more true when sending across nodes.
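To get a feeling for the copying cost on its own, here is a rough sketch that sends a nested list to another process and straight back, timing the round trip with `:timer.tc/1`. The size `n = 500` and the variable names are arbitrary assumptions, and the number you see will depend on your machine.

```elixir
# Nested list roughly shaped like the result in the question (size is an assumption).
n = 500
matrix = for i <- 1..n, do: Enum.map(1..n, &(&1 * i))

# A process that does no work at all: it only echoes the data back.
echo =
  spawn(fn ->
    receive do
      {from, data} -> send(from, {:echo, data})
    end
  end)

# The data is copied once on the way in and once on the way out.
{micros, received} =
  :timer.tc(fn ->
    send(echo, {self(), matrix})

    receive do
      {:echo, data} -> data
    end
  end)

IO.puts("round trip of #{n * n} integers took #{micros} microseconds")
```

Since no multiplication happens in the echo process, the measured time is purely message passing and scheduling overhead.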

Gotcha, looks like a GenServer is not the best approach then.

What is the performance without using any additional processes? So with no Task.async or Task.await?
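For reference, a single-process baseline could look roughly like the sketch below. The `Outer` module and its `product/2` function are hypothetical names, and the 1,000-element list is just a stand-in for the real data, which I don't have.

```elixir
defmodule Outer do
  # Hypothetical helper: one output row per element of `column`,
  # computed in the calling process with no tasks and no message passing.
  def product(row, column) do
    for c <- column, do: for(r <- row, do: r * c)
  end
end

# Stand-in data; the size is an assumption, not the 10k list from the question.
row = Enum.to_list(1..1_000)

{micros, result} = :timer.tc(fn -> Outer.product(row, row) end)
IO.puts("#{length(result)} rows computed in #{micros} microseconds")
```

Comparing this number with the Task-based version on the same data would show how much of the 15 secs is actual multiplication and how much is process overhead.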

Also, do you have example data that I could play with if I wanted to test approaches myself?