Most performant (and lazy) way to do an operation on 2 lists

Hello!

I am building a fairly robust pipeline of operations that performs math of all kinds on thousands (possibly millions in the future) of rows.

I would like to know the most performant way to do something that comes up a lot in my code. I always have two different lists of variables and need to combine them into one using different formulas, so this step can become a bottleneck. A simple example of one case looks like this:

Stream.zip(axs, ays)
|> Stream.map(fn {ax, ay} -> Math.sqrt(ax + ay) end)

Where axs and ays would be lists of float numbers.
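
For reference, here is a minimal self-contained sketch of the same pipeline. It assumes Erlang's built-in `:math.sqrt/1` in place of `Math.sqrt` (the `Math` module is not part of Elixir's standard library, so it presumably comes from a dependency), and the sample values are made up. Nothing is computed until the stream is consumed.

```elixir
# Sample inputs (hypothetical values chosen so the results are exact).
axs = [1.0, 4.0, 9.0]
ays = [3.0, 5.0, 7.0]

# Building the stream does no work yet; it only describes the computation.
lazy =
  Stream.zip(axs, ays)
  |> Stream.map(fn {ax, ay} -> :math.sqrt(ax + ay) end)

# The math runs only when the stream is consumed.
result = Enum.to_list(lazy)

# Since Elixir 1.12, Stream.zip_with/3 does the zip and the map in one step
# and avoids building the intermediate tuples.
result_zip_with =
  Stream.zip_with(axs, ays, fn ax, ay -> :math.sqrt(ax + ay) end)
  |> Enum.to_list()

IO.inspect(result)
IO.inspect(result_zip_with)
```

Both variants stay lazy, so either one can feed a later concurrent stage.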

As the example suggests, it has to be lazy, because we plan to optimize the pipeline using Flow or some other concurrency tool once we have a first version, so advice in that direction is also welcome. :)

Thanks in advance!

You are already doing it right IMO.

The only question that’s left is: do you really want to use Elixir for math? Definitely not its forte.

Alternatively, I’d just dump the two collections of numbers in a SQLite database and do the math operation with a SELECT statement. But it depends. If the two lists of data are huge and/or not guaranteed to zip together well then Elixir might be the better tool.

It depends on how complex your map function is. You mentioned possibly concurrent use later. If you have a complex function then I would use Task.async_stream which would allow you to run concurrent operations instead of one at a time.
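
A rough sketch of what that could look like, assuming a hypothetical `expensive` function standing in for a genuinely costly formula (here it is just the square root from the original post, via Erlang's `:math.sqrt/1`):

```elixir
axs = [1.0, 4.0, 9.0]
ays = [3.0, 5.0, 7.0]

# Hypothetical stand-in for a complex per-pair computation.
expensive = fn {ax, ay} -> :math.sqrt(ax + ay) end

results =
  Stream.zip(axs, ays)
  # Runs `expensive` in separate processes, at most one per scheduler at a time.
  # With ordered: true (the default) results come back in input order.
  |> Task.async_stream(expensive,
    max_concurrency: System.schedulers_online(),
    ordered: true
  )
  # Each element is wrapped as {:ok, value}; unwrap it.
  |> Enum.map(fn {:ok, value} -> value end)

IO.inspect(results)
```

Each element is handled in its own process, so this only pays off when the function does enough work per element to cover that cost.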

The type of math mentioned in the original post seems perfectly fine for Elixir. Most functions in Erlang’s :math module are implemented natively in C (as BIFs). What tends to be out of scope for Elixir is heavy math, like calculating the dot product of two large matrices.

I wouldn’t switch from a plain function to a Task based on code complexity. That decision should be based on runtime behavior: for example, if the work is parallelizable, then it makes sense to use concurrency such as a Task (or perhaps GenStage/Flow).
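
One way to check runtime behavior is simply to time both variants on representative data. The following is only a sketch using `:timer.tc/1`; the input sizes and the function `f` are made up for illustration, and a proper benchmark would need repeated runs and warm-up.

```elixir
# Hypothetical inputs: 10_000 floats in each list.
axs = Enum.map(1..10_000, &(&1 * 1.0))
ays = Enum.map(1..10_000, &(&1 * 2.0))

f = fn {ax, ay} -> :math.sqrt(ax + ay) end

# :timer.tc/1 returns {elapsed_microseconds, result}.
{seq_us, seq} =
  :timer.tc(fn ->
    Stream.zip(axs, ays) |> Enum.map(f)
  end)

{par_us, par} =
  :timer.tc(fn ->
    Stream.zip(axs, ays)
    |> Task.async_stream(f)
    |> Enum.map(fn {:ok, value} -> value end)
  end)

IO.puts("sequential: #{seq_us} us, Task.async_stream: #{par_us} us")
```

The numbers depend entirely on the machine and on how expensive the real function is, so measure with the actual workload.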

Well, the poster described it as a simple example of one case, so I don’t know what the other cases are or how complex they may be. That’s why I said it depends on how complex the function is. I agree that in this simple example it wouldn’t make much sense, as the overhead outweighs the benefit, but the larger picture may warrant its use.

The most performant way to do the math would probably be to use Matrex and operate on the numbers in bulk, in batches as large as you can get at a time (this fits into Flow very well).
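
The batching part of that idea can be sketched with the standard library alone. This example does not use Matrex or Flow; it only shows how chunking lets each concurrent task handle many pairs at once. The chunk size of 4 and the inputs are arbitrary, and the inner `Enum.map` is where a bulk numeric library call would go.

```elixir
# Hypothetical inputs: ten floats in each list.
axs = Enum.map(1..10, &(&1 * 1.0))
ays = Enum.map(1..10, &(&1 * 2.0))

results =
  Stream.zip(axs, ays)
  # Group pairs into batches so each task processes many elements.
  |> Stream.chunk_every(4)
  # One task per batch; this is the spot for a bulk operation.
  |> Task.async_stream(fn chunk ->
    Enum.map(chunk, fn {ax, ay} -> :math.sqrt(ax + ay) end)
  end)
  # Unwrap {:ok, values} and flatten the batches back into one list.
  |> Enum.flat_map(fn {:ok, values} -> values end)

IO.inspect(results)
```

In practice the chunk size would be much larger, chosen so that the work per batch clearly outweighs the cost of handing it to another process.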
