Hi Community,
I’m working on a task that involves processing a huge dataset in Elixir. The dataset consists of millions of records, and processing each one involves several complex transformations and calculations. The processing is currently quite slow, and I’m looking for advice on how to optimize it.
Here are a few details:
- Each record undergoes multiple transformations.
- There are several CPU-intensive calculations.
- I’m using GenStage to handle the data flow.
I’ve tried using Task.async_stream to parallelize some of the work, but the improvement has been marginal. Is there a better approach to parallel processing in Elixir, or are there specific libraries suited to this kind of workload? Any tips on optimizing performance or handling large datasets efficiently in Elixir would be greatly appreciated.
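For context, here is a minimal sketch of the kind of Task.async_stream setup I mean, assuming one task per record. It is simplified and not my actual code: the Pipeline module and transform/1 are hypothetical placeholders for the real transformations and calculations.

```elixir
defmodule Pipeline do
  # Hypothetical stand-in for the per-record transformations and
  # CPU-intensive calculations (the real ones are not shown here).
  def transform(record) do
    record * 2 + 1
  end

  # Runs transform/1 over every record in parallel, one task per record,
  # with at most one concurrent task per online scheduler.
  def run(records) do
    records
    |> Task.async_stream(&transform/1,
      max_concurrency: System.schedulers_online(),
      # Avoid the default 5-second per-task timeout for slow records.
      timeout: :infinity
    )
    # Results come back in input order (ordered: true is the default).
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

Note that max_concurrency already defaults to System.schedulers_online(), so the explicit option here only makes the limit visible.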
I also read this thread (https://elixirforum.com/t/massive-distributed-parallel-processing-of-large-data-sets-cissp-with-elixir/49584) on Elixir Forum, but it didn’t answer my question.
Thanks in advance!
Steve