Hello, and thank you for the excellent libraries like Nx, Scholar, Axon…
We are in the process of transitioning some of our work from Python to Elixir. It’s heartening to see that many of our tasks can be accomplished in Elixir, thanks to Nx and the surrounding ecosystem. However, we have not yet found a solution for processing data in partial fits/batches to conserve memory when performing tasks like PCA.
Given our dataset, we would need to construct a tensor of s64[50000][10000] as input (roughly 4 GB of 64-bit integers), which poses challenges for several reasons.
In Python, we might use something like IncrementalPCA from scikit-learn to handle this. However, we’re unsure if we’ve overlooked a way to achieve this with Nx/Scholar.
We would appreciate any guidance or directions on this matter.
Thank you.
Hey,
I’m far from being an expert and I’m not familiar with PCA, so please take what I’m going to write with a pinch of salt.
Have you looked at Nx.Serving?
As far as I know, Nx.Serving can be used to batch requests and computations. Quoting the docs:
More specifically, servings are a mechanism to apply a computation on a Nx.Batch, with hooks for preprocessing input from and postprocessing output for the client. Thus we can think of an instance of Nx.Serving.t/0 (a serving) as something that encapsulates batches of Nx computations.
The Nx library comes with its own tensor serving abstraction, called Nx.Serving, allowing developers to serve both neural networks and traditional machine learning models within a few lines of code.
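In case a concrete snippet helps, here is a minimal sketch adapted from the Nx.Serving docs (untested on my side); the doubling function and the example tensors are just placeholders for whatever computation you actually need:

```elixir
# Build a serving whose computation doubles every element of the incoming batch.
serving =
  Nx.Serving.new(fn opts ->
    Nx.Defn.jit(fn batch -> Nx.multiply(batch, 2) end, opts)
  end)

# Stack individual tensors into an Nx.Batch and run the serving on it.
batch = Nx.Batch.stack([Nx.tensor([1, 2, 3]), Nx.tensor([4, 5, 6])])
Nx.Serving.run(serving, batch)
#=> s64[2][3] tensor with every element doubled
```

That said, I’m not sure whether this kind of request batching is what you need for an incremental fit, so hopefully someone more knowledgeable can chime in.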
Hi @bultas and sorry for the delayed response. Scholar.Decomposition.PCA now has an incremental_fit/2 function which can be used to fit a model on a stream of batches.
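Something along these lines should work, though please treat it as a rough sketch: the batch size, the load_batch/1 helper, and the :num_components option are assumptions on my side, so check the Scholar docs for the exact signature and options:

```elixir
# Hypothetical helper: load_batch/1 would read 1_000 rows from disk and return
# an f32[1000][10000] tensor -- replace it with your own data loading.
batches = Stream.map(0..49, fn i -> load_batch(i) end)

# Fit PCA incrementally, one batch at a time, so the full 50_000 x 10_000
# tensor never has to be materialized in memory.
model = Scholar.Decomposition.PCA.incremental_fit(batches, num_components: 100)

# Assuming the fitted struct behaves like a regular PCA model, new chunks of
# data can then be projected onto the learned components.
reduced = Scholar.Decomposition.PCA.transform(model, load_batch(0))
```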
I would suggest opening an issue on GitHub in the future whenever you have questions like this. We will be notified and will be able to react more quickly.