How to execute Incremental PCA or similar batch tasks in Nx/Scholar?

Hello,

and thank you for the excellent libraries like Nx, Scholar, Axon…

We are in the process of transitioning some of our work from Python to Elixir. It’s heartening to see that many of our tasks can be accomplished in Elixir, thanks to Nx and the surrounding ecosystem. However, we have not yet found a way to process the data in partial fits/batches to conserve memory when performing tasks like PCA.

PCA in Scholar

Given our dataset, we would need to construct a tensor of s64[50000][10000] as input, which poses challenges for several reasons, not least that it is roughly 4 GB that would have to be materialized in memory at once.

In Python, we might use something like IncrementalPCA from scikit-learn to handle this. However, we’re unsure if we’ve overlooked a way to achieve this with Nx/Scholar.
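To make what we mean by partial/batch processing concrete, here is a rough sketch of accumulating per-batch statistics with plain Nx. This is not scikit-learn’s IncrementalPCA algorithm, just exact PCA computed from the column means and the d × d Gram matrix, and `batches` stands for a hypothetical enumerable that yields one chunk of rows at a time:

```elixir
defmodule BatchedPCA do
  # `batches` is any Enumerable of {batch_size, d} tensors, loaded chunk by chunk,
  # so the full n x d matrix never has to exist at once.
  def fit(batches, num_components) do
    {xtx, col_sum, n} =
      Enum.reduce(batches, {nil, nil, 0}, fn batch, {xtx, col_sum, n} ->
        batch = Nx.as_type(batch, :f32)
        gram = Nx.dot(Nx.transpose(batch), batch)
        sum = Nx.sum(batch, axes: [0])

        {
          if(xtx, do: Nx.add(xtx, gram), else: gram),
          if(col_sum, do: Nx.add(col_sum, sum), else: sum),
          n + Nx.axis_size(batch, 0)
        }
      end)

    mean = Nx.divide(col_sum, n)
    # Biased covariance estimate: E[x x^T] - mean mean^T (enough for PCA).
    cov = Nx.subtract(Nx.divide(xtx, n), Nx.outer(mean, mean))

    # Keep the eigenvectors belonging to the largest eigenvalues.
    {eigenvals, eigenvecs} = Nx.LinAlg.eigh(cov)
    top = Nx.argsort(eigenvals, direction: :desc) |> Nx.slice([0], [num_components])
    components = Nx.take(eigenvecs, top, axis: 1)

    {mean, components}
  end
end

# Hypothetical usage: project new rows onto the learned components.
# {mean, components} = BatchedPCA.fit(batches, 50)
# projected = Nx.dot(Nx.subtract(new_batch, mean), components)
```

This only ever holds a d × d matrix in memory (about 400 MB in f32 for d = 10,000), but it is not a drop-in replacement for IncrementalPCA, so we would still prefer a supported Scholar solution if one exists.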

We would appreciate any guidance or directions on this matter.
Thank you.

hey :wave:
I’m far from being an expert and I’m not familiar with PCA, so please take what I’m going to write with a pinch of salt.

Have you looked at Nx.Serving?
As far as I know, Nx.Serving can be used to batch requests and computations. Quoting the docs:

More specifically, servings are a mechanism to apply a computation on a Nx.Batch, with hooks for preprocessing input from and postprocessing output for the client. Thus we can think of an instance of Nx.Serving.t/0 (a serving) as something that encapsulates batches of Nx computations.

https://hexdocs.pm/nx/Nx.Serving.html

The Nx library comes with its own tensor serving abstraction, called Nx.Serving, allowing developers to serve both neural networks and traditional machine learning models within a few lines of code.
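Something like this minimal sketch (just doubling tensors, nothing PCA-specific; `Nx.Serving.jit/1`, `Nx.Batch.stack/1` and `Nx.Serving.run/2` come straight from those docs):

```elixir
defmodule Scale do
  import Nx.Defn

  # Any defn computation can be wrapped in a serving.
  defn double(x), do: Nx.multiply(x, 2)
end

serving = Nx.Serving.jit(&Scale.double/1)

# Requests (possibly from different callers) get stacked into one batch.
batch = Nx.Batch.stack([Nx.tensor([1, 2, 3]), Nx.tensor([4, 5, 6])])
Nx.Serving.run(serving, batch)
#=> s64[2][3] tensor with the doubled values
```

That said, as far as I understand it, Nx.Serving batches incoming requests rather than streaming one huge dataset through a computation, so it may or may not fit your PCA case.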

Hope it helps, let us know.
Cheers :v:

Thank you, I will take a closer look and let you know if I find a solution.
