I know about Elixir Nx. That’s not what I’m asking about.
For many problems, “dumb algorithm + huge dataset” beats “clever algorithm + tiny dataset”.
In the “dumb algorithm + huge dataset” model, we can informally divide the work into “data preparation” and “training”.
For “training”, Elixir Nx can be very useful.
I’m more curious about “data preparation”: collecting the dataset, figuring out how to store it (which distributed DB to use, since a huge dataset often implies more than one machine), cleaning it (data can be noisy), and figuring out how to serve it with low latency and high throughput during training.
For those who have used Elixir for the “data preparation” part of “dumb algorithm + huge dataset” type problems, can you share insights on experiences / workflows / libraries used?
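To make the question concrete, here is the kind of “cleaning” step I have in mind, sketched with the Flow library. The file name, JSON fields, and filter rules are all made-up placeholders; I’m curious whether people actually structure their pipelines this way or reach for something else (Broadway, GenStage directly, plain Task.async_stream, etc.):

```elixir
# Hypothetical cleaning pass over newline-delimited JSON, using Flow
# for parallelism and back-pressure. "events.jsonl" and the "value"
# field are placeholders, not from a real dataset.
"events.jsonl"
|> File.stream!()
|> Flow.from_enumerable()
|> Flow.map(&Jason.decode!/1)
|> Flow.filter(fn row -> row["value"] not in [nil, ""] end)  # drop noisy rows
|> Flow.map(fn row -> Map.update!(row, "value", &String.trim/1) end)
|> Enum.to_list()
```

In particular: does Flow hold up at “more than one machine” scale, or do people push that part into the distributed DB / object store and keep Elixir only for orchestration?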