So I interpret that as: most of the issues I mentioned are considered done?
You can do a version of distributed computation with FLAME + Explorer, sure. But is it done? Certainly not.
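To make "a version of distributed computation with FLAME + Explorer" concrete, here is a minimal sketch. It assumes a `FLAME.Pool` named `MyApp.FramePool` is already running in the application's supervision tree, that `:flame` and `:explorer` are dependencies, and that the Parquet path is hypothetical:

```elixir
# Sketch only: MyApp.FramePool and the file path are assumptions,
# not part of any real project mentioned in this thread.
require Explorer.DataFrame, as: DF

# The heavy aggregation runs on a remote FLAME runner (e.g. a Fly.io
# machine); only the small summarised result travels back to the caller.
result =
  FLAME.call(MyApp.FramePool, fn ->
    df = DF.from_parquet!("events.parquet")

    df
    |> DF.group_by("user_id")
    |> DF.summarise(total: sum(amount))
  end)
```

The point of the sketch is the division of labour: FLAME handles machine placement and lifecycle, while Explorer (backed by Polars) does the actual dataframe work on whichever node the closure lands on. Nothing here schedules or partitions the data itself, which is exactly why the theoretical questions below are open.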
For instance, I doubt there exists a theoretical model for how data is processed in that setup (though if I’m wrong, someone call me out!). Creating one is probably worthwhile. Then we could answer: what are the current limits of our processing capabilities? Could we process more? Could we introduce configuration settings that make certain workloads easier, faster, or more efficient?
And that’s just off the top of my head. There are tons of open questions even in that one project.
The other thing is that it’s all multi-machine. Multiprocessing, for example, isn’t mentioned.
It’s funny you mention that. FLAME + Explorer is just one example from this area. Let me send you to this Livebook/Nx announcement from a year ago:
That link has this video at the top, and I’ll send you to a specific timestamp:
José says:
But it gets even better. Because, what we can do now is that we can also make this distributed. And “distributed” is a funny word because depending on who you ask it has different meanings. So if you asked me two years ago I would say well distributed is when you have multiple machines communicating with each other. But if you ask a machine learning engineer they may say well distributed is when you have more than one GPU in your machine and you’re using those GPUs and sometimes even communicating across those GPUs. And in order to avoid confusion, Nx can do both. We can do both kinds of distributions. And that’s what I want to show you…
So in Nx also there is some pre-existing work. But again, I’d never describe the work as “done”.
So, I assume [that it can use multicore is] also true for Explorer, although I have not found it in the docs.
Thanks, we should definitely be documenting that!
Perhaps a GPU backend for Explorer? That is probably more work. But how much of it counts as scientific work?
A new backend for Explorer would be a massive amount of work. And yeah, I’m not sure it’s really appropriate for a Master’s thesis.
When I see here that I can use it with fly.io or with k8s, I wonder what it costs to train a DL model, or how easy it is to set up a local k8s cluster compared to a Spark cluster.
I personally know less about this. I don’t use DL much in my own work. Tutorials for this kind of thing would be incredibly valuable to the community, and I’m not sure if any exist.