Installation advice for a system with many cores and GPUs

typesend · August 18, 2022, 10:16am

I recently ordered a Lambda Vector Workstation for machine/deep learning experiments as well as Blender-related video rendering.

Four RTX A4000 GPUs (without NVLink)
AMD Threadripper 3960X: 24 cores, 3.80 GHz, 128 MB cache
256 GB RAM
ASRock TRX40 Creator motherboard
Ubuntu preinstalled

Deeply appreciate any advice you can share that could help me make the most of this hardware using Elixir and avoid any gotchas or suboptimal performance!

Should I build the BEAM from source? Are there any special configuration details or build flags I should pay special attention to? I’ve always had difficulty getting jinterface and wxWidgets set up in the past, but I should probably get those right this time.

garazdawi · August 18, 2022, 10:40am

If you want the best performance, pass the -march=native flag to gcc, which means building from source. I would also add the --enable-jit flag to make sure you get the JIT: without it, configure silently falls back to the non-JIT emulator when the tools needed to build the JIT are not available.

./configure --enable-jit CFLAGS="-O2 -g -march=native" && make && make install
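
To sanity-check the result, something like the sketch below may help. It assumes OTP 24 or later (where erlang:system_info(emu_flavor) is available) and that gcc and erl are on the PATH; the function names show_native_arch and check_beam are only illustrative.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of Erlang/OTP.

# Show which CPU architecture gcc picks for -march=native on this machine.
show_native_arch() {
  if command -v gcc >/dev/null 2>&1; then
    gcc -march=native -Q --help=target 2>/dev/null | grep -m1 -- '-march=' \
      || echo "could not resolve -march=native"
  else
    echo "gcc not found on PATH"
  fi
}

# Report the emulator flavor (jit or emu) and the number of schedulers
# online for the erl that is first on PATH. Assumes OTP 24 or later.
check_beam() {
  if command -v erl >/dev/null 2>&1; then
    erl -noshell -eval '
      io:format("emu_flavor: ~p~n", [erlang:system_info(emu_flavor)]),
      io:format("schedulers_online: ~p~n", [erlang:system_info(schedulers_online)]),
      halt().'
  else
    echo "erl not found on PATH"
  fi
}

show_native_arch
check_beam
```

An emu_flavor of jit means the JIT build was selected, while emu means the interpreter is in use. schedulers_online normally matches the number of logical CPUs the VM can see.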

bdarla · August 18, 2022, 10:52am

(Side question)
Is your plan to use the Nx framework for ML/DL or to use Elixir for orchestrating execution in other languages/frameworks e.g. in Python?

typesend · August 20, 2022, 8:12am

Good question. Both!

I’m new to this and the vast majority of learning materials are Python-based, but of course I also want to have a runtime environment that smoothly takes advantage of all cores and is more resilient.

The BEAM will be quite helpful in scheduling/queuing jobs and keeping track of benchmark stats.

Baby steps, but someday I hope to be able to help move Elixir-based ML tooling forward. There’s no good reason Python should have a complete monopoly over the ML world, especially when you consider that the Python code isn’t really doing the heavy lifting.