Installing NX on Jetson Nano

I managed to compile XLA on the Jetson Nano platform with the CUDA 10.2 that ships with JetPack.
I'm building XLA 0.2 because the latest version (0.3) dropped support for that CUDA version.
This is how I manage the project's deps:

  defp deps do
    [
      {:xla, "~> 0.2.0", runtime: false, app: false, override: true},
      {:exla, "~> 0.1.0-dev", github: "elixir-nx/nx", sparse: "exla", tag: "v0.1.0", app: false},
      {:nx, "~> 0.1.0-dev", github: "elixir-nx/nx", sparse: "nx", tag: "v0.1.0", override: true, app: false},
      {:axon, "~> 0.1.0-dev", github: "elixir-nx/axon", app: false},
      {:elixir_make, "~> 0.6", app: false, override: true},
      {:table_rex, "~> 3.1.1", app: false, override: true}
    ]
  end

My config is:

use Mix.Config

config :nx, :default_defn_options, [compiler: EXLA, client: :cuda]
config :exla, :clients, cuda: [platform: :cuda], default: [platform: :cuda]

I'm trying to run the code from Lambda Days '21, but it goes through the MNIST training loop very slowly;
it seems CUDA and EXLA don't actually work with this configuration.

I also tried
@default_defn_compiler EXLA
in Livebook, with the same unfortunate result.

How can I check if XLA, EXLA and NX were properly built?
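For reference, a minimal smoke test along the lines I have in mind (the module and function names here are just for illustration, and the attribute is the v0.1.0-era Nx API, so this is a sketch rather than a verified recipe). If EXLA and CUDA are wired up, a large matmul should take milliseconds rather than seconds:

```elixir
defmodule GpuSmoke do
  import Nx.Defn

  # Force the EXLA compiler with the CUDA client for this module
  # (v0.1.0-era Nx module attribute).
  @default_defn_compiler {EXLA, client: :cuda}

  defn matmul(a, b), do: Nx.dot(a, b)
end

a = Nx.iota({1024, 1024}, type: {:f, 32})

# :timer.tc/1 returns {microseconds, result}
{micros, _result} = :timer.tc(fn -> GpuSmoke.matmul(a, a) end)
IO.puts("1024x1024 matmul took #{micros / 1000} ms")
```

On the first call EXLA also logs which platform it compiled for, which is another quick way to confirm the CUDA client is actually used.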

Welcome!

Try running this example: nx/mnist.exs at main · elixir-nx/nx · GitHub

You will also need to set the XLA_TARGET environment variable accordingly: GitHub - elixir-nx/xla: Pre-compiled XLA extension

I have had these lines in my .profile since the first compilation, and it produced an extension archive named
xla_extension-aarch64-linux-cuda.tar.gz

export EXLA_TARGET=cuda
export EXLA_BUILD=true
export XLA_TARGET=cuda
export XLA_BUILD=true
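One thing worth double-checking is that these flags are actually visible to the shell that later runs `mix deps.compile` (after editing `.profile`, a re-login or re-source is needed). A self-contained sketch of the check:

```shell
# The same exports as in ~/.profile; set here so the check stands alone
export EXLA_TARGET=cuda
export EXLA_BUILD=true
export XLA_TARGET=cuda
export XLA_BUILD=true

# Confirm the build flags reached the environment that mix will inherit
env | grep -E '^(XLA|EXLA)_(TARGET|BUILD)=' | sort
```

With the exports in place, this prints all four variables; if any are missing when run in the shell that invokes mix, the build silently falls back to the defaults.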

This part looks correct to me, so I'm checking nx/mnist.exs first.

I only added my Mix deps management at the beginning of the script and got this error:

** (UndefinedFunctionError) function EXLA.set_as_nx_default/1 is undefined or private
(exla 0.1.0-dev) EXLA.set_as_nx_default([:tpu, :cuda, :rocm, :host])

Make sure you are on the latest EXLA. :slight_smile:

I'll double-check that EXLA is at the latest stable release of Nx, tagged v0.1.0.
Later versions seem incompatible with the Jetson Nano because of CUDA 10.2.

The working copy of EXLA at
~/.cache/mix/installs/elixir-1.13.3-erts-12.2.1/d9a9cec7685bde41b52309e196ec7f75/deps/exla/exla

is tagged:
23e3ca8 (HEAD, tag: v0.1.0) Release v0.1.0

This is probably not bleeding edge, but it was released in Jan '22. I have to stick to it because of the CUDA version in JetPack, which cannot be upgraded.

The script works without this line, but very slowly:

#EXLA.set_as_nx_default([:tpu, :cuda, :rocm, :host])

I finally managed to run mnist.exs unmodified, but encountered an issue:

gpu/asm_compiler.cc:77] Couldn't get ptxas version string: Internal: Running ptxas --version returned -1

although ptxas is in my PATH and reports its version. Any ideas why that is? I'm attaching the full execution log.

ptxas --version
ptxas: NVIDIA (R) Ptx optimizing assembler
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:13:18_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
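For what it's worth, XLA runs ptxas as a subprocess of the VM, so the PATH that matters is the one the BEAM sees, not my login shell's (a Livebook or service-launched VM may get a stripped-down PATH). A quick diagnostic from iex, using only stdlib calls:

```elixir
# What PATH does the VM actually see?
IO.puts(System.get_env("PATH") || "(PATH not set)")

# Can the VM locate and execute ptxas itself?
case System.find_executable("ptxas") do
  nil ->
    IO.puts("ptxas not found on the VM's PATH")

  path ->
    {out, status} = System.cmd(path, ["--version"])
    IO.puts("#{path} exited with status #{status}:\n#{out}")
end
```

If this prints a non-zero status or "not found" while the login shell's `ptxas --version` works, the fix is to make sure `/usr/local/cuda/bin` (or wherever JetPack puts ptxas) is on the PATH of the process that starts the VM.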

Unfortunately, I couldn't solve the issue any other way than by upgrading to Nvidia's next-generation Tegra platform, the Jetson Orin Nano 8G. That is now probably the cheapest way for me to run Elixir Nx, a fully EXLA-capable Livebook, and other Elixir AI goodies.


Do you have a public github repo of this?

What kind of repo would that be? Docker, or something else?

This is incredible! I would also (like @maz) love to see a write-up / some repo resources on how you got it to work.

I am looking to build a small home lab for Livebook development, and it seems like an extendable cluster of Jetsons might be the cheapest way to go.

@madsbuch, if you're seeking an AI platform for your lab, a bigger box like the Orin AGX (64GB) would play the role much better.
I'd also wait for JetPack 6.0 (Mar '23) before giving it a try under the Nx infrastructure.

I would love that setup, but in Denmark it is ~3000 USD.

I don’t need a particularly beefy system, as I expect I would rent servers online if I need to ramp up. This is just to have a place to fail.

Regardless, it seems like an eGPU with a cheap RTX 3060 / 12 GB is probably the way to go.

Right, the Orin AGX is damn expensive anywhere, not only in Denmark. :)
Though these Orins are rather embedded-class devices, so at some point you'll want to play with something in your own hands, not only in the cloud.
Maybe the Orin Xavier (16G) is a reasonable compromise then, to run many models from Hugging Face with Elixir.
Plus an RTX 3060 to explore the AI achievements of Chat with RTX.