Installing NX on Jetson Nano

I managed to compile XLA on the Jetson Nano platform with CUDA 10.2 from JetPack.
I’m compiling XLA 0.2 because the latest version (0.3) has dropped support for that CUDA version.
This is how I manage the project’s deps:

  defp deps do
    [
      {:xla, "~> 0.2.0", runtime: false, app: false, override: true},
      {:exla, "~> 0.1.0-dev", github: "elixir-nx/nx", sparse: "exla", tag: "v0.1.0", app: false},
      {:nx, "~> 0.1.0-dev", github: "elixir-nx/nx", sparse: "nx", tag: "v0.1.0", override: true, app: false},
      {:axon, "~> 0.1.0-dev", github: "elixir-nx/axon", app: false},
      {:elixir_make, "~> 0.6", app: false, override: true},
      {:table_rex, "~> 3.1.1", app: false, override: true}
    ]
  end

My config is:

use Mix.Config

config :nx, :default_defn_options, [compiler: EXLA, client: :cuda]
config :exla, :clients, cuda: [platform: :cuda], default: [platform: :cuda]

I’m trying to run the code from LambdaDays’21, but it runs through the MNIST training loop very slowly.
It seems that CUDA and EXLA are not actually being used with this configuration.

I also tried
@default_defn_compiler EXLA
in Livebook, with the same unfortunate result.

How can I check whether XLA, EXLA and Nx were built properly?
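One way to look at this from IEx is to ask EXLA which platforms the compiled extension reports and which platform the configured client ended up on. This is only a sketch: it assumes `EXLA.Client.get_supported_platforms/0` and `EXLA.Client.fetch!/1` are available in the installed EXLA version, and that `:cuda` is the client name from the config above. `ExlaCheck` is a made-up helper name.

```elixir
defmodule ExlaCheck do
  # Hypothetical helper. `client_name` is the key used under `config :exla, :clients`.
  def report(client_name \\ :cuda) do
    if Code.ensure_loaded?(EXLA.Client) do
      # Assumed to exist in the installed EXLA version (see the note above).
      platforms = EXLA.Client.get_supported_platforms()
      client = EXLA.Client.fetch!(client_name)
      {:ok, %{platforms: platforms, client_platform: client.platform}}
    else
      # EXLA is not compiled or not on the code path of this VM.
      :exla_not_loaded
    end
  end
end

IO.inspect(ExlaCheck.report(), label: "EXLA")
```

If the reported client platform is not `:cuda`, the training loop is most likely running on the CPU.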

Welcome!

Try running this example: nx/mnist.exs at main · elixir-nx/nx · GitHub

You will also need to set the XLA_TARGET environment variable accordingly: GitHub - elixir-nx/xla: Pre-compiled XLA extension

I have had the lines below in my .profile since the first compilation, which produced an extension archive named
xla_extension-aarch64-linux-cuda.tar.gz

export EXLA_TARGET=cuda
export EXLA_BUILD=true
export XLA_TARGET=cuda
export XLA_BUILD=true
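Since .profile is normally only read by login shells, it may be worth confirming that the Erlang VM running Mix or Livebook actually sees these variables. A minimal sketch using only the standard library; `EnvCheck` is a made-up helper name and the variable list is simply the four exports above:

```elixir
defmodule EnvCheck do
  # Names taken from the .profile lines above.
  @vars ~w(EXLA_TARGET EXLA_BUILD XLA_TARGET XLA_BUILD)

  # Returns the variables that are unset or empty in the given environment map.
  def missing(env \\ System.get_env(), vars \\ @vars) do
    Enum.filter(vars, fn name -> Map.get(env, name, "") == "" end)
  end
end

case EnvCheck.missing() do
  [] -> IO.puts("all build variables are visible to this VM")
  names -> IO.puts("not visible to this VM: " <> Enum.join(names, ", "))
end
```

Running it in the same IEx session or Livebook cell environment that compiles the deps shows what the build would have picked up.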

This part looks correct to me, so I’m checking nx/mnist.exs first.

I only added my Mix deps management at the beginning of the script, and I get this error:

** (UndefinedFunctionError) function EXLA.set_as_nx_default/1 is undefined or private
(exla 0.1.0-dev) EXLA.set_as_nx_default([:tpu, :cuda, :rocm, :host])

Make sure you are on the latest EXLA. :)

I’ll double-check that EXLA is at the latest stable release of Nx, tagged v0.1.0.
Later versions seem to be incompatible with the Jetson Nano because of CUDA 10.2.

The working copy of EXLA is here:
~/.cache/mix/installs/elixir-1.13.3-erts-12.2.1/d9a9cec7685bde41b52309e196ec7f75/deps/exla/exla

It is tagged as:
23e3ca8 (HEAD, tag: v0.1.0) Release v0.1.0

This is probably not bleeding edge, but it was released in Jan’22. I have to stick with it because the CUDA version in JetPack cannot be upgraded.

The script works without the line below, but very slowly:

#EXLA.set_as_nx_default([:tpu, :cuda, :rocm, :host])

I finally managed to run mnist.exs unchanged, but ran into an issue:

gpu/asm_compiler.cc:77] Couldn't get ptxas version string: Internal: Running ptxas --version returned -1

although ptxas is on my PATH and reports its version. Any ideas why that is? I’m attaching the full execution log.

ptxas --version
ptxas: NVIDIA (R) Ptx optimizing assembler
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_21:13:18_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
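One thing that could be ruled out is a PATH difference between the interactive shell and the Erlang VM that runs the script. The sketch below uses only the standard library (`System.find_executable/1` and `System.cmd/3`); `ToolCheck` is a made-up helper name. XLA launches ptxas on its own, so a successful result here does not prove that XLA can run it. It only shows what the VM’s environment resolves.

```elixir
defmodule ToolCheck do
  # Resolves an executable through the PATH of the running VM.
  def locate(name) do
    case System.find_executable(name) do
      nil -> :not_found
      path -> {:ok, path}
    end
  end

  # Runs `<name> --version` and returns {exit_status, first_line_of_output},
  # or :not_found when the executable cannot be resolved.
  def version(name) do
    with {:ok, path} <- locate(name) do
      {output, status} = System.cmd(path, ["--version"], stderr_to_stdout: true)
      {status, output |> String.split("\n", trim: true) |> List.first()}
    end
  end
end

IO.inspect(ToolCheck.locate("ptxas"), label: "ptxas path")
IO.inspect(ToolCheck.version("ptxas"), label: "ptxas --version")
```

If this prints `:not_found` inside the same session that runs mnist.exs, the VM is not inheriting the PATH from the shell where `ptxas --version` works.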