I’ve been trying for the past week to get a running version of Livebook on a Jetson Orin AGX box (integrated GPU). The box only has CUDA versions 11.8 and 11.4 installed.
Whenever I run the EXLA tests to confirm that basic CUDA works, I get this error:
dave@CHCHE-ORIN-01:~/work/nx/exla$ mix test
Using libexla.so from /home/dave/.cache/xla/exla/elixir-1.17.2-erts-14.2.5-xla-0.5.1-exla-0.6.4-6c7e3kyqmrq4l2ogbwoouzxmw4/libexla.so
make: '/home/dave/work/nx/exla/_build/test/lib/exla/priv/libexla.so' is up to date.
08:37:06.137 [info] domain=elixir.xla file=xla/stream_executor/cuda/cuda_gpu_executor.cc line=880 could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
08:37:06.145 [info] domain=elixir.xla file=xla/service/service.cc line=168 XLA service 0xffff54002c80 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
08:37:06.146 [info] domain=elixir.xla file=xla/service/service.cc line=176 StreamExecutor device (0): Orin, Compute Capability 8.7
08:37:06.146 [info] domain=elixir.xla file=xla/pjrt/gpu/se_gpu_pjrt_client.cc line=633 Using BFC allocator.
08:37:06.146 [info] domain=elixir.xla file=xla/pjrt/gpu/gpu_helpers.cc line=105 XLA backend allocating 25646193049 bytes on device 0 for BFCAllocator.
Running ExUnit with seed: 164975, max_cases: 16
Excluding tags: [:platform, :integration, :multi_device, :conditional_inside_map_reduce]
Including tags: [platform: :cuda]
2) test range randint (EXLA.NxRandomTest)
test/exla/random_test.exs:10
** (RuntimeError) Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.func.launch' failed: Failed to get stream's capture status: the provided PTX was compiled with an unsupported toolchain.; current tracing scope: fusion; current profiling annotation: XlaModule:#hlo_module=_Function_20.55097802_1_in_Nx.Random.___defn_key____.12,program_id=5#.
code: key = Nx.Random.key(127)
stacktrace:
(exla 0.6.4) lib/exla/executable.ex:56: EXLA.Executable.unwrap!/1
(exla 0.6.4) lib/exla/executable.ex:19: EXLA.Executable.run/3
(exla 0.6.4) lib/exla/defn.ex:346: EXLA.Defn.maybe_outfeed/7
(stdlib 5.2.3) timer.erl:270: :timer.tc/2
(exla 0.6.4) lib/exla/defn.ex:283: anonymous fn/7 in EXLA.Defn.__compile__/4
(nx 0.6.4) lib/nx/defn.ex:443: Nx.Defn.do_jit_apply/3
test/exla/random_test.exs:11: (test)
08:37:14.783 [warning] domain=elixir.xla file=xla/service/gpu/runtime/support.cc line=58 Intercepted XLA runtime error:
INTERNAL: Failed to get stream's capture status: the provided PTX was compiled with an unsupported toolchain.
08:37:14.783 [error] domain=elixir.xla file=xla/pjrt/pjrt_stream_executor_client.cc line=2614 Execution of replica 0 failed: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.func.launch' failed: Failed to get stream's capture status: the provided PTX was compiled with an unsupported toolchain.; current tracing scope: add; current profiling annotation: XlaModule:#hlo_module=test.5,program_id=0#.
I set XLA_BUILD=true so that it actually builds XLA first.
I managed to get XLA to build (with Bazel), but that doesn’t seem to fix the “PTX” mismatch issue.
I’m hoping it’s just some config or build environment variable that I’m setting wrong.
For context, here are the relevant environment variables I have set:
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda-11.8
export XLA_TARGET=cuda
export EXLA_TARGET=cuda
export XLA_BUILD=true
export TMP=/var/tmp
export TF_CUDA_VERSION='11.8'
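For anyone hitting the same thing: as far as I understand it, “the provided PTX was compiled with an unsupported toolchain” usually means the PTX embedded in the compiled kernels targets a newer CUDA toolkit than the driver on the board can JIT-compile. A small sketch of the version comparison I’ve been doing by hand (pure Python; the version strings below are illustrative, not taken from my box; run `nvcc --version` yourself to get the real toolkit release, and check your JetPack docs for the driver’s supported CUDA version):

```python
import re

def cuda_release(nvcc_output: str) -> tuple[int, int]:
    """Extract the CUDA release (major, minor) from `nvcc --version` output."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if not m:
        raise ValueError("could not find a CUDA release string in nvcc output")
    return int(m.group(1)), int(m.group(2))

# Illustrative nvcc output line; substitute your actual `nvcc --version` text.
toolkit = cuda_release("Cuda compilation tools, release 11.8, V11.8.89")

# Hypothetical driver-side limit, e.g. if the JetPack driver only
# supports PTX up to CUDA 11.4.
driver_max = (11, 4)

if toolkit > driver_max:
    # PTX built by a toolkit newer than the driver supports is rejected
    # at load time as "compiled with an unsupported toolchain".
    print(f"toolkit {toolkit} exceeds driver-supported {driver_max}")
```

If that turns out to be the cause, the fix would presumably be building XLA against the older toolkit (or upgrading the driver/JetPack side), not just pointing XLA_FLAGS at a different data dir.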