Installing EXLA on Ubuntu 20

Hello, I have been trying to get EXLA installed for use with Nx and Axon. I am running Ubuntu 20 and I have read through the instructions and installed EXLA’s system dependencies (build-essential, erlang-dev, bazel 3.7.2, python3 with numpy, and direnv), but when I compile EXLA I get this error:

ERROR: /home/karang/.cache/bazel/_bazel_karang/0cbb144c3d68d1f180f564ed331d591d/external/llvm-project/llvm/BUILD:46:18: Executing genrule @llvm-project//llvm:config_gen failed (Exit 1): bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped)
unknown command: python3. Perhaps you have to reshim?
----------------
Note: The failure of target //third_party/llvm:expand_cmake_vars (with exit code 1) may have been caused by the fact that it is running under Python 3 instead of Python 2. Examine the error to determine if that appears to be the problem. Since this target is built in the host configuration, the only way to change its version is to set --host_force_python=PY2, which affects the entire build.

If this error started occurring in Bazel 0.27 and later, it may be because the Python toolchain now enforces that targets analyzed as PY2 and PY3 run under a Python 2 and Python 3 interpreter, respectively. See https://github.com/bazelbuild/bazel/issues/7899 for more information.
----------------
Target //tensorflow/compiler/xla/exla:libexla.so failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 9.398s, Critical Path: 8.88s
INFO: 96 processes: 16 internal, 80 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
make: *** [Makefile:32: all] Error 1
could not compile dependency :exla, "mix compile" failed. You can recompile this dependency with "mix deps.compile exla", update it with "mix deps.update exla" or clean it with "mix deps.clean exla"
==> ml_test
** (Mix) Could not compile with "make" (exit status: 2).
You need to have gcc and make installed. If you are using
Ubuntu or any other Debian-based system, install the packages
"build-essential". Also install "erlang-dev" package if not
included in your Erlang/OTP version. If you're on Fedora, run
"dnf group install 'Development Tools'".

At first it couldn’t find python, so I used the asdf direnv trick, but now it says it can’t find python3.

Here is my .tool-versions:

erlang 24.0.1
elixir 1.12.1-otp-24
python 3.9.5
bazel 3.7.2

Has anyone managed to get EXLA working on Ubuntu? Thank you!

How did you install python3? The error message says it is not available and that you may have to reshim it, which can be necessary depending on the tool you used to install it.
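If python3 was installed through asdf, the shims sometimes go stale after an install or upgrade; regenerating them is often enough. A minimal sketch, assuming an asdf-managed Python:

```shell
# Regenerate asdf's shims so `python`/`python3` resolve again
asdf reshim python

# Verify the shim resolves before retrying the build
which python3 && python3 --version
```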

@karang it can’t find python3. Maybe this section in the EXLA readme can be helpful: nx/exla at main · elixir-nx/nx · GitHub

Python and asdf

Bazel cannot find a Python installed via the asdf version manager by default. asdf uses a function to look up the specified version of a given binary; this approach prevents Bazel from being able to correctly build EXLA. The error is unknown command: python. Perhaps you have to reshim?. There are two known workarounds:

  1. Use a separate installer or explicitly change your $PATH to point to a Python installation (note the build process looks for python, not python3). For example, with Homebrew on macOS, you would do:
export PATH=/usr/local/opt/python@3.9/libexec/bin:/usr/local/bin:$PATH
mix deps.compile
  2. Use the asdf direnv plugin to install direnv 2.20.0. direnv, along with the asdf-direnv plugin, will explicitly set the paths for any binary specified in your project’s .tool-versions file.

After doing either of the steps above, it may be necessary to clear the build cache by removing ~/.cache/exla.
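For workaround 2, the setup might look like the following (a sketch assuming asdf-direnv is already installed; the .envrc sits in the project root next to .tool-versions):

```shell
# Create an .envrc in the project root; `use asdf` makes direnv export
# real binary paths for every tool pinned in .tool-versions
echo "use asdf" > .envrc
direnv allow

# Bazel should now see the actual python binary instead of the asdf shim
which python
```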

Do keep us posted on whether it works, and/or whether the readme needs updating…

Thanks guys! I used asdf to install python, and method 2 as above to deal with it. I installed direnv and set the tool versions in my mix project. I just cleared the cache and tried again, and it gave me the same error. So I tried to remove the python plugin from asdf, but this does not remove the shims, and then bazel complains that the shims don’t exist. So I reinstalled python with asdf and double-checked my direnv setup to make sure the PATH is set correctly, and bazel still can’t find python3. I’ll let you know if I get it to work, but so far method 2 has not worked for me, and I am not sure which directory I should add to my path if I wanted to use method 1.


I’m running into the same errors on Ubuntu 20.04, with ROCm + TensorFlow and asdf-managed Python and Bazel as described above. I’ve also tried aliasing python to python3, but same scenario.

I’m on Ubuntu 20.04 too. I think I’ve progressed further than @MrDoops, but the behaviour did surprise me a little. mix deps.compile on a new project with Nx, EXLA and Axon as dependencies generated this output. I’m not sure if it finished; I let it run for about 4 hours!

Just not sure what I should have expected to happen.

Oh, I can’t attest to this being helpful, but I did export PYTHON_BIN_PATH=/usr/bin/python3.8

[0 / 2] [Prepa] BazelWorkspaceStatusAction stable-status.txt
[2 / 5,456] Linking external/com_google_protobuf/protoc [for host]; 0s local … (4 actions, 3 running)
[39 / 5,456] Compiling tensorflow/core/util/test_log.pb.cc [for host]; 0s local … (3 actions, 2 running)
[78 / 5,456] Compiling tensorflow/core/util/test_log.pb.cc [for host]; 1s local … (4 actions, 3 running)
[94 / 5,456] Compiling tensorflow/core/util/test_log.pb.cc [for host]; 3s local … (4 actions running)
[95 / 5,456] Compiling tensorflow/core/util/test_log.pb.cc [for host]; 4s local … (4 actions, 3 running)
[102 / 5,456] Compiling tensorflow/core/util/test_log.pb.cc [for host]; 6s local … (4 actions, 3 running)
[103 / 5,456] Compiling tensorflow/core/util/test_log.pb.cc [for host]; 7s local … (4 actions running)
[103 / 5,456] Compiling tensorflow/core/util/test_log.pb.cc [for host]; 9s local … (4 actions running)
[104 / 5,456] Compiling tensorflow/core/util/test_log.pb.cc [for host]; 11s local … (4 actions running)
[105 / 5,456] Compiling tensorflow/core/util/test_log.pb.cc [for host]; 13s local … (4 actions running)
[107 / 5,456] Compiling tensorflow/core/util/test_log.pb.cc [for host]; 16s local … (4 actions running)
[108 / 5,456] Compiling tensorflow/core/util/test_log.pb.cc [for host]; 18s local … (4 actions running)
[114 / 5,456] Compiling tensorflow/core/framework/graph_transfer_info.pb.cc [for host]; 6s local … (4 actions running)
[129 / 5,456] Compiling tensorflow/core/framework/summary.pb.cc [for host]; 2s local … (4 actions running)
[131 / 5,456] Compiling tensorflow/core/example/example_parser_configuration.pb.cc [for host]; 5s local … (4 actions running)
[135 / 5,456] Compiling tensorflow/core/protobuf/error_codes.pb.cc [for host]; 1s local … (4 actions running)
[138 / 5,456] Compiling tensorflow/core/framework/step_stats.pb.cc [for host]; 6s local … (4 actions running)
[150 / 5,456] Compiling tensorflow/core/framework/node_def.pb.cc [for host]; 4s local … (4 actions running)
[168 / 5,456] Compiling tensorflow/core/framework/function.pb.cc [for host]; 8s local … (4 actions running)
[170 / 5,456] Compiling tensorflow/core/framework/function.pb.cc [for host]; 16s local … (4 actions running)
[174 / 5,456] Compiling tensorflow/core/protobuf/config.pb.cc [for host]; 20s local … (4 actions running)
[183 / 5,456] Compiling tensorflow/core/protobuf/rewriter_config.pb.cc [for host]; 9s local … (4 actions running)
[188 / 5,456] Compiling tensorflow/core/protobuf/meta_graph.pb.cc [for host]; 14s local … (4 actions running)
[203 / 5,456] Compiling tensorflow/core/profiler/protobuf/xplane.pb.cc [for host]; 10s local … (4 actions running)
[221 / 5,456] Generating code from table: lib/Target/X86/X86.td @llvm-project//llvm:X86CommonTableGen__gen_dag_isel_genrule; 5s local … (4 actions running)
[548 / 5,523] Compiling llvm-project/mlir/lib/IR/BuiltinTypes.cpp [for host]; 10s local … (4 actions running)
[560 / 5,523] Compiling llvm-project/mlir/lib/IR/Operation.cpp [for host]; 10s local … (4 actions running)
[576 / 5,523] Compiling llvm-project/mlir/lib/IR/AsmPrinter.cpp [for host]; 14s local … (4 actions running)
[785 / 5,872] Compiling llvm-project/mlir/tools/mlir-linalg-ods-gen/mlir-linalg-ods-yaml-gen.cpp [for host]; 14s local … (4 actions running)
[811 / 5,872] Compiling tensorflow/core/framework/lookup_interface.cc [for host]; 7s local … (4 actions running)
[821 / 5,872] Compiling tensorflow/core/framework/tensor_util.cc [for host]; 9s local … (4 actions running)
[836 / 5,872] Compiling tensorflow/core/util/batch_util.cc [for host]; 12s local … (4 actions running)
[848 / 5,872] Compiling tensorflow/core/util/batch_util.cc [for host]; 66s local … (4 actions running)
[868 / 5,872] Compiling tensorflow/core/framework/common_shape_fns.cc [for host]; 11s local … (4 actions running)
[894 / 5,872] Compiling tensorflow/core/lib/io/record_reader.cc [for host]; 7s local … (4 actions running)
[928 / 5,874] Compiling tensorflow/core/framework/device_factory.cc [for host]; 14s local … (4 actions running)
[1,033 / 5,989] Compiling tensorflow/core/ops/nn_ops.cc [for host]; 17s local … (4 actions running)
[1,135 / 6,076] Compiling tensorflow/core/ops/math_ops.cc [for host]; 25s local … (4 actions running)
[1,193 / 6,076] Compiling tensorflow/core/platform/default/env.cc; 12s local … (4 actions, 3 running)
[1,257 / 6,076] Compiling tensorflow/stream_executor/stream.cc; 24s local … (4 actions, 3 running)
[1,308 / 6,076] Compiling tensorflow/core/framework/shape_inference.cc; 15s local … (4 actions, 3 running)
[1,359 / 6,076] Compiling tensorflow/core/util/batch_util.cc; 91s local … (4 actions, 3 running)
[1,406 / 6,076] Compiling tensorflow/compiler/xla/service/hlo_computation.cc; 15s local … (4 actions, 3 running)
[1,448 / 6,076] Compiling tensorflow/compiler/xla/service/compiler.cc; 27s local … (4 actions, 3 running)
[1,549 / 6,076] Compiling llvm-project/llvm/lib/ProfileData/InstrProf.cpp; 7s local … (4 actions, 3 running)
[1,662 / 6,076] Compiling llvm-project/llvm/lib/Analysis/InstCount.cpp; 9s local … (4 actions, 3 running)
[1,774 / 6,076] Compiling llvm-project/llvm/lib/Transforms/Utils/CodeExtractor.cpp; 21s local … (4 actions, 3 running)
[1,880 / 6,076] Compiling llvm-project/llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp; 16s local … (4 actions running)
[1,985 / 6,076] Compiling tensorflow/compiler/mlir/tensorflow/transforms/rewrite_tpu_embedding_ops.cc; 85s local … (4 actions running)
[2,070 / 6,076] Compiling tensorflow/compiler/mlir/tensorflow/transforms/cluster_ops_by_policy_pass.cc; 16s local … (4 actions, 3 running)
[2,173 / 6,076] Compiling llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp; 84s local … (4 actions running)
[2,294 / 6,076] Compiling llvm-project/llvm/lib/Target/X86/X86FastISel.cpp; 29s local … (4 actions running)
[2,438 / 6,076] Compiling tensorflow/compiler/tf2xla/cc/ops/xla_ops.cc; 13s local … (4 actions, 3 running)
[2,672 / 6,077] Compiling tensorflow/compiler/xla/service/dynamic_dimension_inference.cc; 23s local … (4 actions, 3 running)
[2,841 / 6,077] Compiling llvm-project/llvm/lib/Target/PowerPC/PPCFastISel.cpp; 16s local … (4 actions, 3 running)
[3,279 / 6,077] Compiling tensorflow/core/graph/algorithm.cc; 12s local … (4 actions, 3 running)
[3,992 / 6,077] Compiling tensorflow/core/kernels/list_kernels.cc; 48s local … (4 actions running)
[4,241 / 6,077] Compiling mkl_dnn_v1/src/cpu/x64/jit_sse41_1x1_convolution.cpp; 8s local … (4 actions, 3 running)
[4,521 / 6,077] Compiling tensorflow/core/kernels/mkl/mkl_conv_ops.cc; 87s local … (4 actions, 3 running)

That’s the correct output; EXLA takes a really long time to compile (it has to compile a lot of TensorFlow). When it finishes you’ll see output with something like Linking libexla.so.


Hey, that’s good to know, finishing off this morning :slight_smile:

This really helped me! It is now compiling! All the fans are running!


It seems bazel can now find python after running the above command, but a new problem has arisen. When EXLA starts compiling, I start up htop to watch my memory and CPU usage, and I see all cores running at close to 100%; then at some point (always different) RAM fills up and the compilation crashes. I have 16 GB of RAM and 16 cores on my machine, which should be enough, right? How can I avoid running out of memory during the EXLA compilation? Thank you!!

Check here: Memory-saving Mode - Bazel main

EXLA looks for BAZEL_FLAGS, so you can set any of those flags by exporting BAZEL_FLAGS as an environment variable.
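Note that bazel expects its flags space-separated on the command line, so setting the variable would look something like this (a sketch; --local_ram_resources caps build memory in MB, --jobs limits parallel actions):

```shell
# Space-separated bazel flags passed through to the EXLA build
export BAZEL_FLAGS="--jobs=8 --local_ram_resources=8192"

# Force a clean rebuild of the dependency so the flags take effect
mix deps.clean exla && mix deps.compile exla
```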


You can also lower the number of jobs using the --jobs= flag.


Thanks for the tip. I think the problem now is that I am not setting the flags properly. I tried this:

$ export BAZEL_FLAGS=--discard_analysis_cache,--nokeep_state_after_build,--notrack_incremental_state,--jobs=8,--host_jvm_args=-Xmx8g

But it did not seem to make a difference. I tried some different combinations of the above, all to no avail. All cores were running (I thought only 8 should be active) and it still ran out of memory. I also tried to set it like this:

$ BAZEL_FLAGS=--discard_analysis_cache,--nokeep_state_after_build,--notrack_incremental_state,--jobs=8,--host_jvm_args=-Xmx8g mix deps.compile exla

But that didn’t seem to work either. Is this the correct way to set the BAZEL_FLAGS variable? Thanks!

Is it possible to use a precompiled XLA library? I’ve found this repo, but for some reason I still cannot run my models on the GPU. When I try to explicitly specify the EXLA compiler in the code, I get the following error:

01:01:51.192 [error] GenServer EXLA.Client terminating
** (RuntimeError) Could not find registered platform with name: "cuda". Available platform names are: Host
    (exla 0.1.0-dev) lib/exla/client.ex:153: EXLA.Client.unwrap!/1
    (exla 0.1.0-dev) lib/exla/client.ex:134: EXLA.Client.build_client/2
    (exla 0.1.0-dev) lib/exla/client.ex:94: EXLA.Client.handle_call/3
    (stdlib 3.16.1) gen_server.erl:721: :gen_server.try_handle_call/4
    (stdlib 3.16.1) gen_server.erl:750: :gen_server.handle_msg/6
    (stdlib 3.16.1) proc_lib.erl:226: :proc_lib.init_p_do_apply/3

I have the exla dependency in my mix.exs:

{:exla, "~> 0.1.0-dev", github: "elixir-nx/nx", sparse: "exla"}

I’ve set up this configuration in config/config.exs:

config :nx, :default_defn_options, [compiler: EXLA, client: :cuda]
config :exla, :clients, cuda: [platform: :cuda], default: [platform: :cuda]

I have the following environment variables set as well:

XLA_BUILD=true
XLA_TARGET=cuda111
EXLA_TARGET=cuda
TF_CUDA_VERSION='11.2'

And of course I’ve installed CUDA and cuDNN, so they are recognized by the tensorflow library. The command python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" produces the following output:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Am I missing something?

Remove XLA_BUILD=true and it should pick up a precompiled version based on your XLA_TARGET. If it doesn’t work, please post the full output of mix deps.get plus mix compile. :slight_smile:
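Concretely, that might look like the following (a hedged sketch; cleaning the deps forces the precompiled-archive lookup to run again):

```shell
# Stop forcing a local XLA build; let the xla package download an archive
unset XLA_BUILD
export XLA_TARGET=cuda111   # pick the value matching your local CUDA version

# Clean and rebuild so the new target is picked up
mix deps.clean xla exla
mix deps.get
mix compile
```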

Thank you for the response! I’ve followed your instructions and unset the env variable XLA_BUILD. Then I deleted the deps and _build folders and ran mix deps.get, which produced the following output:

* Getting nx (https://github.com/zeionara/nx.git)
remote: Enumerating objects: 10503, done.        
remote: Counting objects: 100% (2118/2118), done.        
remote: Compressing objects: 100% (304/304), done.        
remote: Total 10503 (delta 1908), reused 1924 (delta 1807), pack-reused 8385        
Receiving objects: 100% (10503/10503), 2.68 MiB | 1.75 MiB/s, done.
Resolving deltas: 100% (6947/6947), done.
* Getting exla (https://github.com/elixir-nx/nx.git)
remote: Enumerating objects: 10586, done.        
remote: Counting objects: 100% (2490/2490), done.        
remote: Compressing objects: 100% (427/427), done.        
remote: Total 10586 (delta 2159), reused 2270 (delta 2054), pack-reused 8096        
Receiving objects: 100% (10586/10586), 2.85 MiB | 1.59 MiB/s, done.
Resolving deltas: 100% (7009/7009), done.
* Getting axon (https://github.com/zeionara/axon.git - origin/epoch-completion-handler)
remote: Enumerating objects: 2727, done.        
remote: Counting objects: 100% (1903/1903), done.        
remote: Compressing objects: 100% (1123/1123), done.        
remote: Total 2727 (delta 1406), reused 1175 (delta 761), pack-reused 824        
Receiving objects: 100% (2727/2727), 9.03 MiB | 1.23 MiB/s, done.
Resolving deltas: 100% (1864/1864), done.
* Getting axon_onnx (https://github.com/zeionara/axon_onnx.git - origin/master)
remote: Enumerating objects: 156, done.        
remote: Counting objects: 100% (156/156), done.        
remote: Compressing objects: 100% (109/109), done.        
remote: Total 156 (delta 63), reused 127 (delta 40), pack-reused 0        
Receiving objects: 100% (156/156), 54.87 KiB | 1021.00 KiB/s, done.
Resolving deltas: 100% (63/63), done.
Resolving Hex dependencies...
Dependency resolution completed:
Unchanged:
  dialyxir 1.1.0
  elixir_make 0.6.3
  elixir_uuid 1.2.1
  erlex 0.2.6
  optimus 0.2.0
  protox 1.4.0
  table_rex 3.1.1
  xla 0.2.0
* Getting dialyxir (Hex package)
* Getting optimus (Hex package)
* Getting elixir_uuid (Hex package)
* Getting erlex (Hex package)
* Getting protox (Hex package)
* Getting table_rex (Hex package)
* Getting xla (Hex package)
* Getting elixir_make (Hex package)

After that I executed mix compile, which resulted in this log:

==> elixir_uuid
Compiling 1 file (.ex)
warning: :crypto.hash/2 defined in application :crypto is used by the current application but the current application does not depend on :crypto. To fix this, you must do one of:

  1. If :crypto is part of Erlang/Elixir, you must include it under :extra_applications inside "def application" in your mix.exs

  2. If :crypto is a dependency, make sure it is listed under "def deps" in your mix.exs

  3. In case you don't want to add a requirement to :crypto, you may optionally skip this warning by adding [xref: [exclude: [:crypto]]] to your "def project" in mix.exs

Found at 2 locations:
  lib/uuid.ex:589: UUID.namebased_uuid/2
  lib/uuid.ex:593: UUID.namebased_uuid/2

warning: :crypto.strong_rand_bytes/1 defined in application :crypto is used by the current application but the current application does not depend on :crypto. To fix this, you must do one of:

  1. If :crypto is part of Erlang/Elixir, you must include it under :extra_applications inside "def application" in your mix.exs

  2. If :crypto is a dependency, make sure it is listed under "def deps" in your mix.exs

  3. In case you don't want to add a requirement to :crypto, you may optionally skip this warning by adding [xref: [exclude: [:crypto]]] to your "def project" in mix.exs

Found at 3 locations:
  lib/uuid.ex:383: UUID.uuid4/1
  lib/uuid.ex:560: UUID.uuid1_clockseq/0
  lib/uuid.ex:583: UUID.uuid1_node/1

Generated elixir_uuid app
==> erlex
Compiling 1 file (.yrl)
src/parser.yrl: Warning: conflicts: 27 shift/reduce, 0 reduce/reduce
Compiling 1 file (.xrl)
Compiling 2 files (.erl)
Compiling 1 file (.ex)
Generated erlex app
==> nx
Compiling 20 files (.ex)
Generated nx app
==> dialyxir
Compiling 58 files (.ex)
Generated dialyxir app
==> elixir_make
Compiling 1 file (.ex)
Generated elixir_make app
==> xla
Compiling 2 files (.ex)
Generated xla app
==> exla
Unpacking /home/zeio/.cache/xla/0.2.0/cache/download/xla_extension-x86_64-linux-cpu.tar.gz into /home/zeio/grapex/deps/exla/exla/cache
mkdir -p /home/zeio/grapex/_build/dev/lib/exla/priv
ln -sf /home/zeio/grapex/deps/exla/exla/cache/xla_extension/lib /home/zeio/grapex/_build/dev/lib/exla/priv/lib
g++ -fPIC -I/usr/lib/erlang/erts-12.1.3/include -isystem cache/xla_extension/include -O3 -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o /home/zeio/grapex/_build/dev/lib/exla/priv/libexla.so -L/home/zeio/grapex/_build/dev/lib/exla/priv/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib'
c_src/exla/exla_client.cc: In function ‘tensorflow::StatusOr<std::vector<exla::ExlaBuffer*> > exla::UnpackRunArguments(ErlNifEnv*, ERL_NIF_TERM, exla::ExlaClient*, int)’:
c_src/exla/exla_client.cc:95:19: warning: redundant move in return statement [-Wredundant-move]
   95 |   return std::move(arg_buffers);
      |          ~~~~~~~~~^~~~~~~~~~~~~
c_src/exla/exla_client.cc:95:19: note: remove ‘std::move’ call
Compiling 21 files (.ex)
warning: Nx.Defn.default_options/0 is undefined or private
  lib/exla/device_backend.ex:78: EXLA.DeviceBackend.default_client_name/0

warning: Nx.Defn.Composite.reduce/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
  lib/exla/defn/stream.ex:95: Nx.Stream.EXLA.Defn.Stream.nx_to_io/1

warning: Nx.Defn.global_default_options/1 is undefined or private
  lib/exla.ex:211: EXLA.set_preferred_defn_options/1

warning: Nx.Defn.Composite.traverse/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
  lib/exla/defn/buffers.ex:9: EXLA.Defn.Buffers.to_nx!/3

warning: Nx.byte_size/1 is undefined or private
Found at 2 locations:
  lib/exla/defn/buffers.ex:17: EXLA.Defn.Buffers.buffer_to_data/2
  lib/exla/defn/buffers.ex:18: EXLA.Defn.Buffers.buffer_to_data/2

warning: Nx.Defn.Composite.flatten_list/1 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
Found at 2 locations:
  lib/exla/defn.ex:17: EXLA.Defn.__stream__/6
  lib/exla/defn.ex:19: EXLA.Defn.__stream__/6

warning: Nx.Defn.Composite.reduce/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
Found at 2 locations:
  lib/exla/defn.ex:331: EXLA.Defn.used_inputs_and_hooks/1
  lib/exla/defn.ex:366: EXLA.Defn.recur_flatten/3

warning: Nx.Defn.Composite.traverse/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
  lib/exla/defn.ex:1405: EXLA.Defn.to_if_branch/5

warning: Nx.Defn.Tree.apply_args/3 is undefined or private
Found at 4 locations:
  lib/exla/defn.ex:345: EXLA.Defn.used_inputs_and_hooks/2
  lib/exla/defn.ex:467: EXLA.Defn.cached_recur_operator/4
  lib/exla/defn.ex:1372: EXLA.Defn.collect_ids/2
  lib/exla/defn.ex:1397: EXLA.Defn.collect_args/3

Generated exla app
==> protox
Compiling 22 files (.ex)
Generated protox app
==> optimus
Compiling 17 files (.ex)
Generated optimus app
==> table_rex
Compiling 7 files (.ex)
Generated table_rex app
==> axon
Compiling 20 files (.ex)
Generated axon app
==> axon_onnx
Compiling 3 files (.ex)
Generated axon_onnx app
==> grapex
Compiling 8 files (.ex)
warning: variable "verbose" is unused (if the variable is not meant to be used, prefix it with an underscore)
  lib/grapex/models/se.ex:60: Grapex.Model.Se.compute_score/2

warning: function fix_shape/1 is unused
  lib/grapex/models/se.ex:31

warning: function multiply/2 is unused
  lib/grapex/models/se.ex:46

warning: TranseHeterogenous.save/1 is undefined (module TranseHeterogenous is not available or is yet to be defined)
  lib/grapex.ex:299: Grapex.main/1

warning: TranseHeterogenous.test_or_validate/1 is undefined (module TranseHeterogenous is not available or is yet to be defined)
  lib/grapex.ex:298: Grapex.main/1

warning: TranseHeterogenous.train_or_import/1 is undefined (module TranseHeterogenous is not available or is yet to be defined)
  lib/grapex.ex:297: Grapex.main/1

Generated grapex app

It seems that exla unpacks a cached version of the library compiled for CPU (xla_extension-x86_64-linux-cpu.tar.gz), which was downloaded before I started my attempts to run models on the GPU. That’s why I tried removing the ~/.cache/xla folder and running the compilation again, but the output didn’t change significantly:

==> elixir_uuid
Compiling 1 file (.ex)
warning: :crypto.hash/2 defined in application :crypto is used by the current application but the current application does not depend on :crypto. To fix this, you must do one of:

  1. If :crypto is part of Erlang/Elixir, you must include it under :extra_applications inside "def application" in your mix.exs

  2. If :crypto is a dependency, make sure it is listed under "def deps" in your mix.exs

  3. In case you don't want to add a requirement to :crypto, you may optionally skip this warning by adding [xref: [exclude: [:crypto]]] to your "def project" in mix.exs

Found at 2 locations:
  lib/uuid.ex:589: UUID.namebased_uuid/2
  lib/uuid.ex:593: UUID.namebased_uuid/2

warning: :crypto.strong_rand_bytes/1 defined in application :crypto is used by the current application but the current application does not depend on :crypto. To fix this, you must do one of:

  1. If :crypto is part of Erlang/Elixir, you must include it under :extra_applications inside "def application" in your mix.exs

  2. If :crypto is a dependency, make sure it is listed under "def deps" in your mix.exs

  3. In case you don't want to add a requirement to :crypto, you may optionally skip this warning by adding [xref: [exclude: [:crypto]]] to your "def project" in mix.exs

Found at 3 locations:
  lib/uuid.ex:383: UUID.uuid4/1
  lib/uuid.ex:560: UUID.uuid1_clockseq/0
  lib/uuid.ex:583: UUID.uuid1_node/1

Generated elixir_uuid app
==> erlex
Compiling 2 files (.erl)
Compiling 1 file (.ex)
Generated erlex app
==> nx
Compiling 20 files (.ex)
Generated nx app
==> dialyxir
Compiling 58 files (.ex)
Generated dialyxir app
==> elixir_make
Compiling 1 file (.ex)
Generated elixir_make app
==> xla
Compiling 2 files (.ex)
Generated xla app

18:22:49.977 [info]  Found a matching archive (xla_extension-x86_64-linux-cpu.tar.gz), going to download it
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   676  100   676    0     0   1867      0 --:--:-- --:--:-- --:--:--  1862
100 91.9M  100 91.9M    0     0  8727k      0  0:00:10  0:00:10 --:--:-- 10.2M

18:23:00.782 [info]  Successfully downloaded the XLA archive
==> exla
mkdir -p /home/zeio/grapex/_build/dev/lib/exla/priv
ln -sf /home/zeio/grapex/deps/exla/exla/cache/xla_extension/lib /home/zeio/grapex/_build/dev/lib/exla/priv/lib
g++ -fPIC -I/usr/lib/erlang/erts-12.1.3/include -isystem cache/xla_extension/include -O3 -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o /home/zeio/grapex/_build/dev/lib/exla/priv/libexla.so -L/home/zeio/grapex/_build/dev/lib/exla/priv/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib'
c_src/exla/exla_client.cc: In function ‘tensorflow::StatusOr<std::vector<exla::ExlaBuffer*> > exla::UnpackRunArguments(ErlNifEnv*, ERL_NIF_TERM, exla::ExlaClient*, int)’:
c_src/exla/exla_client.cc:95:19: warning: redundant move in return statement [-Wredundant-move]
   95 |   return std::move(arg_buffers);
      |          ~~~~~~~~~^~~~~~~~~~~~~
c_src/exla/exla_client.cc:95:19: note: remove ‘std::move’ call
Compiling 21 files (.ex)
warning: Nx.Defn.Composite.reduce/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
  lib/exla/defn/stream.ex:95: Nx.Stream.EXLA.Defn.Stream.nx_to_io/1

warning: Nx.Defn.Composite.traverse/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
  lib/exla/defn/buffers.ex:9: EXLA.Defn.Buffers.to_nx!/3

warning: Nx.Defn.Composite.flatten_list/1 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
Found at 2 locations:
  lib/exla/defn.ex:17: EXLA.Defn.__stream__/6
  lib/exla/defn.ex:19: EXLA.Defn.__stream__/6

warning: Nx.Defn.Composite.reduce/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
Found at 2 locations:
  lib/exla/defn.ex:331: EXLA.Defn.used_inputs_and_hooks/1
  lib/exla/defn.ex:366: EXLA.Defn.recur_flatten/3

warning: Nx.Defn.Composite.traverse/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
  lib/exla/defn.ex:1405: EXLA.Defn.to_if_branch/5

warning: Nx.Defn.default_options/0 is undefined or private
  lib/exla/device_backend.ex:78: EXLA.DeviceBackend.default_client_name/0

warning: Nx.Defn.global_default_options/1 is undefined or private
  lib/exla.ex:211: EXLA.set_preferred_defn_options/1

warning: Nx.Defn.Tree.apply_args/3 is undefined or private
Found at 4 locations:
  lib/exla/defn.ex:345: EXLA.Defn.used_inputs_and_hooks/2
  lib/exla/defn.ex:467: EXLA.Defn.cached_recur_operator/4
  lib/exla/defn.ex:1372: EXLA.Defn.collect_ids/2
  lib/exla/defn.ex:1397: EXLA.Defn.collect_args/3

warning: Nx.byte_size/1 is undefined or private
Found at 2 locations:
  lib/exla/defn/buffers.ex:17: EXLA.Defn.Buffers.buffer_to_data/2
  lib/exla/defn/buffers.ex:18: EXLA.Defn.Buffers.buffer_to_data/2

Generated exla app
==> protox
Compiling 22 files (.ex)
Generated protox app
==> optimus
Compiling 17 files (.ex)
Generated optimus app
==> table_rex
Compiling 7 files (.ex)
Generated table_rex app
==> axon
Compiling 20 files (.ex)
Generated axon app
==> axon_onnx
Compiling 3 files (.ex)
Generated axon_onnx app
==> grapex
Compiling 8 files (.ex)
warning: variable "verbose" is unused (if the variable is not meant to be used, prefix it with an underscore)
  lib/grapex/models/se.ex:60: Grapex.Model.Se.compute_score/2

warning: function fix_shape/1 is unused
  lib/grapex/models/se.ex:31

warning: function multiply/2 is unused
  lib/grapex/models/se.ex:46

warning: TranseHeterogenous.save/1 is undefined (module TranseHeterogenous is not available or is yet to be defined)
  lib/grapex.ex:299: Grapex.main/1

warning: TranseHeterogenous.test_or_validate/1 is undefined (module TranseHeterogenous is not available or is yet to be defined)
  lib/grapex.ex:298: Grapex.main/1

warning: TranseHeterogenous.train_or_import/1 is undefined (module TranseHeterogenous is not available or is yet to be defined)
  lib/grapex.ex:297: Grapex.main/1

Generated grapex app

So it just downloaded the same version of the library, even though my env variable XLA_TARGET is set to cuda111. Then I tried changing this value from cuda111 to just cuda, deleting the _build and ~/.cache/xla folders, and compiling again, which produced a runtime error:

==> elixir_uuid
Compiling 1 file (.ex)
warning: :crypto.hash/2 defined in application :crypto is used by the current application but the current application does not depend on :crypto. To fix this, you must do one of:

  1. If :crypto is part of Erlang/Elixir, you must include it under :extra_applications inside "def application" in your mix.exs

  2. If :crypto is a dependency, make sure it is listed under "def deps" in your mix.exs

  3. In case you don't want to add a requirement to :crypto, you may optionally skip this warning by adding [xref: [exclude: [:crypto]]] to your "def project" in mix.exs

Found at 2 locations:
  lib/uuid.ex:589: UUID.namebased_uuid/2
  lib/uuid.ex:593: UUID.namebased_uuid/2

warning: :crypto.strong_rand_bytes/1 defined in application :crypto is used by the current application but the current application does not depend on :crypto. To fix this, you must do one of:

  1. If :crypto is part of Erlang/Elixir, you must include it under :extra_applications inside "def application" in your mix.exs

  2. If :crypto is a dependency, make sure it is listed under "def deps" in your mix.exs

  3. In case you don't want to add a requirement to :crypto, you may optionally skip this warning by adding [xref: [exclude: [:crypto]]] to your "def project" in mix.exs

Found at 3 locations:
  lib/uuid.ex:383: UUID.uuid4/1
  lib/uuid.ex:560: UUID.uuid1_clockseq/0
  lib/uuid.ex:583: UUID.uuid1_node/1

Generated elixir_uuid app
==> erlex
Compiling 2 files (.erl)
Compiling 1 file (.ex)
Generated erlex app
==> nx
Compiling 20 files (.ex)
Generated nx app
==> dialyxir
Compiling 58 files (.ex)
Generated dialyxir app
==> elixir_make
Compiling 1 file (.ex)
Generated elixir_make app
==> xla
Compiling 2 files (.ex)
Generated xla app
==> exla
could not compile dependency :exla, "mix compile" failed. You can recompile this dependency with "mix deps.compile exla", update it with "mix deps.update exla" or clean it with "mix deps.clean exla"
** (RuntimeError) none of the precompiled archives matches your target
  Expected:
    * xla_extension-x86_64-linux-cuda.tar.gz
  Found:
    * xla_extension-aarch64-darwin-cpu.tar.gz
    * xla_extension-aarch64-linux-cpu.tar.gz
    * xla_extension-aarch64-linux-cuda102.tar.gz
    * xla_extension-x86_64-darwin-cpu.tar.gz
    * xla_extension-x86_64-linux-cpu.tar.gz
    * xla_extension-x86_64-linux-cuda102.tar.gz
    * xla_extension-x86_64-linux-cuda110.tar.gz
    * xla_extension-x86_64-linux-cuda111.tar.gz
    * xla_extension-x86_64-linux-tpu.tar.gz

You can compile XLA locally by setting an environment variable: XLA_BUILD=true
    (xla 0.2.0) lib/xla.ex:171: XLA.download_matching!/1
    (xla 0.2.0) lib/xla.ex:33: XLA.archive_path!/0
    /home/zeio/grapex/deps/exla/exla/mix.exs:73: EXLA.MixProject.compile/1
    (mix 1.12.2) lib/mix/task.ex:458: Mix.Task.run_alias/5
    (mix 1.12.2) lib/mix/tasks/compile.all.ex:92: Mix.Tasks.Compile.All.run_compiler/2
    (mix 1.12.2) lib/mix/tasks/compile.all.ex:72: Mix.Tasks.Compile.All.compile/4
    (mix 1.12.2) lib/mix/tasks/compile.all.ex:59: Mix.Tasks.Compile.All.with_logger_app/2
    (mix 1.12.2) lib/mix/tasks/compile.all.ex:36: Mix.Tasks.Compile.All.run/1

Clearly, during the previous compilation it did not, for some reason, download the xla_extension-x86_64-linux-cuda111.tar.gz archive and used the CPU version instead.
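Since the archive name is selected from the XLA_TARGET environment variable, a quick way to see what the build will resolve is to reproduce the lookup in IEx. This is a minimal sketch; the real logic lives in deps/xla/lib/xla.ex, and the "cpu" fallback here is my assumption, not necessarily xla's exact default:

```elixir
# Sketch: reproduce the target lookup (the real logic is in
# deps/xla/lib/xla.ex; the "cpu" fallback is an assumption).
target = System.get_env("XLA_TARGET", "cpu")
IO.puts("XLA_TARGET resolves to: #{target}")
```

If this prints cpu while the shell shows cuda111 was exported, the variable is not reaching the compile step.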

Hmm, something strange actually happened. I edited the source code of the xla library (added a print of the required archive filename at the beginning of the download_matching! function in deps/xla/lib/xla.ex). Then I changed XLA_TARGET back to cuda111 and ran mix deps.compile exla. This time the correct archive was downloaded, according to the logs:

/home/zeio/.cache/xla/0.2.0/cache/download/xla_extension-x86_64-linux-cuda111.tar.gz

18:57:12.780 [info]  Found a matching archive (xla_extension-x86_64-linux-cuda111.tar.gz), going to download it
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   680  100   680    0     0   2054      0 --:--:-- --:--:-- --:--:--  2054
100  158M  100  158M    0     0  6515k      0  0:00:24  0:00:24 --:--:-- 6346k

18:57:37.699 [info]  Successfully downloaded the XLA archive
==> exla
Unpacking /home/zeio/.cache/xla/0.2.0/cache/download/xla_extension-x86_64-linux-cuda111.tar.gz into /home/zeio/grapex/deps/exla/exla/cache
mkdir -p /home/zeio/grapex/_build/dev/lib/exla/priv
ln -sf /home/zeio/grapex/deps/exla/exla/cache/xla_extension/lib /home/zeio/grapex/_build/dev/lib/exla/priv/lib
g++ -fPIC -I/usr/lib/erlang/erts-12.1.3/include -isystem cache/xla_extension/include -O3 -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o /home/zeio/grapex/_build/dev/lib/exla/priv/libexla.so -L/home/zeio/grapex/_build/dev/lib/exla/priv/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib'
c_src/exla/exla_client.cc: In function ‘tensorflow::StatusOr<std::vector<exla::ExlaBuffer*> > exla::UnpackRunArguments(ErlNifEnv*, ERL_NIF_TERM, exla::ExlaClient*, int)’:
c_src/exla/exla_client.cc:95:19: warning: redundant move in return statement [-Wredundant-move]
   95 |   return std::move(arg_buffers);
      |          ~~~~~~~~~^~~~~~~~~~~~~
c_src/exla/exla_client.cc:95:19: note: remove ‘std::move’ call
Compiling 21 files (.ex)
warning: Nx.Defn.Composite.reduce/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
  lib/exla/defn/stream.ex:95: Nx.Stream.EXLA.Defn.Stream.nx_to_io/1

warning: Nx.Defn.global_default_options/1 is undefined or private
  lib/exla.ex:211: EXLA.set_preferred_defn_options/1

warning: Nx.Defn.Composite.flatten_list/1 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
Found at 2 locations:
  lib/exla/defn.ex:17: EXLA.Defn.__stream__/6
  lib/exla/defn.ex:19: EXLA.Defn.__stream__/6

warning: Nx.Defn.Composite.reduce/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
Found at 2 locations:
  lib/exla/defn.ex:331: EXLA.Defn.used_inputs_and_hooks/1
  lib/exla/defn.ex:366: EXLA.Defn.recur_flatten/3

warning: Nx.Defn.default_options/0 is undefined or private
  lib/exla/device_backend.ex:78: EXLA.DeviceBackend.default_client_name/0

warning: Nx.Defn.Composite.traverse/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
  lib/exla/defn.ex:1405: EXLA.Defn.to_if_branch/5

warning: Nx.Defn.Tree.apply_args/3 is undefined or private
Found at 4 locations:
  lib/exla/defn.ex:345: EXLA.Defn.used_inputs_and_hooks/2
  lib/exla/defn.ex:467: EXLA.Defn.cached_recur_operator/4
  lib/exla/defn.ex:1372: EXLA.Defn.collect_ids/2
  lib/exla/defn.ex:1397: EXLA.Defn.collect_args/3

warning: Nx.Defn.Composite.traverse/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
  lib/exla/defn/buffers.ex:9: EXLA.Defn.Buffers.to_nx!/3

warning: Nx.byte_size/1 is undefined or private
Found at 2 locations:
  lib/exla/defn/buffers.ex:17: EXLA.Defn.Buffers.buffer_to_data/2
  lib/exla/defn/buffers.ex:18: EXLA.Defn.Buffers.buffer_to_data/2

Generated exla app

Then I ran my script (which trains a model) via mix run main.exs, and the training really did seem to happen on the GPU, as I could see in the output of the nvidia-smi command:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   50C    P8     3W /  N/A |    695MiB /  3911MiB |     32%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1760      G   /usr/lib/xorg/Xorg                 45MiB |
|    0   N/A  N/A      3295      G   /usr/lib/xorg/Xorg                375MiB |
|    0   N/A  N/A      3478      G   /usr/bin/gnome-shell               70MiB |
|    0   N/A  N/A   1225965      G   ...AAAAAAAAA= --shared-files       64MiB |
|    0   N/A  N/A   1600138      G   ...AAAAAAAAA= --shared-files       32MiB |
|    0   N/A  N/A   1600174      G   ...AAAAAAAA== --shared-files       33MiB |
|    0   N/A  N/A   1624190      C   .../erts-12.1.3/bin/beam.smp       55MiB |
+-----------------------------------------------------------------------------+

At some point I stopped the execution; the full output of my script up to that moment was:

Compiling 1 file (.ex)
{:ok, %{'CUDA' => 1, 'Host' => 12}}
{:ok, #Reference<0.3594860528.3707371527.100288>}

18:59:57.763 [info]  successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

18:59:57.766 [info]  XLA service 0x7f30e4495730 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:

18:59:57.766 [info]    StreamExecutor device (0): GeForce GTX 1650, Compute Capability 7.5

18:59:57.766 [info]  Using BFC allocator.

18:59:57.767 [info]  XLA backend will use up to 3376283648 bytes on device 0 for BFCAllocator.
Input Files Path : /home/zeio/relentness/Assets/Corpora/Demo/0000/
Setting bern flag to false
---------------------------------------------------------------------------------------------
                                            Model
=============================================================================================
 Layer                                                   Shape                    Parameters
=============================================================================================
 input_1 ( input )                                       {nil, 4, 2}              0
 embedding_3 ( embedding )                               {nil, 4, 2, 10}          580
 reshape_4 ( reshape )                                   {nil, 4, 2, 1, 1, 10}    0
 pad_5 ( pad )                                           {nil, 4, 2, 2, 10, 10}   0
 input_6 ( input )                                       {nil, 4, 1}              0
 embedding_8 ( embedding )                               {nil, 4, 1, 200}         400
 reshape_9 ( reshape )                                   {nil, 4, 1, 2, 10, 10}   0
 concatenate_10 ( concatenate ["pad_5", "reshape_9"] )   {nil, 4, 3, 2, 10, 10}   0
---------------------------------------------------------------------------------------------

The model will not be saved during training because n-export-steps parameter has not been provided.
Epoch: 23, Batch: 10, Loss: 3.55707 ^C
BREAK: (a)bort (A)bort with dump (c)ontinue (p)roc info (i)nfo
       (l)oaded (v)ersion (k)ill (D)b-tables (d)istribution

The second and third lines contain the output of the following two commands, respectively:

IO.inspect EXLA.NIF.get_supported_platforms()
IO.inspect EXLA.NIF.get_gpu_client(1.0, 0)
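To make the fallback visible instead of silent, the result of EXLA.NIF.get_supported_platforms/0 can be pattern-matched on the 'CUDA' key. This is a sketch that only operates on the result shapes shown above (the helper name and the two sample maps are mine, copied from the outputs in this post):

```elixir
# Sketch: classify the platform map returned by
# EXLA.NIF.get_supported_platforms/0 (shapes taken from the outputs above).
check = fn
  {:ok, %{'CUDA' => _} = platforms} -> {:cuda, platforms}
  {:ok, platforms} -> {:cpu_only, platforms}
end

IO.inspect(check.({:ok, %{'CUDA' => 1, 'Host' => 12}}))
IO.inspect(check.({:ok, %{'Host' => 12}}))
```

Raising on the {:cpu_only, _} case at startup would stop the training script before it spends epochs on the CPU by accident.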

After that I tried to run the same command again (mix run main.exs), but this time the process started on the CPU instead of the GPU, and I got the following output:

{:ok, %{'Host' => 12}}
{:error,
 'Could not find registered platform with name: "cuda". Available platform names are: Host'}
Input Files Path : /home/zeio/relentness/Assets/Corpora/Demo/0000/
Setting bern flag to false
---------------------------------------------------------------------------------------------
                                            Model
=============================================================================================
 Layer                                                   Shape                    Parameters
=============================================================================================
 input_1 ( input )                                       {nil, 4, 2}              0
 embedding_3 ( embedding )                               {nil, 4, 2, 10}          580
 reshape_4 ( reshape )                                   {nil, 4, 2, 1, 1, 10}    0
 pad_5 ( pad )                                           {nil, 4, 2, 2, 10, 10}   0
 input_6 ( input )                                       {nil, 4, 1}              0
 embedding_8 ( embedding )                               {nil, 4, 1, 200}         400
 reshape_9 ( reshape )                                   {nil, 4, 1, 2, 10, 10}   0
 concatenate_10 ( concatenate ["pad_5", "reshape_9"] )   {nil, 4, 3, 2, 10, 10}   0
---------------------------------------------------------------------------------------------

The model will not be saved during training because n-export-steps parameter has not been provided.
Epoch: 15, Batch: 6, Loss: 4.28919 ^C
BREAK: (a)bort (A)bort with dump (c)ontinue (p)roc info (i)nfo
       (l)oaded (v)ersion (k)ill (D)b-tables (d)istribution

And every subsequent run behaves the same way: it can no longer detect the CUDA client or use the GPU, and always runs the computations on the CPU.

So it seems the value of XLA_TARGET is not making it all the way through to xla?
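One way to test that hypothesis would be to print the variable from inside the script itself. A minimal sketch; if this shows nil while `echo $XLA_TARGET` in the shell shows cuda111, the value is being lost before it reaches the VM:

```elixir
# Minimal sketch: check whether the mix/BEAM process actually sees XLA_TARGET.
IO.inspect(System.get_env("XLA_TARGET"), label: "XLA_TARGET as seen by the VM")
```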