Hello, I have been trying to compile EXLA to use it with Nx and Axon. Unfortunately, I’ve been having issues getting it to compile against the CUDA and cuDNN SDKs.
I’ve installed the NVIDIA drivers, CUDA, and cuDNN following this tutorial:
TUTORIAL
I’ve already verified that the installation works correctly with NVIDIA’s reference Python code.
Now I’m trying to compile XLA against my machine’s current CUDA version (11.2), but despite my best efforts I keep hitting the following error:
tavano@tavano-os:~/git/xla$ iex -S mix
Erlang/OTP 24 [erts-12.0.4] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit]
mkdir -p /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82 && \
cd /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82 && \
git init && \
git remote add origin https://github.com/tensorflow/tensorflow.git && \
git fetch --depth 1 origin 54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82 && \
git checkout FETCH_HEAD
Initialized empty Git repository in /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/.git/
From https://github.com/tensorflow/tensorflow
* branch 54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82 -> FETCH_HEAD
Note: switching to 'FETCH_HEAD'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at 54dee6dd Fix shape arguments passed in local_client.
rm -f /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/tensorflow/compiler/xla/extension && \
ln -s "/home/tavano/git/xla/extension" /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/tensorflow/compiler/xla/extension && \
cd /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82 && \
bazel build --define "framework_shared_object=false" -c opt --config=cuda //tensorflow/compiler/xla/extension:xla_extension && \
mkdir -p /home/tavano/git/xla/cache/build/ && \
cp -f /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/bazel-bin/tensorflow/compiler/xla/extension/xla_extension.tar.gz /home/tavano/git/xla/cache/build/xla_extension-x86_64-linux-cuda111.tar.gz
Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/.bazelrc:
Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/.bazelrc:
'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --noincompatible_prohibit_aapt1 --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true
INFO: Found applicable config definition build:short_logs in file /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:cuda in file /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:linux in file /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels
INFO: Found applicable config definition build:dynamic_kernels in file /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
DEBUG: /home/tavano/.cache/bazel/_bazel_tavano/80b3fb99ea1bab987a9581bca23d819b/external/tf_runtime/third_party/cuda/dependencies.bzl:51:10: The following command will download NVIDIA proprietary software. By using the software you agree to comply with the terms of the license agreement that accompanies the software. If you do not agree to the terms of the license agreement, do not use the software.
INFO: Repository local_config_cuda instantiated at:
/home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/WORKSPACE:15:14: in <toplevel>
/home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/tensorflow/workspace2.bzl:1099:19: in workspace
/home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/tensorflow/workspace2.bzl:90:19: in _tf_toolchains
Repository rule cuda_configure defined at:
/home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/third_party/gpus/cuda_configure.bzl:1443:33: in <toplevel>
ERROR: An error occurred during the fetch of repository 'local_config_cuda':
Traceback (most recent call last):
File "/home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/third_party/gpus/cuda_configure.bzl", line 1396, column 38, in _cuda_autoconf_impl
_create_local_cuda_repository(repository_ctx)
File "/home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/third_party/gpus/cuda_configure.bzl", line 977, column 35, in _create_local_cuda_repository
cuda_config = _get_cuda_config(repository_ctx, find_cuda_config_script)
File "/home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/third_party/gpus/cuda_configure.bzl", line 666, column 30, in _get_cuda_config
config = find_cuda_config(repository_ctx, find_cuda_config_script, ["cuda", "cudnn"])
File "/home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/third_party/gpus/cuda_configure.bzl", line 643, column 41, in find_cuda_config
exec_result = _exec_find_cuda_config(repository_ctx, script_path, cuda_libraries)
File "/home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/third_party/gpus/cuda_configure.bzl", line 637, column 19, in _exec_find_cuda_config
return execute(repository_ctx, [python_bin, "-c", decompress_and_execute_cmd])
File "/home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/third_party/remote_config/common.bzl", line 230, column 13, in execute
fail(
Error in fail: Repository command failed
Could not find any cudnn.h, cudnn_version.h matching version '' in any subdirectory:
''
'include'
'include/cuda'
'include/*-linux-gnu'
'extras/CUPTI/include'
'include/cuda/CUPTI'
of:
'/lib'
'/lib/i386-linux-gnu'
'/lib/x86_64-linux-gnu'
'/usr'
'/usr/lib/x86_64-linux-gnu/libfakeroot'
'/usr/local/cuda'
'/usr/local/cuda-11.2/targets/x86_64-linux/lib'
INFO: Found applicable config definition build:cuda in file /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
ERROR: @local_config_cuda//:enable_cuda :: Error loading option @local_config_cuda//:enable_cuda: Repository command failed
Could not find any cudnn.h, cudnn_version.h matching version '' in any subdirectory:
''
'include'
'include/cuda'
'include/*-linux-gnu'
'extras/CUPTI/include'
'include/cuda/CUPTI'
of:
'/lib'
'/lib/i386-linux-gnu'
'/lib/x86_64-linux-gnu'
'/usr'
'/usr/lib/x86_64-linux-gnu/libfakeroot'
'/usr/local/cuda'
'/usr/local/cuda-11.2/targets/x86_64-linux/lib'
make: *** [Makefile:28: /home/tavano/git/xla/cache/build/xla_extension-x86_64-linux-cuda111.tar.gz] Error 2
** (Mix) Could not compile with "make" (exit status: 2).
You need to have gcc and make installed. If you are using
Ubuntu or any other Debian-based system, install the packages
"build-essential". Also install "erlang-dev" package if not
included in your Erlang/OTP version. If you're on Fedora, run
"dnf group install 'Development Tools'".
It seems the build is not able to find cudnn.h on my system, even though it is currently located at /usr/local/cuda/include/cudnn.h. I’ve already tried creating a symlink to cudnn.h under /usr/local/cuda, but that didn’t help either.
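If I’m reading the Bazel traceback right, the detection is done by TensorFlow’s third_party/gpus/find_cuda_config.py script, so one thing I’ve been meaning to try is running it by hand to get the same error outside of Bazel, where it’s easier to iterate. A rough sketch (the env var names are the ones cuda_configure.bzl appears to read, so treat them as my best guess rather than a verified recipe):

# run from the TF checkout that the Makefile created
cd /home/tavano/.cache/xla_extension/tf-54dee6dd8d47b6e597f4d3f85b6fb43fd5f50f82
# invoke the same detection script Bazel runs, with explicit hints
TF_CUDA_VERSION=11.2 TF_CUDNN_VERSION=8 CUDNN_INSTALL_PATH=/usr/local/cuda \
  python3 third_party/gpus/find_cuda_config.py cuda cudnn

On success it should print the resolved cuda_*/cudnn_* paths it found, which would at least show me where it is actually looking.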
OS: Ubuntu 20.04.2 LTS (Focal Fossa)
.tool-versions:
erlang 24.0.6
elixir 1.12.3-otp-24
python 3.8.0
bazel 3.7.2
gcc version: gcc (Ubuntu 8.4.0-3ubuntu2) 8.4.0
NVIDIA driver info: NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2
cuDNN version: 8.1.1
Video card: GeForce GTX 1050 Ti
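One detail that caught my eye: the error asks for cudnn_version.h as well as cudnn.h, and as far as I know cuDNN 8 moved its version macros out of cudnn.h into that separate header. This is the sanity check I’ve been using (assuming the headers live under /usr/local/cuda/include):

# list every cuDNN header the install actually copied over
ls /usr/local/cuda/include/cudnn*.h
# cuDNN 8 keeps its version macros in cudnn_version.h; for 8.1.1 this
# should print CUDNN_MAJOR 8, CUDNN_MINOR 1, CUDNN_PATCHLEVEL 1
grep -A 2 '#define CUDNN_MAJOR' /usr/local/cuda/include/cudnn_version.h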
Currently, I’ve also set these XLA/EXLA env vars:
XLA_BUILD=true
XLA_TARGET=cuda111
EXLA_TARGET=cuda
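From skimming cuda_configure.bzl, the repository rule also seems to honor a few extra hint variables, so the next thing I plan to try is pointing it at my install explicitly before rebuilding. A sketch under that assumption (variable names taken from the .bzl source; I haven’t confirmed all of them are needed):

# hint TensorFlow's CUDA autodetection at the actual install
export TF_CUDA_PATHS=/usr/local/cuda
export CUDNN_INSTALL_PATH=/usr/local/cuda
export TF_CUDNN_VERSION=8
# the repository rule appears to track these variables, so rerunning
# the build should re-run the autodetection
iex -S mix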
Could someone please help me? I’ve been really struggling to find guides on compiling XLA for Elixir and to debug this issue on my own.