Hey everyone. I'm trying to get a simple EXLA app going and am running into issues with EXLA recognizing CUDA. The standard `nvcc`, `nvidia-smi`, and cuDNN checks all come back fine, and I can run some Python apps using all of it, so it's wired up correctly somehow down there.
But when I try to compile, I keep getting this error, and it's really confusing me. I only have a slight understanding of what `:rocm` even is, let alone something I'd deliberately pass in, yet here I am getting it in an error. I was hoping someone with deeper knowledge might be able to offer an explanation or a suggestion for what to do next.
I'll post some relevant config and can always post more, but I'm also just curious why this is happening at all.
It's kind of funny how just asking can reveal things you missed. I finally homed in on that second line, where it seems EXLA is just using `:rocm` as the default client. If so, any idea how I can change that? Also, wouldn't the fallback just be `:host`?
That message also makes me want to set `XLA_TARGET` to `:cuda`, but I'm pretty sure that isn't right either.
Will do. The earlier message says it was being set to `cuda120`, but it made it seem like that was the wrong value, maybe even the wrong format (does it want an atom?).
Actually, let's just assume it's not being set, or is being set incorrectly. If so, how would I set it in the correct context, or do whatever is needed to override the other places where I've set it?
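On the string-vs-atom question: `XLA_TARGET` is read from the environment, so in the shell it is just a string such as `cuda120` (the same suffix as in the `...-cuda120.tar.gz` archive name further down), never an atom. Here is a minimal sketch, assuming a regular Mix project rather than `Mix.install`. Prefixing the variable to the compile command overrides any value exported elsewhere, but only for that one command. `rebuild_exla` is a hypothetical helper name, not part of Mix.

```shell
# Hypothetical helper: rebuild the xla and exla deps for one XLA target.
# XLA_TARGET is a plain environment variable, so its value is a string
# like "cuda120", not an Elixir atom. The VAR=value prefix applies only
# to this one command and takes precedence over any value exported in
# the parent shell (~/.bashrc, ~/.zshrc, ...).
rebuild_exla() {
  if [ -z "$1" ]; then
    echo "usage: rebuild_exla <target, e.g. cuda120>" >&2
    return 1
  fi
  XLA_TARGET="$1" mix deps.compile xla exla --force
}
```

For example, run `rebuild_exla cuda120`. If you'd rather not pass it every time, exporting `XLA_TARGET=cuda120` in your shell profile has the same effect for every later command.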
Yeah, by looking at the URL, which should show up in those logs on a clean install (i.e. with `force: true`), we'll be able to see whether you're actually getting the XLA CUDA artifact.
I was back to the original error. It kept unpacking a different XLA archive:
Unpacking /home/ar3rz/.cache/xla/0.5.1/cache/download/xla_extension-x86_64-linux-gnu-cpu.tar.gz into /home/ar3rz/elixir/...
I'm guessing that should match what I got in the Mix.install output, correct? That was:
Unpacking /home/ar3rz/.cache/xla/0.5.1/cache/download/xla_extension-x86_64-linux-gnu-cuda120.tar.gz into ...
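To compare the two without eyeballing long paths, here is a small sketch that pulls every `xla_extension-*.tar.gz` name out of saved build output and prints only the target suffix. It assumes the archive names follow the `xla_extension-<arch>-<os>-<target>.tar.gz` pattern seen in the two lines above. `xla_variants` is a hypothetical helper.

```shell
# Hypothetical helper: read build output on stdin and print the XLA target
# suffix (e.g. "cpu", "cuda120") of every xla_extension archive it mentions,
# one per line, deduplicated.
xla_variants() {
  grep -o 'xla_extension-[A-Za-z0-9_.-]*\.tar\.gz' |
    sed -e 's/\.tar\.gz$//' -e 's/.*-//' |
    sort -u
}
```

For example, save the output with `mix deps.compile xla exla --force 2>&1 | tee build.log` and then run `xla_variants < build.log`. This assumes the compile output includes the same `Unpacking ...` line. If it prints `cpu` when you expected `cuda120`, the target isn't reaching the build.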
So I erased my XLA cache, removed all the deps, and reran `mix deps.get`, `mix deps.clean exla`, etc.
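A narrower alternative to erasing the whole cache is to delete only the downloaded `xla_extension-*.tar.gz` files. That forces the next build to download a fresh archive and log its name. This sketch assumes the archives live under `~/.cache/xla`, as in the log lines above, and you can pass a different cache root if yours lives elsewhere. `clear_xla_cache` is a hypothetical helper.

```shell
# Hypothetical helper: delete cached xla_extension archives so the next
# build has to download (and log) a fresh one. Prints each removed path.
clear_xla_cache() {
  cache_root="${1:-$HOME/.cache/xla}"
  # Guard against wiping the filesystem root by accident
  case "$cache_root" in
    ""|/) echo "refusing to clean '$cache_root'" >&2; return 1 ;;
  esac
  find "$cache_root" -type f -name 'xla_extension-*.tar.gz' -print -exec rm -f {} +
}
```

Other files under the cache root are left alone. After running it, rebuild with the target you expect.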
Now I'm running into this. I haven't looked into it much yet because, quite frankly, I'm kind of irritated with the whole process. I'll be back after I poke around a bit, hopefully to post a solution, or if not, to ask another round of annoying questions.
I really do appreciate your help, though, @polvalente.
Here is what I am facing now:
```
➜ emer_phx git:(master) ✗ mix deps.compile exla --force
==> exla
g++ -fPIC -I/home/ar3rz/.asdf/installs/erlang/26.1/erts-14.1/include -Icache/xla_extension/include -O3 -Wall -Wno-sign-compare -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++17 -w -DLLVM_VERSION_STRING= c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o cache/libexla.so -Lcache/xla_extension/lib -lxla_extension -Wl,-rpath,'$ORIGIN/xla_extension/lib'
In file included from c_src/exla/exla.cc:3:
c_src/exla/exla_nif_util.h:12:10: fatal error: xla/xla_data.pb.h: No such file or directory
   12 | #include "xla/xla_data.pb.h"
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
In file included from c_src/exla/exla_nif_util.cc:1:
c_src/exla/exla_nif_util.h:12:10: fatal error: xla/xla_data.pb.h: No such file or directory
   12 | #include "xla/xla_data.pb.h"
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
In file included from c_src/exla/exla_client.h:8,
                 from c_src/exla/exla_client.cc:1:
c_src/exla/exla_nif_util.h:12:10: fatal error: xla/xla_data.pb.h: No such file or directory
   12 | #include "xla/xla_data.pb.h"
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:57: cache/libexla.so] Error 1
could not compile dependency :exla, "mix compile" failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile exla --force", update it with "mix deps.update exla" or clean it with "mix deps.clean exla"
==> emer_phx
** (Mix) Could not compile with "make" (exit status: 2).
You need to have gcc and make installed. If you are using
Ubuntu or any other Debian-based system, install the packages
"build-essential". Also install "erlang-dev" package if not
included in your Erlang/OTP version. If you're on Fedora, run
"dnf group install 'Development Tools'".
```
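The g++ command above appears to run from the exla dep directory with `-Icache/xla_extension/include`, so the compiler expects the header at `cache/xla_extension/include/xla/xla_data.pb.h`. The diagnostic sketch below assumes a regular Mix project, where that directory would be `deps/exla/cache/xla_extension`. It checks whether the header is there and, if not, whether the unpacked archive contains it anywhere else. A copy elsewhere would hint that the archive's layout doesn't match what this EXLA version expects. `check_xla_header` is a hypothetical helper.

```shell
# Hypothetical helper: given the unpacked xla_extension directory, report
# whether xla_data.pb.h is where g++ looks for it (-I<dir>/include), and
# otherwise list any copies found elsewhere in the extension.
check_xla_header() {
  ext_dir="$1"
  expected="$ext_dir/include/xla/xla_data.pb.h"
  if [ -f "$expected" ]; then
    echo "found: $expected"
    return 0
  fi
  echo "missing: $expected"
  # Search the whole extension dir in case the archive uses another layout
  find "$ext_dir" -name 'xla_data.pb.h' 2>/dev/null
  return 1
}
```

For example, run `check_xla_header deps/exla/cache/xla_extension` from the project root.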
Just taking a stab at getting a fresh XLA version.