Hmm, actually something strange happens. I edited the source code of the xla library: at the beginning of the download_matching! function in deps/xla/lib/xla.ex I added a line that prints the name of the required archive.
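The edit was essentially a single print; here is a minimal sketch of it, assuming download_matching! receives the archive filename (the real body and signature in the xla package differ, only the IO.puts line is my addition):

defp download_matching!(filename) do
  # Added line: print which archive the build resolved, so the effective
  # XLA_TARGET is visible in the logs before any caching kicks in.
  IO.puts("Requested archive: #{filename}")

  # ... the original download logic of the xla package continues here ...
end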
Then I changed XLA_TARGET back to cuda111 and ran mix compile exla. This time the correct archive was downloaded, according to the logs:
/home/zeio/.cache/xla/0.2.0/cache/download/xla_extension-x86_64-linux-cuda111.tar.gz
18:57:12.780 [info] Found a matching archive (xla_extension-x86_64-linux-cuda111.tar.gz), going to download it
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   680  100   680    0     0   2054      0 --:--:-- --:--:-- --:--:--  2054
100  158M  100  158M    0     0  6515k      0  0:00:24  0:00:24 --:--:-- 6346k
18:57:37.699 [info] Successfully downloaded the XLA archive
==> exla
Unpacking /home/zeio/.cache/xla/0.2.0/cache/download/xla_extension-x86_64-linux-cuda111.tar.gz into /home/zeio/grapex/deps/exla/exla/cache
mkdir -p /home/zeio/grapex/_build/dev/lib/exla/priv
ln -sf /home/zeio/grapex/deps/exla/exla/cache/xla_extension/lib /home/zeio/grapex/_build/dev/lib/exla/priv/lib
g++ -fPIC -I/usr/lib/erlang/erts-12.1.3/include -isystem cache/xla_extension/include -O3 -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers -Wno-comment -shared -std=c++14 c_src/exla/exla.cc c_src/exla/exla_nif_util.cc c_src/exla/exla_client.cc -o /home/zeio/grapex/_build/dev/lib/exla/priv/libexla.so -L/home/zeio/grapex/_build/dev/lib/exla/priv/lib -lxla_extension -Wl,-rpath,'$ORIGIN/lib'
c_src/exla/exla_client.cc: In function ‘tensorflow::StatusOr<std::vector<exla::ExlaBuffer*> > exla::UnpackRunArguments(ErlNifEnv*, ERL_NIF_TERM, exla::ExlaClient*, int)’:
c_src/exla/exla_client.cc:95:19: warning: redundant move in return statement [-Wredundant-move]
   95 |   return std::move(arg_buffers);
      |          ~~~~~~~~~^~~~~~~~~~~~~
c_src/exla/exla_client.cc:95:19: note: remove ‘std::move’ call
Compiling 21 files (.ex)
warning: Nx.Defn.Composite.reduce/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
lib/exla/defn/stream.ex:95: Nx.Stream.EXLA.Defn.Stream.nx_to_io/1
warning: Nx.Defn.global_default_options/1 is undefined or private
lib/exla.ex:211: EXLA.set_preferred_defn_options/1
warning: Nx.Defn.Composite.flatten_list/1 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
Found at 2 locations:
lib/exla/defn.ex:17: EXLA.Defn.__stream__/6
lib/exla/defn.ex:19: EXLA.Defn.__stream__/6
warning: Nx.Defn.Composite.reduce/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
Found at 2 locations:
lib/exla/defn.ex:331: EXLA.Defn.used_inputs_and_hooks/1
lib/exla/defn.ex:366: EXLA.Defn.recur_flatten/3
warning: Nx.Defn.default_options/0 is undefined or private
lib/exla/device_backend.ex:78: EXLA.DeviceBackend.default_client_name/0
warning: Nx.Defn.Composite.traverse/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
lib/exla/defn.ex:1405: EXLA.Defn.to_if_branch/5
warning: Nx.Defn.Tree.apply_args/3 is undefined or private
Found at 4 locations:
lib/exla/defn.ex:345: EXLA.Defn.used_inputs_and_hooks/2
lib/exla/defn.ex:467: EXLA.Defn.cached_recur_operator/4
lib/exla/defn.ex:1372: EXLA.Defn.collect_ids/2
lib/exla/defn.ex:1397: EXLA.Defn.collect_args/3
warning: Nx.Defn.Composite.traverse/3 is undefined (module Nx.Defn.Composite is not available or is yet to be defined)
lib/exla/defn/buffers.ex:9: EXLA.Defn.Buffers.to_nx!/3
warning: Nx.byte_size/1 is undefined or private
Found at 2 locations:
lib/exla/defn/buffers.ex:17: EXLA.Defn.Buffers.buffer_to_data/2
lib/exla/defn/buffers.ex:18: EXLA.Defn.Buffers.buffer_to_data/2
Generated exla app
Then I ran my script via mix run main.exs, which performs model training, and the process really did seem to be running on the GPU, as I could see in the output of the nvidia-smi command:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   50C    P8     3W /  N/A |    695MiB /  3911MiB |     32%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1760      G   /usr/lib/xorg/Xorg                 45MiB |
|    0   N/A  N/A      3295      G   /usr/lib/xorg/Xorg                375MiB |
|    0   N/A  N/A      3478      G   /usr/bin/gnome-shell               70MiB |
|    0   N/A  N/A   1225965      G   ...AAAAAAAAA= --shared-files       64MiB |
|    0   N/A  N/A   1600138      G   ...AAAAAAAAA= --shared-files       32MiB |
|    0   N/A  N/A   1600174      G   ...AAAAAAAA== --shared-files       33MiB |
|    0   N/A  N/A   1624190      C   .../erts-12.1.3/bin/beam.smp       55MiB |
+-----------------------------------------------------------------------------+
At some point I stopped the execution; the full output of my script at that point was:
Compiling 1 file (.ex)
{:ok, %{'CUDA' => 1, 'Host' => 12}}
{:ok, #Reference<0.3594860528.3707371527.100288>}
18:59:57.763 [info] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
18:59:57.766 [info] XLA service 0x7f30e4495730 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
18:59:57.766 [info] StreamExecutor device (0): GeForce GTX 1650, Compute Capability 7.5
18:59:57.766 [info] Using BFC allocator.
18:59:57.767 [info] XLA backend will use up to 3376283648 bytes on device 0 for BFCAllocator.
Input Files Path : /home/zeio/relentness/Assets/Corpora/Demo/0000/
Setting bern flag to false
---------------------------------------------------------------------------------------------
                                            Model
=============================================================================================
 Layer                                                   Shape                    Parameters
=============================================================================================
 input_1 ( input )                                       {nil, 4, 2}              0
 embedding_3 ( embedding )                               {nil, 4, 2, 10}          580
 reshape_4 ( reshape )                                   {nil, 4, 2, 1, 1, 10}    0
 pad_5 ( pad )                                           {nil, 4, 2, 2, 10, 10}   0
 input_6 ( input )                                       {nil, 4, 1}              0
 embedding_8 ( embedding )                               {nil, 4, 1, 200}         400
 reshape_9 ( reshape )                                   {nil, 4, 1, 2, 10, 10}   0
 concatenate_10 ( concatenate ["pad_5", "reshape_9"] )   {nil, 4, 3, 2, 10, 10}   0
---------------------------------------------------------------------------------------------
The model will not be saved during training because n-export-steps parameter has not been provided.
Epoch: 23, Batch: 10, Loss: 3.55707 ^C
BREAK: (a)bort (A)bort with dump (c)ontinue (p)roc info (i)nfo
(l)oaded (v)ersion (k)ill (D)b-tables (d)istribution
The second and third lines of this output contain the results of the following two commands, respectively:
IO.inspect EXLA.NIF.get_supported_platforms()
IO.inspect EXLA.NIF.get_gpu_client(1.0, 0)
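For what it's worth, a small guard at the top of main.exs makes the fallback explicit; this is only a sketch that assumes the return shapes shown above (an {:ok, map} with charlist keys), since EXLA.NIF is an internal interface:

case EXLA.NIF.get_supported_platforms() do
  {:ok, %{'CUDA' => _device_count} = platforms} ->
    # The CUDA platform is registered, as in the first run.
    IO.puts("CUDA visible: #{inspect(platforms)}")

  {:ok, platforms} ->
    # Only Host is registered, as in the second run below.
    IO.puts("CUDA missing, will run on CPU: #{inspect(platforms)}")
end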
After that I tried to run the same command again (mix run main.exs), but this time the process started on the CPU instead of the GPU. I got the following output:
{:ok, %{'Host' => 12}}
{:error,
'Could not find registered platform with name: "cuda". Available platform names are: Host'}
Input Files Path : /home/zeio/relentness/Assets/Corpora/Demo/0000/
Setting bern flag to false
---------------------------------------------------------------------------------------------
                                            Model
=============================================================================================
 Layer                                                   Shape                    Parameters
=============================================================================================
 input_1 ( input )                                       {nil, 4, 2}              0
 embedding_3 ( embedding )                               {nil, 4, 2, 10}          580
 reshape_4 ( reshape )                                   {nil, 4, 2, 1, 1, 10}    0
 pad_5 ( pad )                                           {nil, 4, 2, 2, 10, 10}   0
 input_6 ( input )                                       {nil, 4, 1}              0
 embedding_8 ( embedding )                               {nil, 4, 1, 200}         400
 reshape_9 ( reshape )                                   {nil, 4, 1, 2, 10, 10}   0
 concatenate_10 ( concatenate ["pad_5", "reshape_9"] )   {nil, 4, 3, 2, 10, 10}   0
---------------------------------------------------------------------------------------------
The model will not be saved during training because n-export-steps parameter has not been provided.
Epoch: 15, Batch: 6, Loss: 4.28919 ^C
BREAK: (a)bort (A)bort with dump (c)ontinue (p)roc info (i)nfo
(l)oaded (v)ersion (k)ill (D)b-tables (d)istribution
And every subsequent run behaves in the same way: it can no longer detect the cuda client and use the GPU, and always runs the computations on the CPU.