The Livebook cell reports the following:
06:26:28.007 [error] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
06:26:28.020 [error] Memory usage: 327155712 bytes free, 3901685760 bytes total.
** (RuntimeError) DNN library initialization failed. Look at the errors above for more details.
(exla 0.9.1) lib/exla/mlir/module.ex:147: EXLA.MLIR.Module.unwrap!/1
(exla 0.9.1) lib/exla/mlir/module.ex:124: EXLA.MLIR.Module.compile/5
(stdlib 6.1.2) timer.erl:590: :timer.tc/2
(exla 0.9.1) lib/exla/defn.ex:432: anonymous fn/14 in EXLA.Defn.compile/8
(exla 0.9.1) lib/exla/mlir/context_pool.ex:10: anonymous fn/3 in EXLA.MLIR.ContextPool.checkout/1
(nimble_pool 1.1.0) lib/nimble_pool.ex:462: NimblePool.checkout!/4
(exla 0.9.1) lib/exla/defn/locked_cache.ex:36: EXLA.Defn.LockedCache.run/2
#cell:grt4sk6uht7mjljj:1: (file)
The terminal running the Livebook reports the following:
06:24:02.969 [info] Downloading a precompiled XLA archive for target x86_64-linux-gnu-cuda12
06:24:45.488 [info] Successfully downloaded the XLA archive
06:25:25.831 [debug] Downloading NIF from https://github.com/elixir-nx/tokenizers/releases/download/v0.5.1/libex_tokenizers-v0.5.1-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
06:25:26.877 [debug] NIF cached at /home/geegee/.cache/rustler_precompiled/precompiled_nifs/libex_tokenizers-v0.5.1-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz and extracted to /home/geegee/.cache/mix/installs/elixir-1.17.3-erts-15.1.2/f93a18a2a17ed8ae75e079a1309969e9/_build/dev/lib/tokenizers/priv/native/libex_tokenizers-v0.5.1-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1731734769.813825 1168221 cuda_executor.cc:1040] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1731734769.815001 1165283 service.cc:146] XLA service 0x7c1eb4062aa0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1731734769.815020 1165283 service.cc:154] StreamExecutor device (0): NVIDIA GeForce GTX 1650 Ti, Compute Capability 7.5
I0000 00:00:1731734769.815287 1165283 se_gpu_pjrt_client.cc:889] Using BFC allocator.
I0000 00:00:1731734769.815320 1165283 gpu_helpers.cc:114] XLA backend allocating 3511517184 bytes on device 0 for BFCAllocator.
I0000 00:00:1731734769.815342 1165283 gpu_helpers.cc:154] XLA backend will use up to 390168575 bytes on device 0 for CollectiveBFCAllocator.
I0000 00:00:1731734769.815417 1165283 cuda_executor.cc:1040] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
06:26:09.840 [error] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
06:26:09.840 [error] Memory usage: 327155712 bytes free, 3901685760 bytes total.
06:26:09.840 [error] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
06:26:09.840 [error] Memory usage: 327155712 bytes free, 3901685760 bytes total.
06:26:28.007 [error] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
06:26:28.020 [error] Memory usage: 327155712 bytes free, 3901685760 bytes total.
06:26:28.020 [error] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
06:26:28.020 [error] Memory usage: 327155712 bytes free, 3901685760 bytes total.
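In case it helps with reading the log, here is a small sketch that converts the memory figures above into more readable numbers. The values are copied from the log lines; the variable names are my own, and I am only assuming (not certain) that the small amount of free memory is related to the cuDNN failure.

```elixir
# Figures copied from the log output above (all in bytes).
total_bytes = 3_901_685_760        # "bytes total" reported with the cuDNN error
preallocated_bytes = 3_511_517_184 # "XLA backend allocating ... for BFCAllocator"
free_bytes = 327_155_712           # "bytes free" reported with the cuDNN error

mib = 1024 * 1024

# Share of the card's memory that XLA reserved up front, in percent.
preallocated_pct = preallocated_bytes * 100 / total_bytes

# Memory that was still free when the cuDNN handle was requested, in MiB.
free_mib = free_bytes / mib

IO.puts("preallocated: #{preallocated_pct} % of total")
IO.puts("free at failure: #{free_mib} MiB")
```

So XLA reserved 90 % of the card's memory up front, and 312 MiB were reported as free when the cuDNN handle could not be created.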
I am running Arch Linux, and lspci -k -d ::03xx reports the following information regarding the NVIDIA card:
01:00.0 3D controller: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] (rev a1)
Subsystem: Dell Device 097d
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia
Maybe it is just not possible with this type of graphics card?
Let me know if I should add any further information, as I am rather new to this.