urielfcampos

Error when loading model with cuda EXLA client using Bumblebee

Hi everyone!

I managed to get Bumblebee up and running on WSL2 using my CPU and decided to try using my GPU for it. I installed everything according to NVIDIA's tutorials, but when I try to load the model used in Bumblebee's example, I get the following error:

iex(1)> {:ok, model_info} = Bumblebee.load_model({:hf, "bert-base-uncased"})

22:25:13.622 [info] could not open file to read NUMA node: /sys/bus/pci/devices/0000:0a:00.0/numa_node
Your kernel may have been built without NUMA support.

22:25:13.622 [info] XLA service 0x7fb97456c520 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:

22:25:13.622 [info]   StreamExecutor device (0): NVIDIA GeForce RTX 3080, Compute Capability 8.6

22:25:13.622 [info] Using BFC allocator.

22:25:13.622 [info] XLA backend will use up to 8589515161 bytes on device 0 for BFCAllocator.

22:25:13.950 [error] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

22:25:13.951 [error] Memory usage: 9446621184 bytes free, 10736893952 bytes total.
** (RuntimeError) DNN library initialization failed. Look at the errors above for more details.
    (exla 0.6.1) lib/exla/computation.ex:92: EXLA.Computation.unwrap!/1
    (exla 0.6.1) lib/exla/computation.ex:61: EXLA.Computation.compile/4
    (stdlib 5.0.2) timer.erl:270: :timer.tc/2
    (exla 0.6.1) lib/exla/defn.ex:430: anonymous fn/11 in EXLA.Defn.compile/8
    (exla 0.6.1) lib/exla/defn/locked_cache.ex:36: EXLA.Defn.LockedCache.run/2
    (stdlib 5.0.2) timer.erl:270: :timer.tc/2
    (exla 0.6.1) lib/exla/defn.ex:406: EXLA.Defn.compile/8
    iex:1: (file)

I did some research that suggested it might be an OOM error, so I tried playing with the preallocate and memory_fraction options for the CUDA EXLA client, but nothing worked.
I also found some TensorFlow issues mentioning an allow_growth option, but I don't think that's relevant.
Has anyone run into something similar?

Marked As Solved

jonatanklosko

Creator of Livebook

Ohh that makes sense, I couldn’t really work out what else it could be : )

FTR, if someone runs into this, here are a couple of checks for Ubuntu/Debian:

# Verify CUDA version
nvcc --version
# Verify cuDNN version, make sure it's installed and that the package matches CUDA version
apt-cache policy libcudnn8 | head -n 3
# Check drivers and CUDA support
nvidia-smi

Also Liked

urielfcampos

Just to close the loop on this: I misunderstood the NVIDIA instructions for installing cuDNN, so that was the problem :sweat: Thanks everyone!

NVIDIA’s deb package only creates a local apt repo to install cuDNN from, so you still need to run apt-get install libcudnn8, apt-get install libcudnn8-dev, and apt-get install libcudnn8-samples.
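
For anyone following along, the install step described above could look roughly like this. It is only a sketch: it assumes NVIDIA's local-repo deb package is already installed so apt can see the cuDNN packages, and that you have sudo rights. The names install_cudnn8 and CUDNN_PACKAGES are made up for this example.

```shell
# Packages named in the post above; the local apt repo must already be registered.
CUDNN_PACKAGES="libcudnn8 libcudnn8-dev libcudnn8-samples"

# Hypothetical helper: refresh the package index, then install cuDNN 8.
# Nothing runs until you call install_cudnn8 yourself.
install_cudnn8() {
  sudo apt-get update &&
  # $CUDNN_PACKAGES is left unquoted on purpose so it splits into three arguments.
  sudo apt-get install -y $CUDNN_PACKAGES
}
```

After calling install_cudnn8, the apt-cache policy libcudnn8 check from the accepted answer should show an installed version.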

coderhour

I finally figured it out. For future readers:

Originally, I downloaded the latest CUDA 12 and cuDNN 9 from NVIDIA's official site, which caused the issue. The fix is to use cuDNN 8 (the latest 8 works for me). I guess it’s because the XLA binary is compiled for cuDNN 8.
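
A quick way to spot this kind of mismatch is to look at the major version of whichever cuDNN packages are installed. Here is a rough sketch, assuming a Debian/Ubuntu system with dpkg-query available. The function names and the sample version string are only illustrative, and the expected major version 8 comes from the report above, not from any official compatibility table.

```shell
# Extract the major version from a package version string,
# e.g. "8.9.7.29-1+cuda12.2" (illustrative format) -> "8".
cudnn_major() {
  printf '%s\n' "${1%%.*}"
}

# Print a warning for every installed libcudnn* package whose major version is not 8.
warn_if_not_cudnn8() {
  dpkg-query -W -f='${Package} ${Version}\n' 'libcudnn*' 2>/dev/null |
  while read -r pkg version; do
    # Packages known to dpkg but not installed have an empty version; skip them.
    [ -n "$version" ] || continue
    major=$(cudnn_major "$version")
    if [ "$major" != "8" ]; then
      echo "warning: $pkg is cuDNN $major, expected 8"
    fi
  done
}
```

Calling warn_if_not_cudnn8 prints nothing when every installed cuDNN package is version 8, but also when no cuDNN package is installed at all, so it is worth pairing it with the apt-cache policy check from the accepted answer.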

csokun

Thanks for sharing! It helped me resolve my issue.
