I had Exla working at one point, but now it keeps crashing, and I can't figure out why


So I was trying to migrate some Python code I was running into my Elixir code. My setup is that Elixir runs most of the code, Rustler/Rust bindings run rust-bert for one model, and some Python code runs another one. My goal was to bring as much as possible into Elixir. I found a model that I could run in Elixir, and it was performing better than the one in Python, but there seemed to be a crossing of wires, since both the Rust code and the Elixir Nx code were using PyTorch. It seemed like whichever one ran first would get the lock on PyTorch, and the second one wouldn't work (on the Rust side I'd see an error like this: Nif not loaded ... symbol not found in flat namespace '__ZN3c1019UndefinedTensorImpl10_singletonE').
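For reference, pointing the whole Elixir side at EXLA (so it never loads libtorch next to the Rust NIF) can be done at the config level. This is just a sketch of how I understand it from the Nx/EXLA docs; at runtime I also set and check the backend directly, as shown further down:

# config/config.exs (sketch): make EXLA the default Nx backend so the
# Elixir side stays off libtorch and can't clash with the Rust NIF.
import Config

config :nx, default_backend: EXLA.Backend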

When I tried to run the Elixir side on EXLA instead, I could get it to work briefly in a completely clean project, but not together with everything else. When I load the model I get this output:

[info] TfrtCpuClient created.
[debug] the following PyTorch parameters were unused:

  * bert.embeddings.position_ids
  * bert.pooler.dense.bias
  * bert.pooler.dense.weight

That seems strange, because I'm trying to target EXLA, not PyTorch. To confirm, right before I ran this operation I checked the following:

iex(3)> Nx.default_backend
{EXLA.Backend, [client: :host]}

The tensors themselves do report EXLA, though:

f32[768]
EXLA.Backend<host:0,...

The model is started as follows:

{:ok, model} = Bumblebee.load_model({:hf, "vblagoje/bert-english-uncased-finetuned-pos"})
# This is only coming from local because there's no fast tokenizer "tokenizer.json" file on HF,
# but loading it in Python, then dumping to a file, created one.
{:ok, tokenizer} = Bumblebee.load_tokenizer({:local, "priv/tokenizer"})
model = Bumblebee.Text.token_classification(model, tokenizer, aggregation: :same)

And called like:

Nx.Serving.run(model, "This is a test")

Which gives me the following:

libc++abi: terminating with uncaught exception of type std::out_of_range: Span::at failed bounds check
[1]    67025 abort      iex --erl "-kernel shell_history enabled" -S mix phx.server

I have tried:

  • A different model, where both the model and tokenizer come directly from HF; same result.
  • Scorched-earth removal of PyTorch, TensorFlow, all deps, _build, the works, and then rebuilding; same result.
  • Completely removing Rustler and anything else, even trying a completely clean project with just nx/bumblebee/axon/exla/jason; same result.
  • Restarting the computer; same result.

I’m not even sure how to debug where it’s crashing from.

Here are my mix.exs deps:

      {:axon, "~> 0.5.1"},
      {:bumblebee, "~> 0.2.0"},
      {:nx, "~> 0.5.1"},
      {:exla, "~> 0.5.1", sparse: "exla"},
      {:jason, "~> 1.0"}

Anyone have any insight or tips on what I could try?

3 Likes

Can you provide a Livebook or a single .exs file that reproduces this? Then I can at least run it here and let you know if it is something specific to your machine or not.

Also, double check your ENV vars just in case.

1 Like

Hey @jose! I'll take a look and see if I can get something running in a Livebook, or a single file, that reproduces the issue. I suspect it's a "my machine" issue rather than a library issue, so it may be hard to reproduce; I was just wondering if this was an error anyone had seen.

But for now, I've moved to a production server: an AWS G4dn.xl with an Nvidia T4. I think I must be doing something wrong with my config.

I have 5 nonsense sentences I copied from a Reddit post. The byte sizes of sents are [1079, 131, 109, 120, 130].

Yes, these are certainly not the most rigorous or fair benchmarks, but the point was just to get a rough idea of whether my config/settings in Elixir are good. They appear not to be, so I'm working on getting them dialed in. If anyone can help, even a pointer in a good direction would be appreciated.

Basically I'm just running through the 5 sentences, 10 times, in each setup. Both are running on the same machine.

In Python:

import time

import torch
from transformers import AutoTokenizer, BertForTokenClassification

device = torch.device("cuda")  # the T4 on this machine

model = BertForTokenClassification.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")
tokenizer = AutoTokenizer.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")
model = model.to(device)
def test():
  start = time.time()
  with torch.no_grad():
    for i in range(10):
      for s in sents:
        inputs = tokenizer(s, return_tensors="pt").to(device)
        outputs = model(**inputs)
        logits = outputs.logits
        predicted_token_class_ids = logits.argmax(-1)
        predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]  
  return time.time() - start

Calling test() here gives me 0.5248661041259766 (seconds). I logged the predictions on a separate run to make sure it wasn't only fast because it was throwing an error or something:

['PRON', 'ADV', 'PUNCT', 'SCONJ', 'DET', 'ADJ', 'NOUN', 'VERB', 'VERB', 'ADP', 'NOUN', 'NOUN', 'ADP', 'PRON', 'PUNCT', 'CCONJ', 'DET', 'ADJ', 'NOUN', 'VERB', 'DET', 'ADJ', 'NOUN', 'ADP', 'DET', 'ADJ', 'ADJ', 'ADV', 'ADJ', 'NOUN', 'ADP', 'PRON', 'NOUN', 'PUNCT', 'CCONJ', 'CCONJ', 'DET', 'ADJ', 'ADJ', 'NOUN', 'NOUN', 'VERB', 'ADP', 'DET', 'ADJ', 'NOUN', 'PUNCT', 'PRON', 'VERB', 'PRON', 'ADV', 'ADP', 'DET', 'ADJ', 'NOUN', 'ADP', 'DET', 'VERB', 'VERB', 'NOUN', 'PUNCT', 'CCONJ', 'PUNCT', 'SCONJ', 'PRON', 'VERB', 'ADV', 'ADP', 'DET', 'NOUN', 'PUNCT', 'DET', 'NUM', 'ADJ', 'NOUN', 'AUX', 'VERB', 'ADP', 'PRON', 'PUNCT', 'ADV', 'PRON', 'VERB', 'DET', 'NOUN', 'ADP', 'DET', 'ADJ', 'NOUN', 'ADP', 'DET', 'NOUN', 'PUNCT', 'CCONJ', 'VERB', 'ADJ', 'ADP', 'DET', 'ADJ', 'ADJ', 'ADV', 'ADV', 'ADV', 'ADJ', 'NOUN', 'ADP', 'DET', 'NOUN', 'CCONJ', 'NOUN', 'PUNCT', 'ADV', 'PRON', 'VERB', 'DET', 'NOUN', 'ADP', 'DET', 'NOUN', 'PUNCT', 'PRON', 'VERB', 'PRON', 'ADP', 'PRON', 'ADJ', 'NOUN', 'PUNCT', 'CCONJ', 'DET', 'NOUN', 'ADP', 'DET', 'ADJ', 'NOUN', 'PRON', 'VERB', 'CCONJ', 'VERB', 'VERB', 'PRON', 'PUNCT', 'SCONJ', 'PRON', 'VERB', 'ADP', 'PRON', 'ADP', 'DET', 'NOUN', 'ADP', 'NOUN', 'PUNCT', 'CCONJ', 'ADV', 'PUNCT', 'PRON', 'NOUN', 'PUNCT', 'ADV', 'NOUN', 'VERB', 'ADP', 'VERB', 'PRON', 'NOUN', 'PUNCT', 'CCONJ', 'NOUN', 'CCONJ', 'NOUN', 'VERB', 'PART', 'VERB', 'ADP', 'PRON', 'NOUN', 'CCONJ', 'VERB', 'PRON', 'NOUN', 'PUNCT', 'ADP', 'DET', 'NOUN', 'ADP', 'DET', 'ADJ', 'NOUN', 'PUNCT', 'ADV', 'PRON', 'ADV', 'VERB', 'ADP', 'NOUN', 'PUNCT', 'INTJ', 'PUNCT', 'AUX', 'PRON', 'AUX', 'VERB', 'DET', 'NOUN', 'NOUN', 'PUNCT', 'AUX', 'VERB', 'ADP', 'NOUN', 'DET', 'PRON', 'AUX', 'VERB', 'ADV', 'ADJ', 'CCONJ', 'ADJ', 'ADP', 'PRON', 'PUNCT', 'SCONJ', 'PRON', 'AUX', 'AUX', 'DET', 'NOUN', 'ADP', 'PRON', 'NOUN', 'PUNCT', 'SCONJ', 'PRON', 'NOUN', 'AUX', 'DET', 'NOUN', 'ADP', 'DET', 'ADJ', 'PROPN', 'PUNCT', 'PUNCT']

But it’s definitely generating tokens.

In Elixir:

> Nx.default_backend({EXLA.Backend, client: :cuda})
> Nx.default_backend                               
{EXLA.Backend, [client: :cuda]}
> Nx.Defn.default_options
[compiler: EXLA, client: :cuda]
defmodule Test do
  @sents [
    # same as before, cut for brevity, but can add them if wanted
  ]
  def run do
    {:ok, model} = Bumblebee.load_model({:hf, "vblagoje/bert-english-uncased-finetuned-pos"})
    # The only reason this is coming from a local file instead of :hf is that the fast tokenizer is not on the HF hub.
    # This local file is just a dump from the HF Python lib: `tokenizer.save('path')`
    # Contents: config.json  special_tokens_map.json  tokenizer.json  tokenizer_config.json  vocab.txt
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:local, "./tokenizer"})
    model = Bumblebee.Text.token_classification(model, tokenizer, aggregation: :same)

    :timer.tc(fn ->
      Enum.each(1..10, fn _ ->
        Enum.each(@sents, fn sent ->
          Nx.Serving.run(model, sent)
        end)
      end)
    end)
    |> elem(0)
    |> Kernel./(1_000_000)
  end
end
iex(7)> Test.run()

02:00:52.993 [info] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
 
02:00:52.993 [info] XLA service 0x7f81a80d9a00 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:

02:00:52.993 [info]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5

02:00:52.993 [info] Using BFC allocator.

02:00:52.993 [info] XLA backend allocating 14019467673 bytes on device 0 for BFCAllocator.
 
02:00:53.785 [debug] the following PyTorch parameters were unused:

  * bert.embeddings.position_ids
  * bert.pooler.dense.bias
  * bert.pooler.dense.weight

23.547926

My ~/.bashrc, after which I ran both source ~/.bashrc and exec $SHELL, and confirmed the config with echo $XLA_TARGET:

export LIBTORCH_TARGET="cu116"
export ELIXIR_ERL_OPTIONS="+sssdio 128"
export XLA_TARGET="cuda118"
export EXLA_TARGET="cuda"

I'm not sure if the cu116 is bad in there. I'm running CUDA 11.8 with driver 520 and cuDNN 8.7.0.84; cu116 was the highest target listed for Torchx. I'm not sure if that matters when I'm using EXLA, but I notice the "the following PyTorch parameters were unused" message in the logs, so I assume PyTorch is still involved somewhere.

I’m trying to figure out what could be missing in the Elixir side to get it closer to the Python side.

I would also love to write up a blog post after I've been through this process of launching in production. The docs, as always with Elixir, have been super helpful and accessible for an ML newbie like me, but I have found them, and the general blogosphere, still thin on this kind of info, which is to be expected at this early stage. I did see a note about looking at the EXLA compiler flags and defn opts, but there wasn't any general advice about a good starting place, and the list of EXLA flags is quite long, so I didn't even know where to start.
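For concreteness, this is the kind of thing I understood those notes to mean: passing compile/defn options to the serving. An untested sketch on my side; the batch size and sequence length values are placeholders, not tuned:

model = Bumblebee.Text.token_classification(model, tokenizer,
  aggregation: :same,
  # Placeholder shape: compile once up front rather than per input shape.
  compile: [batch_size: 1, sequence_length: 128],
  # Route the computation through the EXLA compiler, not just the EXLA backend.
  defn_options: [compiler: EXLA]
)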

Did you find a reason? I seem to get the same issue, but on an Intel Mac.


libc++abi: terminating with uncaught exception of type std::out_of_range: Span::at failed bounds check

23:25:23.818 [info] TfrtCpuClient created.

23:25:24.562 [debug] the following PyTorch parameters were unused:

  * roberta.embeddings.position_ids

libc++abi: terminating with uncaught exception of type std::out_of_range: Span::at failed bounds check

1 Like

Same here. I have just reproduced some parts of: Semantic Search with Phoenix, Axon, Bumblebee, and ExFaiss - DockYard

I was able to run it on Ubuntu. But on my M1 Mac I am getting this error when calling Sommelier.Model.predict("a nice red wine").

1 Like

I'm very new to Elixir so I might be way off, but I have the same issue on an Intel Mac (Apple clang version 14.0.3) when trying to run some Whisper code from @lawik. I added tracing to EXLA's exla_nif_util.cc to identify the problem when the shape is an array:

} else if (shape.IsArray()) {
      cout << "IsArray" << endl;
      xla::PrimitiveType type = shape.element_type();
      absl::Span<const int64> dims = shape.dimensions();
      int64 rank = shape.rank();

      std::string name = xla::primitive_util::LowercasePrimitiveTypeName(type);
      cout << "rank: " << rank << endl;
      cout << "name: " << name << endl;

      std::vector<ERL_NIF_TERM> dim_arr;
      dim_arr.reserve(rank);
      for (int i = 0; i < rank; i++) {
        int copy;
        cout << "  i: " << i << ", rank: " << rank << endl;
        cout << "  i < rank: " << (i < rank) << endl;
        copy = dims.at(i);
        cout << "  dims[" << i << "]: " << copy << endl;
        dim_arr.push_back(make(env, copy));
      }

Some of the extra logging might seem redundant, but it was to convince me that I wasn’t imagining things. The bug appears to be that the for loop is still being executed when rank is zero. Here is what the output shows just prior to the crash:

IsArray
rank: 0
name: s64
  i: 0, rank: 0
  i < rank: 1
libc++abi: terminating due to uncaught exception of type std::out_of_range: Span::at failed bounds check

The original C++ code looks correct. Interestingly, however, reducing the compiler optimization level in the Makefile from -O3 to -O1 (but not to -O2) resolves the issue. Perhaps there is something I don't understand about the kinds of optimizations that are allowed, but something tells me the one applied going from -O1 to -O2 is too ambitious ;-).

1 Like

I'm able to reproduce this on an M1 Max MacBook Pro. Here's a simple script which consistently shows the error:

(borrowed from the Bumblebee examples)

#! /usr/bin/env elixir

Mix.install(
  [
    {:bumblebee, "~> 0.2.0"},
    {:nx, "~> 0.5.1"},
    {:exla, "~> 0.5.1"},
    {:axon, "~> 0.5.1"}
  ],
  config: [
    nx: [default_backend: EXLA.Backend]
  ]
)

{:ok, model} = Bumblebee.load_model({:hf, "finiteautomata/bertweet-base-sentiment-analysis"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "vinai/bertweet-base"})

serving = Bumblebee.Text.text_classification(model, tokenizer)

Nx.Serving.run(serving, "Wow, that's amazing!")

This script outputs the following and then crashes:

18:05:53.502 [info] TfrtCpuClient created.

18:05:53.777 [debug] the following PyTorch parameters were unused:

  * roberta.embeddings.position_ids

libc++abi: terminating with uncaught exception of type std::out_of_range: Span::at failed bounds check
1 Like

@stocks29 Would you be able to take a minute to edit deps/exla/exla/Makefile in your project, changing the -O3 in the CFLAGS line to -O1, and then run mix deps.compile exla to see if the issue resolves when the compiler optimization level is reduced?

I’m seeing the same error after executing these steps:

  1. mix deps.clean exla
  2. mix deps.get
  3. adjusting the -O3 in deps/exla/Makefile to -O1.
  4. mix deps.compile exla

I tried the same steps again while also cleaning xla, and then made another attempt with XLA_BUILD=true set, which did build successfully but still resulted in the same error.

1 Like

Thanks for the snippet. I just tried it locally and it worked. :cry: What is your Erlang/OTP version? I am running on:

$ elixir -v
Erlang/OTP 25 [erts-13.0] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit]

Elixir 1.15.0-dev (6f58a36) (compiled with Erlang/OTP 25)

Also:

$ gcc -v
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
1 Like

I’m also experiencing the issue on:

macOS 13.3.1 (22E261) on M1
Xcode Version 14.3 (14E222b)

gcc -v
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
elixir -v
Erlang/OTP 25 [erts-13.2] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit] [dtrace]

Elixir 1.14.4 (compiled with Erlang/OTP 25)

Edit: Also tested with Xcode 14.2 since I noticed that your clang version is 14.0.0, but the issue still persists.

gcc -v
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin22.4.0
Thread model: posix
InstalledDir: /Applications/Xcode142.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Can you install LLVM with Homebrew and try compiling with it in your path instead of Apple's one?

It doesn’t seem to have an effect, the error is still thrown.

Is there a way to check which version of gcc/clang Elixir is using during the build?
What I did (on an older x86 Mac running the same macOS and Xcode):

  • brew install llvm gcc
  • updated ~/.zshrc to include:
export CPPFLAGS="-I/usr/local/opt/llvm/include"
export LDFLAGS="-L/usr/local/opt/llvm/lib"
export PATH="/usr/local/opt/llvm/bin:$PATH"
  • symlinked gcc to Homebrew's gcc-12
❯ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/Cellar/gcc/12.2.0/bin/../libexec/gcc/x86_64-apple-darwin22/12/lto-wrapper
Target: x86_64-apple-darwin22
Configured with: ../configure --prefix=/usr/local/opt/gcc --libdir=/usr/local/opt/gcc/lib/gcc/current --disable-nls --enable-checking=release --with-gcc-major-version-only --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-12 --with-gmp=/usr/local/opt/gmp --with-mpfr=/usr/local/opt/mpfr --with-mpc=/usr/local/opt/libmpc --with-isl=/usr/local/opt/isl --with-zstd=/usr/local/opt/zstd --with-pkgversion='Homebrew GCC 12.2.0' --with-bugurl=https://github.com/Homebrew/homebrew-core/issues --with-system-zlib --build=x86_64-apple-darwin22 --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Homebrew GCC 12.2.0)
❯ clang -v
Homebrew clang version 16.0.1
Target: x86_64-apple-darwin22.4.0
Thread model: posix
InstalledDir: /usr/local/opt/llvm/bin

I am not sure. Make sure the dependency is rebuilding (there are some cache dirs) and, if there is nothing indicated in the logs, then you need to investigate the Makefile. Unfortunately I cannot help much since I cannot reproduce it.

I was able to get this working by setting the optimization level to -O1 in exla’s Makefile as @thomasf had suggested. The piece I was missing before was clearing the library cache.

Here are the steps I followed to get this working:

  1. mix deps.clean exla
  2. mix deps.get
  3. Modify deps/exla/Makefile to use -O1 instead of -O3.
  4. Delete the exla library cache: rm -rf ~/Library/Caches/xla/exla (this was the missing step)
  5. mix deps.compile exla

With this, Bumblebee is now working as expected.

8 Likes

Same issue here using Livebook Desktop. I have a workaround:

Mix.install(
  [
    {:kino_bumblebee, "~> 0.2.1"},
    {:nx, git: "https://github.com/brentjanderson/nx.git", ref: "eb729f087998fbe73c1c13c71b59388b661812b3", subdir: "nx", override: true},
    {:exla, git: "https://github.com/brentjanderson/nx.git", ref: "eb729f087998fbe73c1c13c71b59388b661812b3", subdir: "exla", override: true}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)

If you use the :nx and :exla versions above, you can get around this temporarily. That fork only has the -O3 flag changed to -O1, that’s it.

As for my gcc and elixir versions:

❯ gcc -v
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.4.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin


❯ clang -v
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.4.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin


❯ elixir -v
Erlang/OTP 25 [erts-13.1.4] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit]

Elixir 1.14.4 (compiled with Erlang/OTP 25)
1 Like

I’ve been running into the same issue as well:

Notebook:


Mix.install(
  [
    {:kino_bumblebee, "~> 0.2.1"},
    {:exla, "~> 0.5.1"}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)

<!-- livebook:{"attrs":{"compiler":"exla","sequence_length":100,"task_id":"text_classification","top_k":null,"variant_id":"roberta_bertweet_emotion"},"chunks":[[0,366],[368,507]],"kind":"Elixir.KinoBumblebee.TaskCell","livebook_object":"smart_cell"} -->

{:ok, model_info} =
  Bumblebee.load_model({:hf, "finiteautomata/bertweet-base-emotion-analysis"},
    log_params_diff: false
  )

{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "vinai/bertweet-base"})

serving =
  Bumblebee.Text.text_classification(model_info, tokenizer,
    compile: [batch_size: 1, sequence_length: 100],
    defn_options: [compiler: EXLA]
  )

text_input = Kino.Input.textarea("Text", default: "Oh wow, I didn't know that!")
form = Kino.Control.form([text: text_input], submit: "Run")
frame = Kino.Frame.new()

Kino.listen(form, fn %{data: %{text: text}} ->
  Kino.Frame.render(frame, Kino.Text.new("Running..."))
  output = Nx.Serving.run(serving, text)

  output.predictions
  |> Enum.map(&{&1.label, &1.score})
  |> Kino.Bumblebee.ScoredList.new()
  |> then(&Kino.Frame.render(frame, &1))
end)

Kino.Layout.grid([form, frame], boxed: true, gap: 16)

Error:

16:10:35.087 [info] TfrtCpuClient created.
libc++abi: terminating due to uncaught exception of type std::out_of_range: Span::at failed bounds check


My versions:

>gcc -v
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: x86_64-apple-darwin22.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

>clang -v
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: x86_64-apple-darwin22.4.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

>elixir -v
Erlang/OTP 25 [erts-13.1.4] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit:ns]

Elixir 1.14.2 (compiled with Erlang/OTP 25)

I can’t reproduce this. One thing that might be helpful though is if somebody can provide a backtrace from a core dump. Then I can try to debug from there.

Once you have a core dump you can analyze it in gdb with:

gdb /usr/lib/erlang/erts/bin/beam.smp -core /path/to/core

The beam.smp path is just off the top of my head, but the idea is to point it at wherever beam.smp lives. Once in gdb, just run bt and provide the stack trace that comes up!

2 Likes

Thank you for debugging. I tried to reproduce the error myself and failed, but I thought I should give your finding a try and push a fix.

Can you folks please try using EXLA from the v0.5 branch?

{:exla, github: "elixir-nx/nx", sparse: "exla", branch: "v0.5", override: true}
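
In a Mix.install-based script or notebook, that would look roughly like this (a sketch; depending on version resolution you may also need to take :nx from the same branch):

Mix.install(
  [
    {:bumblebee, "~> 0.2.0"},
    # exla from the v0.5 branch with the proposed fix
    {:exla, github: "elixir-nx/nx", sparse: "exla", branch: "v0.5", override: true}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)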

If it still fails, then I recommend following the suggestions posted by @seanmor5.

3 Likes

It worked for me, Jose. Thanks!

1 Like