Bumblebee token_classification serving group_entities no function clause matching error

My texts contain some unusual UTF-8 characters that cause an error in my Bumblebee serving.

The problem is that the error kills the serving process.

Here is a Livebook that reproduces the issue:

Bumblebee error

Mix.install(
  [
    {:bumblebee, "~> 0.5.3"},
    {:nx, "~> 0.7.2"},
    {:exla, "~> 0.7.2"}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)

text = "        "
model = "Davlan/bert-base-multilingual-cased-ner-hrl"
tokenizer = "google-bert/bert-base-multilingual-cased"

{:ok, bert} = Bumblebee.load_model({:hf, model})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, tokenizer})

serving =
  Bumblebee.Text.token_classification(bert, tokenizer,
    aggregation: :same,
    compile: [batch_size: 10, sequence_length: 256]
  )
  |> Nx.Serving.defn_options(compiler: EXLA)

Nx.Serving.run(serving, text)

This raises (FunctionClauseError) no function clause matching in Bumblebee.Text.TokenClassification.group_entities/2.

I don't think this is intended behavior, but I'm also curious: does UTF-8 text need to be cleaned before it is sent to the serving?
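For anyone who wants to guard the serving in the meantime, here is a minimal sketch of cleaning text before it reaches the serving. It assumes the trigger is input that is invalid UTF-8 or collapses to nothing but whitespace and invisible characters; that is a guess, not something confirmed in this thread. TextSanitizer and sanitize/1 are hypothetical names, not part of Bumblebee.

```elixir
defmodule TextSanitizer do
  # Hypothetical helper, not part of Bumblebee or Nx.

  # Returns {:ok, cleaned} when usable text remains, or :skip when the
  # input is empty after cleaning.
  def sanitize(text) when is_binary(text) do
    cleaned =
      text
      # Split into runs of valid and invalid bytes, keep only the valid runs.
      |> String.chunk(:valid)
      |> Enum.filter(&String.valid?/1)
      |> Enum.join()
      # Collapse Unicode separators (\p{Z}) and control/format characters
      # (\p{C}, which covers tabs, newlines and zero-width characters)
      # into a single ASCII space.
      |> String.replace(~r/[\p{Z}\p{C}]+/u, " ")
      |> String.trim()

    if cleaned == "", do: :skip, else: {:ok, cleaned}
  end
end

# Possible usage, assuming `serving` and `text` are defined as in the
# notebook above:
#
#     case TextSanitizer.sanitize(text) do
#       {:ok, cleaned} -> Nx.Serving.run(serving, cleaned)
#       :skip -> %{entities: []}
#     end
```

Note that cleaning changes character positions, so any offsets in the result refer to the cleaned string rather than the original text.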

Hey @preciz, thanks for the report, fixed on main : )
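Until a release includes the fix, one way to try it is to point the notebook's setup cell at the main branch on GitHub. This is only a sketch: the nx and exla requirements below are copied from the original notebook and may need adjusting to whatever main depends on.

```elixir
Mix.install(
  [
    # Fetch Bumblebee from the main branch instead of a Hex release.
    {:bumblebee, github: "elixir-nx/bumblebee"},
    # Carried over from the original notebook; main may require other versions.
    {:nx, "~> 0.7.2"},
    {:exla, "~> 0.7.2"}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)
```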
