Some of my texts contain odd UTF-8 content that causes an error in my Bumblebee serving.
The problem is that the error kills the serving process.
Here is the livebook to reproduce the issue:
Bumblebee error
Mix.install(
  [
    {:bumblebee, "~> 0.5.3"},
    {:nx, "~> 0.7.2"},
    {:exla, "~> 0.7.2"}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)
Section
text = " "
model = "Davlan/bert-base-multilingual-cased-ner-hrl"
tokenizer = "google-bert/bert-base-multilingual-cased"
{:ok, bert} = Bumblebee.load_model({:hf, model})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, tokenizer})
serving =
  Bumblebee.Text.token_classification(bert, tokenizer,
    aggregation: :same,
    compile: [batch_size: 10, sequence_length: 256]
  )
  |> Nx.Serving.defn_options(compiler: EXLA)
Nx.Serving.run(serving, text)
It causes a (FunctionClauseError) no function clause matching in Bumblebee.Text.TokenClassification.group_entities/2.
I think this is not intended behavior, but I’m also curious whether UTF-8 text needs to be cleaned before it is sent to the serving.
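
For reference, this is roughly the kind of cleanup meant by "cleaning" here: a minimal sketch, assuming the problem input is either invalid UTF-8 bytes, control characters, or text that is empty once those are removed. The module name TextSanitizer and its functions are hypothetical and not part of Bumblebee, and it is unverified whether this avoids the FunctionClauseError above.

```elixir
defmodule TextSanitizer do
  # Hypothetical helper for illustration only; not a Bumblebee API.

  # Returns {:ok, cleaned} or {:error, :empty} when nothing usable remains.
  def prepare(text) when is_binary(text) do
    case sanitize(text) do
      "" -> {:error, :empty}
      cleaned -> {:ok, cleaned}
    end
  end

  # Drops invalid UTF-8 bytes, replaces control characters with spaces,
  # and trims surrounding whitespace.
  def sanitize(text) when is_binary(text) do
    text
    |> scrub(<<>>)
    |> String.replace(~r/\p{Cc}/u, " ")
    |> String.trim()
  end

  # Keep every valid code point.
  defp scrub(<<cp::utf8, rest::binary>>, acc), do: scrub(rest, <<acc::binary, cp::utf8>>)
  # Skip a single byte that does not start a valid code point.
  defp scrub(<<_byte, rest::binary>>, acc), do: scrub(rest, acc)
  defp scrub(<<>>, acc), do: acc
end
```

With this in place, the call would become something like `with {:ok, clean} <- TextSanitizer.prepare(text), do: Nx.Serving.run(serving, clean)`, so texts that are empty after cleaning never reach the serving.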