Creating ONNX files: how do I use the imported model?


I wanted to do sentiment analysis using Elixir. Problem: Bumblebee only offers that for English text.

Therefore, I tried to load a model (oliverguhr/german-sentiment-bert on Hugging Face) from Python. With PyCharm and ChatGPT I tried to export the tokenizer and model with:

import json

import torch
import germansentiment

# Initialize the model
model = germansentiment.SentimentModel()

# Dummy input that matches the input dimensions of the model
dummy_input = torch.randint(0, 30_000, (1, 512), dtype=torch.long)

# Export to ONNX
torch.onnx.export(model.model, dummy_input, "german_sentiment_model.onnx")

# Export the vocab
with open('vocab.json', 'w') as f:
    json.dump(model.tokenizer.vocab, f)

With this I was able to export the model and the vocabulary. Now I try to infer in Elixir using Nx and AxonOnnx:

{model, params} = AxonOnnx.import("./models/models/german_sentiment_model.onnx")

{:ok, vocab_string} ="./models/models/vocab.json")
{:ok, vocab_map} = Jason.decode(vocab_string)

# Tokenize
input_text = "Ein schlechter Film"
token_list =, " "), fn x -> vocab_map[x] end)
token_tensor = Nx.tensor(List.duplicate(0, 512 - length(token_list)))
token_tensor = Nx.concatenate([Nx.tensor(token_list), token_tensor])

{init_fn, predict_fn} =

predict_fn.(params, token_tensor)

The output is:

#Nx.Tensor<
  f32[1][3]
  EXLA.Backend<host:0, 0.1233469648.4027973659.33323>
  [
    [-1.17998206615448, 5.767077922821045, -5.835022926330566]
  ]
>

I assume from the web link that the zeroth entry is positive, the first is negative, and the last is neutral. Just the scale is off: I assumed a sum of one.
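The raw model outputs are logits; they only become probabilities that sum to one after a softmax. A minimal stdlib-Python sketch of that step (using the logits from the output above; index order assumed as positive/negative/neutral):

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    # Subtract the max for numerical stability (does not change the result)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits from the ONNX model output above: [positive, negative, neutral]
logits = [-1.17998206615448, 5.767077922821045, -5.835022926330566]
probs = softmax(logits)

print(probs)
print(sum(probs))               # ~1.0
print(probs.index(max(probs)))  # index 1 -> "negative"
```

So the scale is not off; the numbers are simply pre-softmax, and "Ein schlechter Film" comes out overwhelmingly negative.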

Because this export and its use in Elixir involve a lot of first times for me, I would start by asking:

  1. Did I do it right?
  2. Would I do a cross entropy on the output?
  3. Anything I could enhance?

For completeness: also take a look at Ortex, which uses the ONNX Runtime directly, so there is no conversion layer. You can then compare results!


I want to evaluate political sentiments, which are for some unknown reason scaled from 5 to -5 (at least in Germany).

I used:

prediction = predict_fn.(params, token_tensor)
one_hot = Nx.divide(Nx.pow(2, prediction), Nx.sum(Nx.pow(2, prediction)))
political_score = 5 * (Nx.to_number(one_hot[0][0]) - Nx.to_number(one_hot[0][2]))

to convert the prediction. It feels somewhat strange to convert a tensor with a single value to a number just to add and multiply. Another thing that feels strange is the need for Nx.divide.
Something like ./ as in Julia would be cool. ^^ (dreamer)
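For reference, the same mapping written in plain stdlib Python (a sketch of the base-2 normalization above, not Nx; the index choice 0 minus 2 mirrors the Elixir snippet):

```python
def base2_softmax(logits):
    # 2 ** x instead of e ** x, mirroring Nx.pow(2, prediction) above
    powers = [2.0 ** x for x in logits]
    total = sum(powers)
    return [p / total for p in powers]

# [positive, negative, neutral] logits from the model output above
logits = [-1.17998206615448, 5.767077922821045, -5.835022926330566]
probs = base2_softmax(logits)

# Scale into the 5 .. -5 political-sentiment range:
# index 0 minus index 2, times 5, as in the Elixir snippet
political_score = 5 * (probs[0] - probs[2])
print(political_score)
```

Note that with a base of 2 the result is no longer a standard softmax, but the values still sum to one, so the score stays within the -5 to 5 band.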

Something is still odd with the tokenization. An example did not work in Elixir but did on the website.

[details="Bad language example that did not work"]
"Ein scheiß Film"
[/details]

I can prevent the crashing by setting the default of Map.get to zero. That just changes the meaning of the sentence.
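That crash looks like the classic out-of-vocabulary problem: a plain vocabulary lookup fails for words the vocab has never seen, while the website's tokenizer falls back to subword pieces or a dedicated [UNK] token. A sketch in stdlib Python with a toy vocabulary (the ids are made up for illustration):

```python
# Toy vocabulary; real BERT vocabs map ~30k (sub)words to ids.
# The ids here are made up for illustration.
vocab = {"Ein": 101, "Film": 202, "[UNK]": 100}

sentence = "Ein scheiß Film"

# Naive lookup: raises KeyError for unknown words (analogous to the
# Elixir crash when vocab_map[x] returns nil)
try:
    ids = [vocab[w] for w in sentence.split(" ")]
except KeyError as e:
    print("unknown word:", e)

# Fallback: map unknown words to the [UNK] id instead of 0, so the model
# sees a dedicated "unknown" token rather than what may be a padding id
ids = [vocab.get(w, vocab["[UNK]"]) for w in sentence.split(" ")]
print(ids)  # [101, 100, 202]
```

Defaulting to 0 is risky because id 0 usually has its own meaning (often the padding token), which is why the sentence's meaning changes.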

Take a look at defn (numerical definitions). Those are functions where you can write numerical code that works with tensors using the regular Elixir operators. You would get something like (untested):

import Nx.Defn

defn political_score(predict_fn, params, token_tensor) do
  prediction = predict_fn.(params, token_tensor)
  one_hot = 2 ** prediction / Nx.sum(2 ** prediction)
  5 * (one_hot[0][0] - one_hot[0][2])
end

Thanks, that thing with defn works nicely.

Does anyone have some experience with reusing the tokenizer from Python?
It looks like it uses Hugging Face's transformers tokenizer with the custom vocab.json (as described above). So far I have been using it via Jason -> Map -> Map.get.
But I assume that does not work for all keys!? Is there a standard alternative library I have missed so far? Bumblebee also loads its tokenizer from somewhere. Is this compatible?

The Hugging Face tokenizers library is also the one used by Bumblebee. I am not that well versed in tokenizers, but different tokenizers may not actually split on words; they may split into even smaller subword tokens. You need to see which one was used to train the model.

Hey @sehHeiden, here’s a complete example in Bumblebee:

# German sentiment analysis

Mix.install(
  [
    {:bumblebee, "~> 0.3.1"},
    {:exla, "~> 0.6.0"}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)

## Prediction

{:ok, model_info} = Bumblebee.load_model({:hf, "oliverguhr/german-sentiment-bert"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "bert-base-german-cased"})

serving =
  Bumblebee.Text.text_classification(model_info, tokenizer, defn_options: [compiler: EXLA])

texts = [
  "Mit keinem guten Ergebniss",
  "Das ist gar nicht mal so gut",
  "Total awesome!",
  "nicht so schlecht wie erwartet",
  "Der Test verlief positiv.",
  "Sie fährt ein grünes Auto."
], texts)

Tokenizer details

There are two ways in which tokenizers can be stored on HF Hub. It’s either (1) tokenizer_config.json + vocab.txt + optional merges.txt (this is a dump of a “slow” tokenizer from hf/transformers), or (2) a single tokenizer.json file (this is a dump of a “fast” Rust tokenizer from hf/transformers). Oftentimes the repository includes both versions. In Python, hf/transformers have a logic to load (1) and convert to a fast tokenizer, but we always rely on tokenizer.json, which we hand to the underlying Rust library. When a repository doesn’t have tokenizer.json, it is usually possible to find another base repository with the same tokenizer, which does have that file. In this case I looked at their training code (ref), they fine-tune bert-base-german-cased, so they use the same tokenizer and we can load it from there just fine.
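To illustrate the difference, here is a rough, simplified sketch of what a "fast" tokenizer.json contains (field selection is from memory and abbreviated; real files contain more fields such as truncation/padding settings, the decoder, and the full vocabulary):

```python
import json

# Simplified sketch of a "fast" tokenizer.json layout; not a real dump.
tokenizer_json = {
    "version": "1.0",
    "added_tokens": [{"id": 100, "content": "[UNK]", "special": True}],
    "normalizer": {"type": "BertNormalizer"},
    "pre_tokenizer": {"type": "BertPreTokenizer"},
    "post_processor": {"type": "TemplateProcessing"},
    "model": {
        "type": "WordPiece",
        "unk_token": "[UNK]",
        # a vocab dump roughly corresponds to just this one sub-field
        "vocab": {"[UNK]": 100, "Ein": 101, "Film": 202},
    },
}

print(json.dumps(tokenizer_json, indent=2)[:120])
```

The point is that a bare vocab file covers only the `model.vocab` part; all the normalization, pre-tokenization, and special-token logic lives in the other sections.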


Hey @jonatanklosko,

you are wonderful! Everything works as expected! Better than what I tried, with less code, and I assume it took less time to code? Except for the search for the language model?

I still have two questions. Out of interest.
As I have the vocab.json, is it the same as the tokenizer.json? Which Rust library is used in Elixir to load it?

Can you guess why the ONNX model has a different output scale than the original/Bumblebee version?

Thanks a lot!

As I have the vocab.json, is it the same as the tokenizer.json?

tokenizer.json is a single file with all the information: besides the vocabulary, it includes special-token information, the tokenizer model, pre/post-processing, etc. See tokenizer.json for an example.

Which Rust library is used in Elixir to load it?

There is huggingface/tokenizers in Rust, and it also has Python bindings. huggingface/transformers has two types of tokenizers: slow, implemented purely in Python, and fast, calling out to the Rust library. We have elixir-nx/tokenizers with bindings to the Rust library.

Can you guess why the ONNX model has a different output scale than the original/Bumblebee version?

I may be missing something, but I don't think the calls are equivalent. You are splitting on spaces and using the vocab. The tokenizer, on the other hand, does more: it will split longer words into parts and add special tokens.
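To make that difference concrete, here is a toy greedy longest-match subword tokenizer in stdlib Python (the vocabulary is made up and the algorithm is a simplification of WordPiece, but it shows both effects: subword splitting and special tokens):

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match subword split, roughly how WordPiece works."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation marker
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces

def tokenize(sentence, vocab):
    # Special tokens around the sentence, as BERT tokenizers add them
    tokens = ["[CLS]"]
    for word in sentence.split(" "):
        tokens += wordpiece(word, vocab)
    return tokens + ["[SEP]"]

# Toy vocabulary (made up): "schlechter" gets split into subword pieces
vocab = {"Ein", "schlecht", "##er", "Film", "[CLS]", "[SEP]", "[UNK]"}
print(tokenize("Ein schlechter Film", vocab))
# ['[CLS]', 'Ein', 'schlecht', '##er', 'Film', '[SEP]']
```

A plain split-on-space lookup would either miss "schlechter" entirely or map it to a wrong id, which also explains the different outputs for the ONNX pipeline above.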
