Hi,
I wanted to do sentiment analysis in Elixir. Problem: Bumblebee only offers that for English text.
Therefore, I tried to load a model (oliverguhr/german-sentiment-bert · Hugging Face) from Python and, with help from PyCharm and ChatGPT, export the tokenizer and model with:
import json
import torch
import germansentiment

# Initialize the model
model = germansentiment.SentimentModel()
# Dummy input that matches the input dimensions of the model
dummy_input = torch.randint(0, 30_000, (1, 512), dtype=torch.long)
# Export to ONNX
torch.onnx.export(model.model, dummy_input, "german_sentiment_model.onnx")
# Export the vocab
with open('vocab.json', 'w') as f:
    json.dump(model.tokenizer.vocab, f)
With this I was able to export the model and the vocabulary. Now I am trying to run inference in Elixir using Nx and AxonOnnx:
{model, params} = AxonOnnx.import("./models/models/german_sentiment_model.onnx")
{:ok, vocab_string} = File.read("./models/models/vocab.json")
{:ok, vocab_map} = Jason.decode(vocab_string)
# Tokenize
input_text = "Ein schlechter Film"
token_list = Enum.map(String.split(input_text, " "), fn x -> vocab_map[x] end)
token_tensor = Nx.tensor(List.duplicate(0, 512 - length(token_list)))
token_tensor = Nx.concatenate([Nx.tensor(token_list), token_tensor])
{init_fn, predict_fn} = Axon.build(model)
predict_fn.(params, token_tensor)
The output is:
#Nx.Tensor<
f32[1][3]
EXLA.Backend<host:0, 0.1233469648.4027973659.33323>
[
[-1.17998206615448, 5.767077922821045, -5.835022926330566]
]
>
From the model page I assume that index 0 is positive, index 1 is negative, and index 2 is neutral. Only the scale looks off: I expected the values to sum to one.
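If these three numbers are raw logits rather than probabilities (an assumption on my part), a softmax would rescale them so that they sum to one. A minimal sketch in plain Python, where `softmax` is a helper written only for illustration and not part of the exported model:

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to one."""
    # Subtract the maximum first so exp() cannot overflow on large scores.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Values copied from the tensor printed by the Elixir code.
logits = [-1.17998206615448, 5.767077922821045, -5.835022926330566]
probs = softmax(logits)
print(probs)
```

Under that assumption, nearly all of the probability mass lands on index 1. On the Elixir side, `Axon.Activations.softmax/1` applied to the prediction tensor should do the same thing.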
Because exporting a model and using it in Elixir involves a lot of firsts for me, I would start by asking:
- Did I do it right?
- Would I do a cross entropy on the output?
- Anything I could enhance?