Hey @jose! I’ll take a look and see if I can get something running in a Livebook or a single file to reproduce the issue. I suspect it’s a “my machine” problem rather than a library issue, so it’ll probably be hard to reproduce; I was mostly wondering whether this was an error anyone had seen before.
But for now, I’ve moved to a production server: an AWS g4dn.xlarge with an NVIDIA T4. I think I must be doing something wrong with my config.
I have 5 nonsense sentences I copied from a Reddit post; the byte sizes of `sents` are `[1079, 131, 109, 120, 130]`.
Yes, these are certainly not the most rigorous or fair benchmarks, but the point was just to get a rough idea of whether my config/settings in Elixir are good. They appear not to be, so I’m working on how to get them dialed in. If anyone can help, you might at least be able to point me in a good direction.
Basically I’m just running through the 5 sentences, 10 times, in each runtime. Both tests run on the same machine.
In Python:
```python
import time

import torch
from transformers import AutoTokenizer, BertForTokenClassification

device = torch.device("cuda")

model = BertForTokenClassification.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")
tokenizer = AutoTokenizer.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")
model = model.to(device)

def test():
    # sents is the list of 5 test sentences from above
    start = time.time()
    with torch.no_grad():
        for i in range(10):
            for s in sents:
                inputs = tokenizer(s, return_tensors="pt").to(device)
                outputs = model(**inputs)
                logits = outputs.logits
                predicted_token_class_ids = logits.argmax(-1)
                predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
    return time.time() - start
```
Calling `test()` gives me `0.5248661041259766`. I logged the output on a separate run to make sure it wasn’t fast just because it was throwing an error or something:
```
['PRON', 'ADV', 'PUNCT', 'SCONJ', 'DET', 'ADJ', 'NOUN', 'VERB', 'VERB', 'ADP', 'NOUN', 'NOUN', 'ADP', 'PRON', 'PUNCT', 'CCONJ', 'DET', 'ADJ', 'NOUN', 'VERB', 'DET', 'ADJ', 'NOUN', 'ADP', 'DET', 'ADJ', 'ADJ', 'ADV', 'ADJ', 'NOUN', 'ADP', 'PRON', 'NOUN', 'PUNCT', 'CCONJ', 'CCONJ', 'DET', 'ADJ', 'ADJ', 'NOUN', 'NOUN', 'VERB', 'ADP', 'DET', 'ADJ', 'NOUN', 'PUNCT', 'PRON', 'VERB', 'PRON', 'ADV', 'ADP', 'DET', 'ADJ', 'NOUN', 'ADP', 'DET', 'VERB', 'VERB', 'NOUN', 'PUNCT', 'CCONJ', 'PUNCT', 'SCONJ', 'PRON', 'VERB', 'ADV', 'ADP', 'DET', 'NOUN', 'PUNCT', 'DET', 'NUM', 'ADJ', 'NOUN', 'AUX', 'VERB', 'ADP', 'PRON', 'PUNCT', 'ADV', 'PRON', 'VERB', 'DET', 'NOUN', 'ADP', 'DET', 'ADJ', 'NOUN', 'ADP', 'DET', 'NOUN', 'PUNCT', 'CCONJ', 'VERB', 'ADJ', 'ADP', 'DET', 'ADJ', 'ADJ', 'ADV', 'ADV', 'ADV', 'ADJ', 'NOUN', 'ADP', 'DET', 'NOUN', 'CCONJ', 'NOUN', 'PUNCT', 'ADV', 'PRON', 'VERB', 'DET', 'NOUN', 'ADP', 'DET', 'NOUN', 'PUNCT', 'PRON', 'VERB', 'PRON', 'ADP', 'PRON', 'ADJ', 'NOUN', 'PUNCT', 'CCONJ', 'DET', 'NOUN', 'ADP', 'DET', 'ADJ', 'NOUN', 'PRON', 'VERB', 'CCONJ', 'VERB', 'VERB', 'PRON', 'PUNCT', 'SCONJ', 'PRON', 'VERB', 'ADP', 'PRON', 'ADP', 'DET', 'NOUN', 'ADP', 'NOUN', 'PUNCT', 'CCONJ', 'ADV', 'PUNCT', 'PRON', 'NOUN', 'PUNCT', 'ADV', 'NOUN', 'VERB', 'ADP', 'VERB', 'PRON', 'NOUN', 'PUNCT', 'CCONJ', 'NOUN', 'CCONJ', 'NOUN', 'VERB', 'PART', 'VERB', 'ADP', 'PRON', 'NOUN', 'CCONJ', 'VERB', 'PRON', 'NOUN', 'PUNCT', 'ADP', 'DET', 'NOUN', 'ADP', 'DET', 'ADJ', 'NOUN', 'PUNCT', 'ADV', 'PRON', 'ADV', 'VERB', 'ADP', 'NOUN', 'PUNCT', 'INTJ', 'PUNCT', 'AUX', 'PRON', 'AUX', 'VERB', 'DET', 'NOUN', 'NOUN', 'PUNCT', 'AUX', 'VERB', 'ADP', 'NOUN', 'DET', 'PRON', 'AUX', 'VERB', 'ADV', 'ADJ', 'CCONJ', 'ADJ', 'ADP', 'PRON', 'PUNCT', 'SCONJ', 'PRON', 'AUX', 'AUX', 'DET', 'NOUN', 'ADP', 'PRON', 'NOUN', 'PUNCT', 'SCONJ', 'PRON', 'NOUN', 'AUX', 'DET', 'NOUN', 'ADP', 'DET', 'ADJ', 'PROPN', 'PUNCT', 'PUNCT']
```
But it’s definitely generating tokens.
In Elixir:
```elixir
iex> Nx.default_backend({EXLA.Backend, client: :cuda})
iex> Nx.default_backend()
{EXLA.Backend, [client: :cuda]}
iex> Nx.Defn.default_options()
[compiler: EXLA, client: :cuda]
```
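(Aside: to avoid re-setting these defaults in every session, the EXLA README documents a config entry. Something like the following should be the `config.exs` equivalent of the calls above, though I haven’t double-checked this exact form with the `client: :cuda` option.)

```elixir
# config/config.exs — persistent equivalent of the Nx.default_backend call above
import Config

config :nx, :default_backend, {EXLA.Backend, client: :cuda}
```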
```elixir
defmodule Test do
  @sents [
    # same as before, cut for brevity, but can add them if wanted
  ]

  def run do
    {:ok, model} = Bumblebee.load_model({:hf, "vblagoje/bert-english-uncased-finetuned-pos"})

    # The only reason this comes from a local file instead of :hf is that the
    # fast tokenizer is not on the HF hub. The local directory is just a dump
    # from the HF Python lib via `tokenizer.save('path')`.
    # Contents: config.json special_tokens_map.json tokenizer.json tokenizer_config.json vocab.txt
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:local, "./tokenizer"})

    serving = Bumblebee.Text.token_classification(model, tokenizer, aggregation: :same)

    :timer.tc(fn ->
      Enum.each(1..10, fn _ ->
        Enum.each(@sents, fn sent ->
          Nx.Serving.run(serving, sent)
        end)
      end)
    end)
    |> elem(0)
    |> Kernel./(1_000_000)
  end
end
```
```
iex(7)> Test.run()

02:00:52.993 [info] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
02:00:52.993 [info] XLA service 0x7f81a80d9a00 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
02:00:52.993 [info] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
02:00:52.993 [info] Using BFC allocator.
02:00:52.993 [info] XLA backend allocating 14019467673 bytes on device 0 for BFCAllocator.
02:00:53.785 [debug] the following PyTorch parameters were unused:

  * bert.embeddings.position_ids
  * bert.pooler.dense.bias
  * bert.pooler.dense.weight

23.547926
```
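One thing I suspect, but haven’t confirmed, is that because my sentences all have different lengths, every call with a new input shape triggers a fresh XLA compilation, and that compile time is being counted in the 23.5 s. The serving docs mention a `:compile` option that pads inputs to fixed shapes so compilation happens once, up front. Here’s a sketch of what I plan to try next (the `sequence_length: 512` is just my guess at a safe upper bound; smaller may be faster):

```elixir
# Untested sketch: compile the serving once for fixed shapes, so
# per-request calls don't each pay XLA compilation cost.
serving =
  Bumblebee.Text.token_classification(model, tokenizer,
    aggregation: :same,
    # inputs get padded/truncated to this shape; 512 is BERT's maximum
    compile: [batch_size: 1, sequence_length: 512],
    defn_options: [compiler: EXLA]
  )

# The first run pays the compilation cost; later runs should reuse it.
Nx.Serving.run(serving, "Some sentence to tag.")
```

If that’s right, warming the serving up before timing (and maybe passing all 5 sentences in one run with a larger `batch_size`) should make the comparison with Python much closer.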
My `~/.bashrc` (after editing it I ran both `source ~/.bashrc` and `exec $SHELL`, and confirmed the config with `echo $XLA_TARGET`):
```bash
export LIBTORCH_TARGET="cu116"
export ELIXIR_ERL_OPTIONS="+sssdio 128"
export XLA_TARGET="cuda118"
export EXLA_TARGET="cuda"
```
I’m not sure if the `cu116` is bad in there. I’m running CUDA 11.8 with driver 520 and cuDNN 8.7.0.84, and `cu116` was the highest version listed for Torchx. I’m not sure that matters if I’m using EXLA, but I notice `the following PyTorch parameters were unused` in the logs, so I assume PyTorch is still involved somewhere.
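(If it helps with diagnosis: assuming I’m reading the EXLA API correctly, you can ask EXLA which platforms it actually detected, which should confirm whether the CUDA build loaded at all.)

```elixir
# Hedged sanity check: list the platforms EXLA detected at startup.
# On a working CUDA setup this should include :cuda with at least one device.
EXLA.Client.get_supported_platforms()
#=> %{cuda: 1, host: 8, ...}  (exact contents vary by machine/version)
```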
I’m trying to figure out what could be missing on the Elixir side to get it closer to the Python numbers.
I would also love to write up a blog post after I’ve been through this process of launching in production. The docs, as always with Elixir, have been super helpful and accessible for an ML newbie like me, but I’ve found them and the general blogosphere still thin on this topic, which is to be expected at this early stage. I did see a note suggesting to look at EXLA compiler flags and defn opts, but there wasn’t any general advice about a good starting place, and the list of EXLA flags is quite long, so I didn’t even know where to start.