When I run this block for the first time:

```elixir
{:ok, model_info} = Bumblebee.load_model({:hf, "distilbert-base-uncased"}, architecture: :base)
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "distilbert-base-uncased"})
inputs = Bumblebee.apply_tokenizer(tokenizer, "This is a test")
Axon.predict(model_info.model, model_info.params, inputs).pooled_state
```
I get:

```
#Nx.Tensor<
  f32[1][768]
  EXLA.Backend<host:0, 0.3080042529.4070965263.103946>
  [
    [0.1084553599357605, 0.0384901687502861, 0.3454555869102478, 0.3366532623767853, 0.0, 0.0, 0.0, 0.0, 0.0, 0.24088211357593536, 0.2771815061569214, 0.31721076369285583, 0.0, 0.0, 0.0, 0.1535787135362625, 0.07263995707035065, 0.0, 0.2471829056739807, 0.24521753191947937, 0.038037270307540894, 0.0, 0.33873334527015686, 0.0, 0.08038126677274704, 0.05353151261806488, 0.1563723087310791, 0.0, 0.333274245262146, 0.0, 0.1448795050382614, 0.0, 0.13116799294948578, 0.4072483777999878, 0.0, 0.0, 0.3591342866420746, 0.0, 0.26890164613723755, 0.0, 0.0, 0.36578384041786194, 0.4254622161388397, 0.08779369294643402, 0.0, 0.2617552578449249, 0.0, 0.35009729862213135, 0.47235962748527527, 0.0, ...]
  ]
>
```
I assumed that if I ran the same block a second time, I would get the same embedding returned. Instead, re-running the exact same block — including the model and tokenizer loading lines — yields:
```
#Nx.Tensor<
  f32[1][768]
  EXLA.Backend<host:0, 0.3080042529.4070965263.104538>
  [
    [0.0, 0.0, 0.02861723303794861, 0.22635801136493683, 0.3732376992702484, 0.1636544167995453, 0.0, 0.5278993844985962, 0.32550013065338135, 0.0, 0.0, 0.008047151379287243, 0.0, 0.0, 0.18475914001464844, 0.6037576198577881, 0.0, 0.621748685836792, 0.0, 0.0, 0.3294147849082947, 0.0, 0.0, 0.12289489060640335, 0.0, 0.02070406638085842, 0.0, 0.21100129187107086, 0.24555926024913788, 0.0, 0.0, 0.0415930338203907, 0.4362318515777588, 0.0, 0.0, 0.0, 0.0, 0.0, 0.30100217461586, 0.1799420416355133, 0.4896824359893799, 3.664734831545502e-4, 0.0, 0.0, 0.15452004969120026, 0.0, 0.0, 0.0, 0.08496798574924469, 0.0, ...]
  ]
>
```
My questions
- Is there a stochastic aspect I am not controlling when re-loading the model and tokenizer?
- Which aspect of model/tokenizer loading am I not controlling that induces the variation?
- Is it possible to reload the same “version” (same “seeds”?) of a model over and over to ensure embeddings are consistent from one load to the other?
- Or do I have to control other stochastic aspects of the logic? Maybe in the pooling algorithm?
Some more things I tried

To no avail, I tried:

- Adding `mode: :inference`, i.e. running:

  ```elixir
  Axon.predict(model_info.model, model_info.params, inputs, mode: :inference).pooled_state
  ```

- Setting:

  ```elixir
  :rand.seed(:default, {123, 456, 789})
  ```

  before the `Axon.predict` call (both with and without `mode: :inference`).

In all cases, I am still getting different results on repeated runs.
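To narrow down where the randomness enters, here is a diagnostic sketch I am considering (it assumes `model_info.params` is a nested map of layer name to parameter tensors, which I have not confirmed against my Bumblebee version): load the model twice and compare the parameters directly. If the two loads already disagree, the variation comes from loading, not from `Axon.predict`.

```elixir
# Load the same checkpoint twice.
{:ok, a} = Bumblebee.load_model({:hf, "distilbert-base-uncased"}, architecture: :base)
{:ok, b} = Bumblebee.load_model({:hf, "distilbert-base-uncased"}, architecture: :base)

# Assumption: params is a map of layer name => %{param name => tensor};
# the exact structure may differ across Bumblebee versions.
Enum.each(a.params, fn {layer, tensors} ->
  Enum.each(tensors, fn {name, tensor} ->
    equal? =
      Nx.equal(tensor, b.params[layer][name])
      |> Nx.all()
      |> Nx.to_number() == 1

    unless equal?, do: IO.puts("params differ at #{layer}.#{name}")
  end)
end)
```

Any layer printed by this sketch would be one whose weights are not coming deterministically from the checkpoint.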
Notes
In case that’s relevant, I am using:

```elixir
config :nx, default_backend: EXLA.Backend
config :nx, :default_defn_options, compiler: EXLA, client: :host
```