It never finishes, unfortunately, and memory usage bounces around a lot while this happens.
Does somebody know why this happens and if this model can be used with Bumblebee?
Assuming you are using EXLA.Backend, it should finish (it does for me).
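For reference, here is a minimal sketch of making EXLA the default backend (assuming you already have :exla in your deps; the version below is just an example), so the work doesn't fall back to the much slower pure-Elixir backend:

```elixir
# Assumes {:exla, "~> 0.7"} (or similar) in your mix.exs deps.
# Set EXLA as the default Nx backend, e.g. at the top of a script or Livebook:
Nx.global_default_backend(EXLA.Backend)

# Alternatively, set it once in config/config.exs:
#   config :nx, default_backend: EXLA.Backend
```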
But tl;dr this model is not supported.
This particular case is a bit awkward: the model has custom implementation overrides on the HF Hub, which Bumblebee does not support, yet its config says it's XLMRobertaModel, which Bumblebee does support. So Bumblebee builds an XLMRobertaModel and tries to load the params, but they don't match, so it initializes the missing parameters (in this case most of them), and that takes a while. If you let it run to completion, you will see a log saying [debug] the following parameters were missing, followed by a long list.
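To illustrate, a minimal sketch of what triggers this (assuming the checkpoint in question is jinaai/jina-embeddings-v3; substitute whichever repo you are loading):

```elixir
# Bumblebee resolves the architecture from config.json ("XLMRobertaModel"),
# ignoring the custom code the repo ships. The load succeeds, but most
# checkpoint params don't match the XLM-RoBERTa layout, so Bumblebee
# randomly initializes them (slow) and then emits the long
# "[debug] the following parameters were missing" log.
{:ok, model_info} = Bumblebee.load_model({:hf, "jinaai/jina-embeddings-v3"})
```

Note that even once this finishes, the resulting params are mostly randomly initialized rather than the trained weights, so the embeddings won't be meaningful.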
It is also challenging to support properly, because the repository/implementation is a bit hacky. Usually model implementations end up in the transformers Python library under a specific class name, and we do a corresponding implementation in Bumblebee. For Jina models, however, the authors keep slightly changed implementations on the Hub, and in some cases the implementation even differs across model checkpoints. More details in Add JinaBert model by joelpaulkoch · Pull Request #407 · elixir-nx/bumblebee · GitHub
Note that Jina embeddings v2 and Jina embeddings v3 differ not only in implementation but also in licensing: v2 is available under the Apache 2.0 license, v3 under the Creative Commons Attribution Non-Commercial 4.0 license.
So I don’t think we will see a v3 implementation in transformers or Bumblebee.