Using Llama 2 With Bumblebee

I am trying to use Meta’s Llama 2 with Bumblebee, but I am getting a 401 error when I try to load it. I have been granted access to the repo on Hugging Face, but I think I need to provide an access token when loading the model, and I am not sure how to do so.

{:ok, model} = Bumblebee.load_model({:hf, "meta-llama/Llama-2-7b-chat-hf"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "meta-llama/Llama-2-7b-chat-hf"})
** (MatchError) no match of right hand side value: {:error, "HTTP request failed with status 401"}
    (stdlib 5.0.2) erl_eval.erl:498: :erl_eval.expr/6
    #cell:oqtu7kdr36by6ud4yug4rxbxs3qw544i:15: (file)

I don’t think Bumblebee supports that yet. Does it, @seanmor5?

Correction: there seems to be an auth_token option, but it is only for cached downloads.

Thanks for looking into this for me. Is there a way to use this with load_model, load_tokenizer, and load_generation_config?

Hey @djaouen, you can specify it in the repository options: {:hf, "meta-llama/Llama-2-7b-chat-hf", auth_token: "..."} : )
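For the other loaders the same repository tuple can be reused. A minimal sketch, assuming the token is exported in an environment variable named HF_TOKEN (the variable name is just an example, not something Bumblebee requires):

```elixir
# Build one repository tuple with auth_token and pass it to all three
# loaders. HF_TOKEN is an assumed env var holding your Hugging Face
# access token.
repo = {:hf, "meta-llama/Llama-2-7b-chat-hf",
        auth_token: System.fetch_env!("HF_TOKEN")}

{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
{:ok, generation_config} = Bumblebee.load_generation_config(repo)
```
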


Thanks, @jonatanklosko!

Did you ever get it to work?

Not yet. I was waiting for Bumblebee to get upgraded so that I could use Bumblebee.Text.Llama.

Looks like v0.3.1 has the official support for Bumblebee.Text.Llama. Is that not working for you?

Actually, I don’t know what version of Bumblebee I was using, as it’s been long enough that Hugging Face deleted my notebook. But I will certainly look into that if I build out a similar project in the future. Thanks for the info!

I was able to get “NousResearch/Llama-2-7b-hf” working using 0.3.1. Does anyone know if we can use GGML locally, so I don’t have to wait five minutes for inference of two sentences on my M2?

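As far as I know, Bumblebee can’t load GGML weights directly — it reads the checkpoint formats published on the Hub. On an M2 the biggest win is usually making sure Nx runs on the EXLA backend rather than the default pure-Elixir one. A minimal setup sketch (the version requirements shown are examples, not pinned recommendations):

```elixir
# Livebook / script setup: pull in bumblebee plus EXLA so tensor
# operations run through XLA-compiled code instead of Nx's default
# pure-Elixir BinaryBackend.
Mix.install([
  {:bumblebee, "~> 0.3.1"},
  {:exla, "~> 0.5"}
])

# Route all Nx computations through EXLA by default.
Nx.global_default_backend(EXLA.Backend)
```
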