Using Llama 2 With Bumblebee

djaouen · August 2, 2023, 12:26pm

I am trying to use Meta’s Llama 2 with Bumblebee, but I am getting a 401 error when I try to load it. I have been granted access to the repo on Hugging Face, but I think I need to provide an access token when loading the model, and I am not sure how to do so.

{:ok, model} = Bumblebee.load_model({:hf, "meta-llama/Llama-2-7b-chat-hf"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "meta-llama/Llama-2-7b-chat-hf"})

** (MatchError) no match of right hand side value: {:error, "HTTP request failed with status 401"}
    (stdlib 5.0.2) erl_eval.erl:498: :erl_eval.expr/6
    #cell:oqtu7kdr36by6ud4yug4rxbxs3qw544i:15: (file)

regex.sh · August 2, 2023, 12:35pm

I don’t think Bumblebee supports that for now?
Does it? @seanmor5

EDIT:
Correction, there seems to be an auth_token option, but it is for cached downloads:

github.com
elixir-nx/bumblebee/blob/main/lib/bumblebee/huggingface/hub.ex#L41

    and error otherwise

  * `:auth_token` - the token to use as HTTP bearer authorization
    for remote files

"""
@spec cached_download(String.t(), keyword()) :: {:ok, String.t()} | {:error, String.t()}
def cached_download(url, opts \\ []) do
  cache_dir = opts[:cache_dir] || Bumblebee.cache_dir()
  offline = opts[:offline] || bumblebee_offline?()
  auth_token = opts[:auth_token]

  dir = Path.join(cache_dir, "huggingface")

  File.mkdir_p!(dir)

  headers =
    if auth_token do
      [{"Authorization", "Bearer " <> auth_token}]
    else
      []

djaouen · August 2, 2023, 12:54pm

Thanks for looking into this for me. Is there a way to utilize this with load_model, load_tokenizer, and load_generation_config?

jonatanklosko · August 2, 2023, 1:06pm

Hey @djaouen, you can specify it in repository options: {:hf, "meta-llama/Llama-2-7b-chat-hf", auth_token: "..."} : )
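For anyone landing here later, a minimal sketch of how that repository tuple could be reused across load_model, load_tokenizer, and load_generation_config. The GatedRepo module, its function names, and the HF_TOKEN environment variable are made up for illustration and are not part of Bumblebee. It also assumes Bumblebee is already installed in your project or notebook, and that your Hugging Face account has been granted access to the repo.

```elixir
defmodule GatedRepo do
  # Hypothetical helper, not part of Bumblebee.

  # Builds the repository tuple with the :auth_token option,
  # in the same shape as the reply above.
  def repo(id, token), do: {:hf, id, auth_token: token}

  # Loads the model, tokenizer, and generation config from the same
  # repository. Returns {:ok, map}, or the first {:error, reason}
  # (for example the 401 message) instead of raising a MatchError.
  def load_all(repo) do
    with {:ok, model_info} <- Bumblebee.load_model(repo),
         {:ok, tokenizer} <- Bumblebee.load_tokenizer(repo),
         {:ok, generation_config} <- Bumblebee.load_generation_config(repo) do
      {:ok,
       %{
         model_info: model_info,
         tokenizer: tokenizer,
         generation_config: generation_config
       }}
    end
  end
end

# Usage (needs network access and a valid token; HF_TOKEN is an assumed
# environment variable name):
#
#   repo = GatedRepo.repo("meta-llama/Llama-2-7b-chat-hf", System.fetch_env!("HF_TOKEN"))
#   {:ok, loaded} = GatedRepo.load_all(repo)
```

Reading the token from the environment keeps it out of the notebook source, which matters if the notebook is shared.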

djaouen · August 2, 2023, 1:31pm

Thanks, @regex.sh and @jonatanklosko!

haavars · August 30, 2023, 4:09pm

Did you ever get it to work?

djaouen · August 30, 2023, 4:37pm

Not yet. I was waiting for Bumblebee to get upgraded so that I could use Bumblebee.Text.Llama.

drewble · September 6, 2023, 4:58pm

Looks like v0.3.1 has the official support for Bumblebee.Text.Llama. Is that not working for you?

djaouen · September 6, 2023, 6:53pm

Actually, I don’t know what version of Bumblebee I was using, as it’s been long enough that Hugging Face deleted my notebook. But I will certainly look into that if I build out a similar project in the future. Thanks for the info!

nutheory · September 7, 2023, 3:21am

I was able to get “NousResearch/Llama-2-7b-hf” working using 0.3.1. Does anyone know if we can use GGML locally, so I don’t have to wait 5 minutes for inference on two sentences on my M2?

josevalim · October 3, 2023, 6:26am

You can go ahead and do the bindings for llama.cpp yourself. Check out Erlang NIFs.

Otherwise, docs for LLaMA have been added here: Llama — Bumblebee v0.4.2

slashmili · March 23, 2024, 12:47pm

In case someone still needs the Llama docs: they are not available in the latest docs, but can be found here: https://hexdocs.pm/bumblebee/0.4.2/llama.html

josevalim · March 23, 2024, 6:25pm

Good call, I believe the updated version is here