Lorax - Fine-tune and run LLMs with LoRA

Hello!

I’ve been experimenting to see if LoRA could be implemented in Axon and finally got it to work.
For those unfamiliar, LoRA (Low-Rank Adaptation of Large Language Models) is probably one of the most popular techniques out there for creating custom models. The main benefit is that it significantly reduces the compute required for fine-tuning. As a result, people with consumer GPUs can experiment with the latest models.
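
Concretely, LoRA freezes a pretrained weight matrix W and learns a low-rank update ΔW = B·A, where A is r × k and B is d × r for a small rank r. Here’s a toy sketch of that idea in Nx (the shapes here are illustrative, not what Lorax uses internally):

key = Nx.Random.key(42)

# rank-2 update for a frozen 768x768 weight: instead of the
# 768 * 768 = 589_824 frozen entries, we only train the
# 2 * (768 + 768) = 3_072 entries of a and b
{a, _key} = Nx.Random.normal(key, 0.0, 1.0, shape: {2, 768})
b = Nx.broadcast(0.0, {768, 2})

# b starts at zero, so w_eff = w + alpha / r * Nx.dot(b, a)
# equals the pretrained w at the start of training
delta_w = Nx.dot(b, a)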

I also didn’t like having to jump back and forth between Python and Elixir to do ML, so I created this library.

How it works

Here’s how you add LoRA layers:

{:ok, model_info} = Bumblebee.load_model({:hf, "gpt2"})

lora_model =
  model_info.model
  # freeze the base model so only the LoRA matrices are trained
  |> Axon.freeze()
  # inject rank-2 LoRA layers into the query, key, and value projections
  |> Lorax.inject(%Lorax.Config{
    r: 2,
    alpha: 4,
    dropout: 0.05,
    target_query: true,
    target_key: true,
    target_value: true
  })
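
Here, r is the rank of the low-rank update matrices, alpha scales how strongly the update is applied (the standard LoRA scaling is alpha / r), and the target_* flags pick which attention projections get adapters. After injecting, you train the model as usual: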

# your training code
merged_params = ...
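
If you haven’t set one up before, the loop might look roughly like this. This is an illustrative sketch rather than Lorax’s exact guide code: train_data is a hypothetical stream of {input, label} batches, and the loss and optimizer choices are placeholders.

# Sketch only: a generic Axon training loop over the injected model.
# GPT-2 outputs a map, so the loss pulls out the logits.
loss_fn = fn labels, outputs ->
  Axon.Losses.categorical_cross_entropy(labels, outputs.logits,
    from_logits: true,
    sparse: true,
    reduction: :mean
  )
end

# Polaris provides the optimizers in recent Axon versions
merged_params =
  lora_model
  |> Axon.Loop.trainer(loss_fn, Polaris.Optimizers.adam(learning_rate: 3.0e-4))
  |> Axon.Loop.run(train_data, model_info.params, epochs: 1, compiler: EXLA)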

Likewise, you can save the LoRA parameters (only a couple of MB) to your computer and run them later.

# keep only the LoRA weights (gpt2_params are the original base-model
# parameters), then offer them as a download in Livebook
lora_params =
  merged_params
  |> Lorax.Params.filter(gpt2_params)
  |> Lorax.Params.kino_download()
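
If you’re not in Livebook, a plain round trip through Nx’s own serialization works too. This is a hypothetical sketch (not a Lorax-specific API), assuming the LoRA params live under their own layer keys:

# filter out everything except the LoRA weights, then write them to disk
lora_only = Lorax.Params.filter(merged_params, gpt2_params)
File.write!("lora_params.nx", Nx.serialize(lora_only))

# ...later, load the adapter and lay it over the frozen base params
loaded = Nx.deserialize(File.read!("lora_params.nx"))
params = Map.merge(gpt2_params, loaded)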

More detailed guides can be found here.

Demo

Here’s something I call the Elixir Thread Simulator. It’s running GPT-2, but with a LoRA adapter. You can type in any thread title (provided it’s sandwiched between <title>...</title>) and it’ll generate a fake thread similar to the ones in Chat/Discussion.

# <title>Elixir 2.0 is released! New features include</title>
<author>xjalasz</author>

Elixir 2.0 is released! New features include
This means that you can now deploy your Elixir 2 projects without having to use a tool like docker.

<likes>1 like</likes>

<author>jake</author>

As always, thanks for the help.
I am on a 10 day cruise to Toronto in July with the goal of finishing up Elixir 2.0 in less than one week.

<likes>1 like</likes>

<author>pianotato</author>

Thanks for all the support!

I trained the LoRA adapter for about an hour or two, so it sometimes produces incoherent text. It may also take a couple of seconds to respond if multiple people are using it.
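
For the curious, serving generation from the LoRA model through Bumblebee looks roughly like this (a sketch; depending on your Bumblebee version the wiring may differ slightly):

{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "gpt2"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "gpt2"})

# swap the LoRA-injected graph and the trained params into model_info
lora_model_info = %{model_info | model: lora_model, params: merged_params}

serving = Bumblebee.Text.generation(lora_model_info, tokenizer, generation_config)

Nx.Serving.run(serving, "<title>Elixir 2.0 is released! New features include</title>")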

Repo

You can check it out here.

There’s still a lot to improve regarding training speed, so this library is just a proof of concept at the moment. There are also a lot of other LoRA variants and fine-tuning methods out there, so I may add them if they catch my eye. Let me know what you think; feedback is welcome.

Cheers,
Ted

11 Likes

v0.2.0:

Loading generic LoRAs for Stable Diffusion should be similar to the work needed for LCM LoRA, but involves adapting the text encoder. Maybe I’ll do this for 0.3.0 :thinking:

2 Likes

I just found this library. It’s super cool. Really appreciate the hard work put into it. This is really important work for the Elixir ecosystem, enabling scalable fine-tuning for LLMs.

I was wondering if you have plans to work on something like the ability to serve multiple adapters.

At https://pbase.ai they have a library called lorax (which is how I found this library), and they also handle serving multiple fine-tuned models on a single GPU.

1 Like

Thanks! Appreciate it :smiley:

Yeah, I found out about Predibase’s lorax shortly after creating this library haha. Creating something similar for Nx / Axon would be tricky, especially with regard to JIT-compiling the model.

Now that I think about it, it should be possible to serve hot-swappable LoRAs. The model expression would need blank slots for LoRAs at all times, and due to the static nature of Nx, the adapters would all need the same configuration for the A and B matrices.
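
To illustrate the blank-slot idea with a toy example (not Lorax code): if the A and B matrices are ordinary inputs to the compiled function, swapping adapters is just passing different tensors, as long as every adapter shares the same shapes.

defmodule HotSwapSketch do
  import Nx.Defn

  # x: {batch, d_in}, w: {d_in, d_out} frozen base weight,
  # a: {d_in, r} and b: {r, d_out} fill the adapter slot;
  # pass zeros to run the base model with no adapter.
  defn forward(x, w, a, b, scale) do
    Nx.dot(x, w) + scale * Nx.dot(Nx.dot(x, a), b)
  end
end

Since the shapes are fixed when the expression is JIT-compiled, two adapters can be hot-swapped only if their A/B tensors agree on {d_in, r} and {r, d_out}, which is the same-configuration constraint above.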

Anyways, I think it’s doable for certain use cases, but unfortunately I’m wrapped up in other projects, so I don’t have time to investigate further.

Hope that answers your question!

Thank you for answering. I learned something new. Really appreciate it!

1 Like