Lorax - Finetune and run LLMs with LoRA

Hello!

I’ve been experimenting to see if LoRA could be implemented in Axon and finally got it to work.
For those unfamiliar, LoRA (Low-Rank Adaptation of Large Language Models) is one of the most popular parameter-efficient techniques for creating custom models. Instead of updating a model's full weight matrices, it trains small low-rank matrices alongside the frozen originals, which significantly reduces the compute and memory needed for fine-tuning. As a result, people with consumer GPUs can experiment with the latest models.
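For context, the core idea can be written as a small equation. This is the standard formulation from the original LoRA paper, not anything specific to this library: a frozen weight $W_0$ gets a trainable low-rank update $BA$, scaled by $\alpha / r$.

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only $A$ and $B$ are trained, so the number of trainable parameters scales with $r$ rather than with the full weight dimensions.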

I also didn’t like having to jump back and forth between Python and Elixir to do ML, so I created this library.

How it works

Here’s how you add LoRA layers:

{:ok, model_info} = Bumblebee.load_model({:hf, "gpt2"})

lora_model =
  model_info.model
  |> Axon.freeze()
  |> Lorax.inject(%Lorax.Config{
    r: 2,
    alpha: 4,
    dropout: 0.05,
    target_query: true,
    target_key: true,
    target_value: true
  })
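To give a rough sense of the savings with the config above (assuming GPT-2's hidden size of 768; these numbers are my own back-of-the-envelope math, not from the library): a full attention projection versus its rank-2 LoRA update looks like

```latex
\underbrace{768 \times 768}_{\text{frozen } W_0} = 589{,}824
\quad\text{vs.}\quad
\underbrace{2 \times 768}_{A} + \underbrace{768 \times 2}_{B} = 3{,}072
\;\;(\approx 0.5\%)
```

so with query, key, and value all targeted, each attention block trains about 9,216 parameters instead of roughly 1.77 million.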

# your training code
merged_params = ...

You can also save the LoRA parameters (only a couple of MB) to your computer and load them again later.

lora_params =
  merged_params
  |> Lorax.Params.filter(gpt2_params)
  |> Lorax.Params.kino_download()
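If you'd rather persist the adapter outside of Livebook, here's a minimal sketch using Nx's built-in serialization (`Nx.serialize/2` and `Nx.deserialize/2`). The tensors below are stand-ins for the real filtered params, and this is not part of Lorax's API, just one way to write the map to disk:

```elixir
# Requires Nx; in a standalone script you could pull it in with:
# Mix.install([{:nx, "~> 0.6"}])

# Stand-in for the filtered LoRA parameter map (in practice, the
# output of Lorax.Params.filter/2).
lora_params = %{
  "lora_a" => Nx.iota({2, 768}, type: :f32),
  "lora_b" => Nx.broadcast(Nx.tensor(0.0), {768, 2})
}

# Nx.serialize/2 works on containers of tensors (including plain
# maps) and returns iodata suitable for File.write!/2.
File.write!("lora_params.nx", Nx.serialize(lora_params))

# Later (e.g. at inference time), read the adapter back.
restored =
  "lora_params.nx"
  |> File.read!()
  |> Nx.deserialize()
```

The restored map has the same shape as what was saved, so it can be merged back into the base model's params before running inference.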

More detailed guides can be found here.

Demo

Here’s something I call the Elixir Thread Simulator. It’s GPT-2 running with a LoRA adapter. You can type in any thread title (provided it’s sandwiched between `<title>...</title>` tags) and it’ll generate a fake thread similar to the ones in Chat/Discussion.

# <title>Elixir 2.0 is released! New features include</title>
<author>xjalasz</author>

Elixir 2.0 is released! New features include
This means that you can now deploy your Elixir 2 projects without having to use a tool like docker.

<likes>1 like</likes>

<author>jake</author>

As always, thanks for the help.
I am on a 10 day cruise to Toronto in July with the goal of finishing up Elixir 2.0 in less than one week.

<likes>1 like</likes>

<author>pianotato</author>

Thanks for all the support!

I trained the LoRA adapter for only about an hour or two, so it sometimes produces incoherent text. It may also take a couple of seconds to respond if multiple people are using it.

Repo

You can check it out here.

There’s still a lot to improve regarding training speed, so this library is just a proof of concept at the moment. There are also many other LoRA variants and fine-tuning methods out there, so I may add them if they catch my eye. Let me know what you think; feedback is welcome.

Cheers,
Ted


v0.2.0:

Loading generic LoRAs for Stable Diffusion should be similar to the work needed for LCM LoRA, but involves adapting the text encoder. Maybe I’ll do this for 0.3.0 :thinking:
