LLM Supremacy for Elixir?

Would it be possible to train an LLM to be very proficient in Elixir? If so, what approach would be used? Could it be done by a single developer, or would it require the resources of an AI Hegemon?

Likewise: how could an LLM become very proficient across the packages on Hex.pm, Hexdocs, and the Erlang docs?

Are there practices that developers could adopt that would make it easier for a package to be ingested into an LLM?


Even if it “understood” basics like types, I’m worried that it would still produce a lot of wrong answers.

How would you train such an LLM? The options are far too limited…

  1. First of all, you could train on forum posts, perhaps limited to those from Elixir core team members who know the language in detail. I’m one of the most active forum members, but I would not fully trust my own posts - not because they were wrong, but simply because I have since become more experienced, learned new things and changed opinions. Think of it as working with a younger you: you would have to recall all your old motivations and keep in mind that some of your answers may be outdated.

  2. Learning resources - while we as humans have more than enough books to learn Elixir, LLMs work on huge amounts of data, so “one or two blogs” more would not change a lot.

  3. Other sites like Stack Overflow - with all their positive and negative consequences.

  4. Company-only resources - since the work in every team is a bit different, in both organisation and code.

Therefore, if we want to create a very high-quality LLM for Elixir, we would need a training-configuration tool that lets us specify resources, limit them by author, and weight them by date. This would not only be hard to build, but would also cost each of its users a lot of time. That is what we get if we rush too far and try to use an LLM as AI.
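To illustrate the idea, such a configuration could be as simple as a list of source specs. A minimal sketch - every source, author and weighting choice below is made up:

```elixir
# Hypothetical corpus spec for the training-configuration tool described above.
# All sources, authors and weights here are invented for illustration.
corpus = [
  %{
    source: "https://elixirforum.com",
    authors: ["josevalim", "ericmj"],   # limit to trusted authors
    weight_by_date: fn date ->          # rate older posts down
      years_old = Date.diff(Date.utc_today(), date) / 365
      1.0 / (1.0 + years_old)
    end
  },
  %{
    source: "https://hexdocs.pm",
    authors: :all,
    weight_by_date: fn _date -> 1.0 end # docs are kept current, so full weight
  }
]
```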

There are already many interesting code editor integrations, and an LLM may be good for tasks like:

  1. Find resources about … (not much to do here - just a list of links and their ratings)
  2. Remind me of the name of the … that does …
  3. Find all Hex packages that could be categorised as …
  4. Smart code snippet fetcher (based on natural-language search)

Tasks 1-3 are easy to validate, as you are searching for a specific thing, and as long as you keep your mind sharp there would be no problems. To generate good code snippets we would need some trusted source, like exercism.org.

I don’t believe that LLMs can become something better than that, and especially not programming partners. This would require not only a very good AI, but also giving up a lot of privacy, as the AI would work best if it learned from your daily work, even on private pet projects. That would be more rewarding for experienced developers, especially those with 10+ years of experience. Alternatively, LLMs and AIs could learn by following our bookmarks and likes.

I don’t think that a single person could do it. If you want to do it well, you would need a good team to think through the many cases. Look how much I said “just as a comment” on this post. I believe that if people took more time to think about it, my post would be like a drop in the ocean…

Also, there is no rule that covers all cases. All LLMs and AIs would most probably work best when trained for specialised tasks rather than for general usage. A generic one would be good for “common knowledge” and could use the wikis shared by communities. That said, I would rather expect something like “generic specialised AIs” - for example a gaming AI which helps in various games, but has no idea about movies and books as long as they are not related to the game specified by the user.


I can only tell you what I personally do. First, some context: my experience in Elixir is that of a novice; my experience as a (non-software) engineer is lifelong; my motivation is the drive to produce internal solutions for my small organisation - 200 employees.

I pay for both Claude and ChatGPT. Using Claude’s Projects feature, I upload the Hexdocs documentation for those packages that fit (unfortunately only smaller ones, but there are many of those). As an example: I use Elixir’s Explorer a lot, and its documentation is small enough to fit in Claude’s project knowledge. I download the documentation from Hexdocs as an epub, then convert it to Word (if I need to edit it because it is too big), PDF, or even plain text. For the conversion I use Calibre 4.23 (not any newer) with some added plugins.
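Calibre also ships an `ebook-convert` command-line tool, so the conversion step can be scripted. A minimal sketch, assuming `ebook-convert` is on your PATH and the epub has already been downloaded from Hexdocs:

```elixir
# Convert a Hexdocs epub to plain text via Calibre's ebook-convert CLI.
defmodule DocsConvert do
  def epub_to_txt(epub_path) do
    txt_path = Path.rootname(epub_path) <> ".txt"

    case System.cmd("ebook-convert", [epub_path, txt_path], stderr_to_stdout: true) do
      {_output, 0} -> {:ok, txt_path}
      {output, code} -> {:error, {code, output}}
    end
  end
end

# DocsConvert.epub_to_txt("explorer.epub")
```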

I cannot judge whether the applications created are coded poorly or well. I can only say that they are elaborate applications in what they achieve for our organisation. I am the one who says ‘I need it to do this or that’ and the LLMs are the ones that code it. I am also the one who says “this is rubbish - let’s abandon this”, and they usually encourage me not to give up - that’s funny. The most amusing time was when I scolded Claude and it totally rewrote the module - about 400 loc - in a different manner, and it worked immediately.

So, I use the phrase ‘critique this code’ quite a lot in my prompts, especially when I get one LLM to think through an issue and then paste its response to the other LLM with the ‘critique this code’ directive. Often, going back and forth, not only do I learn what I am doing and what I want, but their solution also ends up more sophisticated.

I also access the LLMs via API for the applications themselves, but that is different to the use case of the question.

I develop mostly in a Livebook runtime attached to a Phoenix app (which itself is hosted on a local MacBook Pro behind Cloudflare); I move Livebook code into Phoenix once it is mature or used generically across many LiveViews, e.g. to read Google Drive files or to write Explorer DataFrames to Google Spreadsheets, to give some examples.

About 5000 loc in production so far in this manner.


I’m going through Machine Learning in Elixir and am also wondering whether training an LLM to excel at Elixir could be a good exercise.

My (limited) understanding is that this could take the form of transfer learning / fine-tuning a general LLM (maybe one specifically trained for coding) on a bunch of good Elixir codebases (just take the top packages from hex.pm, though maybe that requires checking terms and licenses?):

Fine-Tuning Your Model

In the previous section, you made use of a pre-trained model for feature extraction. You attached a classification head on top of a frozen pre-trained model, taking advantage of the general features extracted from the pre-trained model for your specific problem. Remember, freezing the model initially was important because the early stages of training are unstable and your model was at risk of losing all of its prior knowledge. But now that you have a trained model, you can unfreeze some of the layers of the pre-trained model and force them to learn features specific to your problem. This process is called fine-tuning.
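A rough sketch of that freeze-then-unfreeze flow in Axon. `load_pretrained/0` and `train_data` are hypothetical placeholders, and the optimizer module follows the Axon versions used in the book (newer releases moved optimizers into Polaris):

```elixir
# Phase 1: freeze the pre-trained backbone and train only the new head.
backbone = load_pretrained()

model =
  backbone
  |> Axon.freeze()
  |> Axon.dense(128, activation: :relu)
  |> Axon.dense(2, activation: :softmax)

trained_state =
  model
  |> Axon.Loop.trainer(:categorical_cross_entropy, Axon.Optimizers.adam(1.0e-3))
  |> Axon.Loop.run(train_data, %{}, epochs: 5)

# Phase 2: unfreeze the backbone and keep training at a much lower learning
# rate, starting from the state learned in phase 1.
fine_tuned_state =
  model
  |> Axon.unfreeze()
  |> Axon.Loop.trainer(:categorical_cross_entropy, Axon.Optimizers.adam(1.0e-5))
  |> Axon.Loop.run(train_data, trained_state, epochs: 5)
```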

I think RAG could also be part of it: pre-processing the docs for the top libraries and building embeddings for them, and maybe doing the same at runtime for any missing libraries. Maybe Hexdocs could actually host the canonical database of doc embeddings for LLMs to use?
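A minimal sketch of the embedding step with Bumblebee, assuming a sentence-transformer checkpoint and treating each docstring as one chunk (the model choice and the toy corpus are just placeholders):

```elixir
# Embed doc chunks with Bumblebee and rank them against a query.
repo = {:hf, "sentence-transformers/all-MiniLM-L6-v2"}
{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)

serving = Bumblebee.Text.text_embedding(model_info, tokenizer, embedding_processor: :l2_norm)

docs = [
  "Explorer.DataFrame.new/2 creates a dataframe from tabular data.",
  "Enum.map/2 applies a function to every element of an enumerable."
]

doc_embeddings = for doc <- docs, do: Nx.Serving.run(serving, doc).embedding

query = Nx.Serving.run(serving, "How do I create a dataframe?").embedding

# With L2-normalised vectors, cosine similarity reduces to a dot product.
scores = Enum.map(doc_embeddings, &Nx.to_number(Nx.dot(query, &1)))
```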

In terms of LLMs ingesting packages, I think Elixir is in a very good spot. It’s likely not that difficult to use ExDoc to extract a clean version of the docs (markdown is probably easier for an LLM to consume than HTML, for example; fewer tokens, at least).
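Even without ExDoc, the docs live in the compiled BEAM files, so a clean dump is a few lines away. A sketch using `Code.fetch_docs/1` (the markdown-ish layout is my own choice, and modules without a moduledoc would need an extra clause):

```elixir
# Extract docstrings straight from a compiled module's docs chunk.
defmodule DocsDump do
  def to_markdown(module) do
    {:docs_v1, _anno, :elixir, _format, %{"en" => moduledoc}, _meta, entries} =
      Code.fetch_docs(module)

    functions =
      for {{:function, name, arity}, _anno, _sig, %{"en" => doc}, _meta} <- entries do
        "## #{name}/#{arity}\n\n#{doc}"
      end

    Enum.join(["# #{inspect(module)}", moduledoc | functions], "\n\n")
  end
end

# IO.puts(DocsDump.to_markdown(Enum))
```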

One thought I also had is whether anyone is training LLMs on the actual ASTs from codebases rather than on the characters themselves: using the language syntax as the vocabulary and tokenizing programs with it, instead of with statistically derived groups of characters. Add something like Instructor at inference time (but with ASTs instead of JSON) to guarantee that you only generate valid Elixir programs, and maybe the quality and consistency of generations improve? The problem with this approach, though, might be that it could only be used to train from scratch, because fine-tuning might require using the same tokenizer as in the original training.
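Getting at the AST is the easy part in Elixir; the open question is the vocabulary. A toy sketch of the first step, flattening a quoted expression into AST-level tokens (the flattening scheme is just an illustration, not a real tokenizer):

```elixir
# Parse source into Elixir's AST and flatten it into a crude token stream.
defmodule AstTokens do
  def tokenize(source) do
    {:ok, ast} = Code.string_to_quoted(source)
    flatten(ast)
  end

  # Three-element tuples are AST nodes: {form, meta, args}.
  defp flatten({form, _meta, args}) when is_list(args),
    do: flatten_head(form) ++ Enum.flat_map(args, &flatten/1)

  # Variables are {name, meta, context} where context is an atom or nil.
  defp flatten({name, _meta, ctx}) when is_atom(name) and is_atom(ctx), do: [{:var, name}]
  defp flatten(list) when is_list(list), do: Enum.flat_map(list, &flatten/1)
  defp flatten({a, b}), do: flatten(a) ++ flatten(b)
  defp flatten(literal), do: [{:lit, literal}]

  defp flatten_head(form) when is_atom(form), do: [{:node, form}]
  # Remote calls have a nested {:., meta, [module, fun]} head.
  defp flatten_head(form), do: flatten(form)
end

# AstTokens.tokenize("Enum.map([1, 2], fn x -> x * 2 end)")
```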
