Current status of LLMs writing Elixir code

Hi folks,

I don’t often use LLM tools to help write code; it’s mostly things like asking Perplexity to write me a quick Bash script so I don’t have to remember all the quirks of Bash syntax. For those of you who have, what’s the current status of having AI help you write Elixir code? I suspect that since there was a lot less training data for Elixir than for, say, C, Java, or Python, the LLMs will be less helpful.

3 Likes

All of the leading LLMs do quite well with Elixir.

2 Likes

I’m finding that they work really quite well, both for the language and for common libraries.

The best success I’ve had is with Claude Sonnet 3.5, but I’ve not given many of the others much of a chance (to be fair)… so some of my judgement may simply be that I’m now pretty comfortable interacting with that model.

Interestingly, I’ve used the Sourcegraph Cody extension for both VS Code and IntelliJ, and now the Cursor VS Code fork. I find that I get significantly better results using Cursor than Cody, even though in both cases I’m using Claude Sonnet 3.5. At least for editor/IDE-integrated solutions, it seems the quality of the integration matters a great deal to the quality of the results; this isn’t surprising, but with all the talk of which model is best for a given purpose, it can be easy to overlook how you’re interfacing with the LLM.

2 Likes

It would be very interesting to know which use cases get the most mileage out of LLMs, as I am very skeptical of their usefulness beyond entry-level code examples.

1 Like

I’m finding LLMs pretty useful across a broad array of tasks. The TL;DR is that it works very much like pair programming with a junior developer. There are mistakes (both conceptual and in the details) and I’m not saying “go write an accounting system with inventory control”… I’m very much taking things function by function… but it’s useful.

For me, LLMs come to mind/into play when I’m doing the following:

  • Autocomplete while writing code. While I’m not sure it’s the most important thing I use the LLM for, it’s definitely the most common interaction for me, because it happens without my stopping to make it happen. Not unlike standard autocomplete anticipating the next variable or keyword you’re starting to type, the LLM is anticipating the expression or statement (or even block) that you’re going to write. Results can vary quite a bit here, and this is probably the area where the integration tool has the biggest influence; Cody does a meh job with this task and Cursor does a very good job with it (both using Claude Sonnet 3.5). When this works well, boilerplate-like blocks just appear and you can accept them and move on to the next. Context available to the LLM here is clearly key. Often I can accept the autocomplete suggestion without changes, a decent number of suggestions can be accepted with minor modifications, and some just aren’t right. Interactivity with the autocomplete functionality varies from tool to tool as well, but most will show you the suggestion and then let you use keystrokes to accept, ignore, or retry it. Cursor will also allow you to incrementally accept the suggestion word by word (Ctrl + arrow, I think).

  • Initial Documentation. Once I have an API that should have ExDoc strings written, I let the LLM write the initial documentation. It does a pretty good job of at least getting things like parameters and return values represented, and it will often even include examples. It’s terse and sounds like it was written by a marketing department minion, but it gets a lot of the form right. Again, it does well with boilerplate-like bits such as assigning sections if there’s enough context from other parts of the code to set the example. I do go back and clarify or rewrite portions, but it’s better than starting from scratch. I also find it less mentally taxing to act in the editor/reviewer capacity than in the author capacity. (A sketch of the kind of doc and test output I mean appears after this list.)

  • Writing Tests. Recently I had to write some tests for application components that were created before my testing strategy was fully thought out. For each new test, I just told the LLM that I needed a test and gave it the file and the name of the function being tested. This worked well and was more thorough than I might have been in some cases. There were a fair number of times where I had to tweak the tests to be correct, but it was still a significant time saver. Again, context availability was key here: if similar tests already existed, the output quality went up as the generated test incorporated the norms of the existing test suite.
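
To make those last two bullets concrete, here’s a hedged sketch of the kind of doc block and test I mean. The module, function, and test are all hypothetical, invented for illustration; they’re typical of what the LLM produces before I go back and edit:

defmodule MyApp.Money do
  @doc """
  Converts an amount in cents into a display string.

  ## Parameters

    * `cents` - the amount in cents, as a non-negative integer

  ## Examples

      iex> MyApp.Money.format_cents(123_456)
      "$1234.56"
  """
  def format_cents(cents) when is_integer(cents) and cents >= 0 do
    "$#{div(cents, 100)}.#{cents |> rem(100) |> Integer.to_string() |> String.pad_leading(2, "0")}"
  end
end

defmodule MyApp.MoneyTest do
  use ExUnit.Case, async: true
  doctest MyApp.Money

  test "pads sub-dollar remainders to two digits" do
    assert MyApp.Money.format_cents(500) == "$5.00"
  end
end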

Other less frequent uses are:

  • Supplying expertise I lack. Some of my recent tests dealt with network addresses and related bit twiddling… I’m much more facile with financial and accounting operations than I am with bitwise operations. The LLM was able to do the bit manipulation and evaluation correctly; all I had to do was validate that it was right (see the sketch after this list). This also manifests in being able to recall the APIs of common libraries across a broader range of topics than I commonly work with.

  • Interpreting Difficult-to-Read Code. A colleague tried to use an LLM to convert an MSSQL stored procedure into a PostgreSQL function. The LLM failed and they eventually called me in. I understand the PostgreSQL side just fine, but I’ve never worked with T-SQL, the original code given to me was “a little obscure”… and they couldn’t tell me anything about the workings of the code. After a couple of “close but not quite” attempts on my part… I finally broke down and just asked the LLM to tell me in plain English what the original code did. And it did so, perfectly and understandably. I could see the places where I had misunderstood or let confirmation bias cloud my view, and I was able to immediately produce correct and simpler/saner code in PostgreSQL. (That was my first LLM experience and it moved me from skeptical to enthusiastic.)

  • Writing Shell Scripts. The last time I looked at the bash man page, I think I saw that the Marquis de Sade was in the authors list. I hate shell scripts with a passion: I write them rarely and typically need a drink immediately after writing one. These tend to be small programs well within the scope of a decent LLM, and so far the experience of letting the LLM deal with this when needed has increased my personal joy a lot.

  • Writing regular expressions. See “Writing Shell Scripts”.
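
For the “expertise I lack” bullet above, here’s a minimal sketch along the lines of what the LLM handled for me. The module name and logic are hypothetical, a simplified reconstruction rather than my actual code:

defmodule SubnetCheck do
  import Bitwise

  # Pack an {a, b, c, d} IPv4 tuple into a 32-bit integer.
  defp ip_to_int({a, b, c, d}), do: (a <<< 24) ||| (b <<< 16) ||| (c <<< 8) ||| d

  # True when `ip` falls inside `network`/`prefix` (CIDR-style).
  def in_subnet?(ip, network, prefix) do
    mask = bnot((1 <<< (32 - prefix)) - 1) &&& 0xFFFFFFFF
    (ip_to_int(ip) &&& mask) == (ip_to_int(network) &&& mask)
  end
end

SubnetCheck.in_subnet?({192, 168, 1, 42}, {192, 168, 1, 0}, 24)
# => true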

What I do not use the LLM for:

  • Search. When I’m searching, I want something much more mechanical than an LLM is designed to produce. LLMs could excel at contextual searches with inferred matching, but they end up failing in completeness or end up including mistakes (or making stuff up). They’re designed to produce credible language like a human might produce in a similar situation… faults and all… not to mechanically test for thoroughness or even correctness. Outside of basic “does this thing exist” kinds of questions, I’d tend to avoid them for this case, and even then I tend not to trust the answers.

  • Brainstorming. I’d think it would be useful in a case like this, but too often I’m just getting conventional wisdom that I’m already aware of. This isn’t surprising.
    There are parameters (temperature, for example) to tweak how “far out” the models can stray from the most common kind of response… but that’s more time investment than I’m willing to make, so I don’t go there.

Anyway… you asked :slight_smile:

13 Likes

I use GitHub Copilot and it handles autocompleting Elixir without a problem. When it first came out I found I’d sometimes get more Ruby-ish suggestions, but now I’m consistently seeing Elixir completions.

I also use Aider with Claude Sonnet 3.5 and have had great results.

Sounds pretty useful; this might motivate me to actually give it a try.

In terms of privacy, do these tools need access to the entire project? That is a big no-go for me, unless we’re talking about a self-hosted solution.

I’m only using Copilot from GitHub, but I’ve found that it writes worse Elixir than it writes JavaScript or Python.

In particular, it often makes syntax mistakes, like a missing } or ), and, surprisingly, it doesn’t do that as much in JavaScript.

It also hallucinates on occasion, suggesting calls to stdlib functions that aren’t there, or that exist in a different variant or a different module (like Enum vs List).
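
A made-up but representative example of that Enum vs List confusion:

# Hallucinated suggestion: List has no map/2, so this raises UndefinedFunctionError.
List.map([1, 2, 3], &(&1 * 2))

# The function it meant lives in Enum:
Enum.map([1, 2, 3], &(&1 * 2))
# => [2, 4, 6]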

It is decent, but other languages are doing better here.

Edit: I also use ChatGPT and Gemini sometimes for code generation, but I use Copilot far more, as it’s built into the editor.

1 Like

Posting my slightly different angle here, for potential newcomers to Elixir like myself.

Context
I’ve just started picking up Elixir for the second time (I had a go with Dave Thomas’s Elixir for Programmers a couple of years ago and loved it, but stopped halfway because life and work (Python) got in the way).
This time it’s more serious, as I’ve quit my job and can go at it full time if I’d like.
I’m trying to create my own internet products (nearly all in Python) and have gone deep into using LLMs (via Cursor) in pretty much all of the ways described above, to great benefit in speed and boilerplate-skipping.
More specifically, I like the autocomplete, which gets me 90% of the way there; I’m immediately able to assess whether the code will behave as intended and is structured the way I intended.

Opinion for learners

  • For learning specifically, I’m choosing not to let AI write any code for me (not even tests) for the time being (for as long as I can, hopefully the first year at least).
  • This is because I want to ingrain the language, libraries, and idioms in my muscle and visual memory, and just get a feel for it; I feel like I’d be missing out on some good fun/dopamine if I just let AI do it all. After all, I’m going into Elixir with the expectation that it becomes my new favourite language, so I want to maximize the pleasure of writing the actual code.
  • For documentation, for interpreting and explaining new code, and, as @sbuttgereit does, for delegating away ancillary tools/code (read: bash, deployment/configuration tooling) that you’d rather not deal with, I’m generally all in.

Hope this is somewhat useful :slight_smile:

3 Likes

Wise choice, you have my respect for it. You seem like one of those Elixir beginners that I had a blast tutoring some 4-5 years ago – eager, willing to learn, not lazy, and willing to put in the work to get the right habits ingrained in the brain. Bravo.

As you yourself noticed, it’s important to learn which is which, and why. The current breed of what people call “AI” should be limited to boilerplate generation for the moment, because in the observations of my colleagues and acquaintances they are not good for much else. And the nature of “AI” does not help with this, because statistically the bigger datasets (JS, Python) yield better results and frak everyone else, I guess – solid philosophy :003: but I expect nothing more from VC-funded companies, so it’s all par for the course.

Elixir has high chances of becoming your new favorite language. Immutability and generally the isolation of side effects help with developer experience much more than JS / Python programmers realize (they only understand it after the fact). Throw iex and ex_unit in the mix and you have an awesome success recipe.
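
A trivial illustration of the immutability point (contrived, but it compounds across a whole codebase):

# "Updating" a map returns a new map; the original never changes behind your back.
user = %{name: "Ada", score: 1}
bumped = %{user | score: user.score + 1}
user.score    # => 1 (unchanged)
bumped.score  # => 2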

Shout out if you need help here on the forum.

3 Likes

For the two that I work with, the short answer is “yes”: they will look at the entire project.

This is more about the tool integrating with the LLM than about the LLM itself. In both cases, part of the quality they can achieve comes from looking across the project to build context for constructing their LLM prompts, giving the LLM enough context to answer. This makes sense: think how well you could code if you could only see the immediate file or function you’re working with. This isn’t to say that your privacy concern isn’t valid, but there’s a natural conflict between the context required for better results and privacy.

Naturally, the privacy question is a pretty hot topic overall: your concern is pretty common, and many aren’t sure where to draw the lines because these tools are more effective with more knowledge of your codebase. I’m pretty sure I’m going to be switching my work to Cursor, and their privacy policy is here:

The one I’m switching from, Sourcegraph Cody, doesn’t really have a good place to get a clear privacy statement, but this discussion in their forums is useful:

Given the complexity of what “privacy” even means with these tools, it’s something you really need to look at to see if they cross any red lines for you.

2 Likes

Yeah, I agree with much (all?) of what you say here, and I would go further: one of the big concerns for me is that even experienced developers’ knowledge and ability can atrophy without regular exercise. While I find that I get to my larger goals faster with the LLM, in part that’s because I don’t have to work or think as hard… but I am losing that regular exercise, and I expect that writing without the LLM would now leave me slower than I was before, just because I’d have to rebuild that knowledge of language minutiae to be as effective as I was pre-LLM.

I think what it means to be a developer will be an interesting topic over the coming years as these tools and workflows improve. How do juniors obtain experience, and will the art become more and more about extracting a given result from the tool rather than not needing the tools at all? Are there analogies from other industries: how, for example, we moved from designing things like airplanes very manually, with actual pen-and-paper drawings and slide rules, to using advanced CAD systems that can even simulate the complex aerodynamics of the design? Surely we lost skills when these very manual tasks of drafting and calculation were taken on by the computer, but we gained a lot too. (And to be clear, I’m thinking in terms of professional software development, not recreational software development, where a more “artisanal” approach to coding can very well be more desirable.)

Anyway, interesting times ahead, and it will be fascinating to watch things evolve. In the meantime, I’ll continue to use the LLM because it gives me a real shot at completing a project that is otherwise too foolishly large for an indie developer like me to finish without such aids.

5 Likes

There’s a balance. We should not do annoying stuff 100% of the time just so we don’t lose shape or whatever other nebulous goal that is, but we also shouldn’t strive to become nepotism-hired orchestra conductors who don’t even know how to actually code stuff manually anymore.

I for one am looking into – and working on – boilerplate generation without using “AI”, but I also don’t want to try to outsource, f.ex., a complex Ecto / SQL query with recursive CTEs, window functions, lateral joins, etc.

My view is: humans work on the logic and the coherence of the whole thing, machines write the annoying manual parts of the code. F.ex., Ecto.Schema definitions and migrations should theoretically be generatable from a single source of truth that describes the names, the types, and various constraints (written in some mythical DSL that either does not exist or that I don’t know about). So yeah, manually writing Ecto schemas and migrations gets old and annoying really fast, and Ecto’s CLI generator sadly leaves stuff to be desired (two random examples: integer ranges, and the ability to generate DB constraints and not only Ecto validations).

In this example: I want to be able to describe types of data and leave the details to the machine. The typing DSL should be able to generate code for most programming languages as well, and I am actually disappointed that nobody has ever done this well.
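
Purely as a sketch of what I’m imagining (hypothetical syntax; no such library exists as far as I know):

defmodule MyApp.Schema.Product do
  use MythicalSchemaDSL   # hypothetical package, does not exist

  field :name,  :string,  required: true, max_length: 120
  field :price, :integer, range: 1..1_000_000, db_constraint: true
  field :color, :enum,    values: ~w(red green blue)a
end

# From this single definition a generator could emit the Ecto schema, the
# migration (including CHECK constraints), the validations, and in principle
# equivalent code for other languages and ORMs too.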

2 Likes

:eyes: :thinking:

2 Likes

Hm? Roast me, I won’t mind it, not like I stay on top of the entire IT area 24/7.

But again, I am looking for something that can be used to generate code for many different programming languages AND different ORM / DataMapper frameworks inside them.

Ah, sorry, I was not trying to roast you, I promise :laughing:. I didn’t mean for that to sound rude. Only that this is essentially the foundation of what I’m trying to do.

Ash doesn’t come with any cross-language code generation tools, but this is the general programming pattern with Ash: you describe your domain using data structures (i.e. Ash.Resource). A resource acts simultaneously as a tool to interact with your domain and as the source of truth for anything you project from it.

We generate migrations, for example, automatically. You can see for yourself:

Generate a scaffold of an app

mix archive.install hex igniter_new

mix igniter.new your_app --install ash,ash_postgres

cd your_app

mix ash.gen.resource YourApp.Accounts.User \
  --uuid-v7-primary-key id \
  --attribute username:string:required \
  --default-actions create,read,update,destroy \
  --timestamps \
  --extend postgres

mix ash.gen.resource YourApp.Twitter.Tweet \
  --uuid-v7-primary-key id \
  --attribute text:string:required \
  --relationship belongs_to:author:YourApp.Accounts.User \
  --timestamps \
  --extend postgres

Run codegen

By default, codegen will only generate the required migrations, but any extension can tap into the codegen step to add its own behavior.

mix ash.codegen initial_setup

That command generates:

defmodule YourApp.Repo.Migrations.InitialSetup do
  @moduledoc """
  Updates resources based on their most recent snapshots.

  This file was autogenerated with `mix ash_postgres.generate_migrations`
  """

  use Ecto.Migration

  def up do
    create table(:users, primary_key: false) do
      add(:id, :uuid, null: false, default: fragment("uuid_generate_v7()"), primary_key: true)
      add(:username, :text, null: false)

      add(:inserted_at, :utc_datetime_usec,
        null: false,
        default: fragment("(now() AT TIME ZONE 'utc')")
      )

      add(:updated_at, :utc_datetime_usec,
        null: false,
        default: fragment("(now() AT TIME ZONE 'utc')")
      )
    end

    create table(:tweets, primary_key: false) do
      add(:id, :uuid, null: false, default: fragment("uuid_generate_v7()"), primary_key: true)
      add(:text, :text, null: false)

      add(:inserted_at, :utc_datetime_usec,
        null: false,
        default: fragment("(now() AT TIME ZONE 'utc')")
      )

      add(:updated_at, :utc_datetime_usec,
        null: false,
        default: fragment("(now() AT TIME ZONE 'utc')")
      )

      add(
        :author_id,
        references(:users,
          column: :id,
          name: "tweets_author_id_fkey",
          type: :uuid,
          prefix: "public"
        )
      )
    end
  end

  def down do
    drop(constraint(:tweets, "tweets_author_id_fkey"))

    drop(table(:tweets))

    drop(table(:users))
  end
end
2 Likes

That looks like a solid start, thank you.

What about the two examples I gave you? (1) integer ranges and (2) optional toggle for any of the validations to also introduce a DB constraint?

…Oh, and while we’re at it: (3) string regex validations, (4) string / enum / integer inclusion validations (f.ex. color in the DB can only be “red”, “green” or “blue”)?

That is on the roadmap, but not implemented currently. You can do them both in the DSL though.

attribute :score, :integer do
  constraints min: 1, max: 100 # not lowered to DB, but can easily be in the future if desired
  allow_nil? false
end

and

postgres do
  check_constraints do
    check_constraint :price, "price_must_be_positive", check: "price > 0", message: "price must be positive"
  end
end

The resource is the source of truth, and if the check constraint in the resource changes, migrations will be generated to reflect the change. So a tool that translates constraints to check constraints defined on the resource would be all that is necessary. We have an entire system for this using transformers.

Same goes for your “while we’re at it” steps.

attribute :email, :string, constraints: [match: ~r/.*@.*/] etc.

We’re just a stone’s throw away from lowering any of that to the DB. Regexes are harder and may require manual translation for now.
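
To make that concrete, lowering the :score constraints from earlier might emit something like this (hypothetical output, reusing the check_constraints syntax shown above; as noted, this translation isn’t implemented yet):

postgres do
  check_constraints do
    check_constraint :score, "score_must_be_in_range",
      check: "score >= 1 AND score <= 100",
      message: "score must be between 1 and 100"
  end
end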

2 Likes

In the recent past, I have mostly been working with Claude 3.5 Sonnet with Projects (and before that, Claude 3 Opus). In between, I worked a bit with GPT-4o, but shifted to Claude 3.5 when it came out. I’ve heard good things about o1-mini, but have only had one (really impressive debugging) interaction with it. I plan on working with it exclusively for the next few months if it turns out to be at Claude 3.5’s level or better. Both Claude 3 Opus and Claude 3.5 Sonnet produce good Elixir code.

I have tried Cursor / Continue etc. I was inspired enough by the potential of these projects that I contributed PRs to a couple of them (Rubberduck and Continue). Eventually I figured out that they are not for me.

My personal approach to working with LLMs is the same for coding as for writing. I chat with it, get an initial idea, usually convince it there’s a better way, work with its mods, tweak the code a lot, and finally paste it into the text editor in the IDE. The final code is as much my doing (or more) as the LLM’s. It’s still super productive, though.

To streamline this workflow, I’ve recently released a tool designed to facilitate this kind of interaction with Claude projects and GPTs directly in the chat window. It’s an open-source project that I’m continually refining based on real-world usage and feedback.

For anyone interested in exploring this approach:

1 Like

Wow, thanks to everyone in this thread! There’s so much great information here!

1 Like