Here's how I'm coding Elixir with AI. Results are mixed, mostly positive. How about you?

First, the downsides. I periodically fire the AI. :smiling_face_with_tear:

^^ This was earlier today in Cursor AI, working with “Claude 4 Sonnet / Thinking”.

I bounce back and forth between Claude Code (w/ Claude 4 Opus) and Cursor AI (w/ Claude 4 Sonnet Thinking and ChatGPT o3). I find that each tool has its own ‘secret sauce’ of patterns for AI assistance that comes through, even when using the same model in each. (I.e., Claude 4 Opus in Cursor acts differently than it does in Claude Code.)

My CLAUDE.md File

This is what Claude Code writes and then reads. Here’s how I edited the start:


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Engineering Principles We Follow

  • Security first. If we detect a security issue like mismatching user_id and customer_id, fail loudly.
  • Never, ever write demo code. Instead, we write a test that fails for a new feature or function. Then we implement the feature or function to production quality.
  • Fail fast
  • Use compile-time checking as much as possible.
  • No silent failures. So, for example, don’t do this: plan = Map.get(plan_map, frequency, "starter"). Instead, fail loudly with an ‘unknown frequency’ error.
  • Make illegal states unrepresentable.
  • Laziness is a virtue.
  • Every feature idea must overcome the YAGNI argument.
  • TDD

Definition of Done

A set of todos is only complete when ALL of the following conditions are met:

  1. All tests pass (mix test)
  2. Dialyzer passes (mix dialyzer)
  3. Credo passes (mix credo)

^^ That all works…most of the time. :sweat_smile: In Cursor, I have a similar configuration in “User Rules”.
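To make the “no silent failures” bullet concrete, this is roughly the shape I want (plan_map and frequency come from the example above; the module name and error message are just illustrative):

```elixir
defmodule PlanSketch do
  # Instead of Map.get(plan_map, frequency, "starter"), which quietly hides bad data:
  def plan_for!(plan_map, frequency) do
    case Map.fetch(plan_map, frequency) do
      {:ok, plan} -> plan
      :error -> raise ArgumentError, "unknown frequency: #{inspect(frequency)}"
    end
  end
end
```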

Now, my favorite hack that helps pair programming with AI: in every project, I create a /docs directory in the root. I have the AI assistants create documentation, plans, and to-do lists there. Then I set it up as an Obsidian Vault. That’s pretty lightweight - Obsidian is basically a fancy markdown reader and editor. So in a nutshell, I “collaborate” with the AI this way.

I asked it to document what it learned — both for future AI sessions to refer back to, and for me. And then, of course, it’s very easy for me to edit and search:

I still have frequent, very frustrating episodes: Claude Code deciding that a certain number of Credo warnings is acceptable, or that the test failures are OK because it’s “demo code”. (What?? Who told you to write “demo code”?)

And I frequently need to steer the assistants in the right direction. The feeling I get is like working with a junior/mid-level programmer, but one who never gets defensive and is happy to rewrite their code when asked. I do often have to do massive refactors and cleanups. But the AI did get me started, and got some kind of working code shipped.

So even though Elixir is more niche than JavaScript or Python, I’m getting value from AI. When we’re really “clicking” well, I come back to my desk and see Claude Code’s latest message to me:

7 Likes

My Claude Code has tried to deploy to Fly.io without asking. But it didn’t use the “mix deploy” alias that would have actually worked.
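(For context, a “mix deploy” alias is just an entry in mix.exs. A sketch, assuming the alias only runs the tests and then shells out to flyctl; the real one presumably does more:)

```elixir
# In mix.exs; wired up via `aliases: aliases()` in project/0. Steps are illustrative.
defp aliases do
  [
    # run the test suite, then hand off to flyctl
    deploy: ["test", "cmd fly deploy"]
  ]
end
```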

And related to silent failures with your Map.get/3 example, it really likes to do defensive coding. As in it wants to check for both atom and string values.
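For example, I keep getting things shaped like this (a sketch; the module name and frequency values are illustrative):

```elixir
defmodule FrequencySketch do
  # Defensive style the assistant likes: quietly accept both representations everywhere.
  def normalize_frequency(f) when is_atom(f), do: f
  def normalize_frequency(f) when is_binary(f), do: String.to_existing_atom(f)

  # What I usually want instead: one canonical representation at the boundary,
  # and anything else fails loudly with a FunctionClauseError.
  def frequency!(f) when f in [:monthly, :yearly], do: f
end
```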

I don’t think comparisons to junior/mid/senior-level developers are useful. It’s all three at once. As in, you can discuss design at the architectural level.

4 Likes

True about being all three at once. I never have to define any words, in any language.

Claude doesn’t really know how to write OTP code by default. And I’m not a senior OTP dev by any means, not even a junior one yet. But I have studied OTP, and with my limited background, it’s obvious that Claude (as of Sonnet and Opus 4) really prefers to write crappy OTP when left unrestricted.

Without explicit direction, Claude will sneak in brittle tests with concurrency issues, unsupervised processes, runaway atom creation, and all sorts of other beginner-level gotchas.

I’m working on enhancing Credo checks to help enforce OTP compliance. You want to spawn? Is it supervised? Robust specifications for OTP compliance are a must, for every context. The issues I’ve faced are also related to not having explicit definitions of the OTP design in context. Going back to having the design worked out in advance, with the supervision tree clearly specified in the prompts, has been working better for me in testing. I’m going to try this approach again on the next go-around for my OTP app. Vibe coding for these scenarios isn’t really feasible at this time, except to prototype, sketch, or learn (how not to do things).
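To make that concrete, this is the kind of thing I want the checks to catch (a sketch; MyApp.TaskSupervisor and the function names are illustrative):

```elixir
defmodule OtpSketch do
  # Assumes {Task.Supervisor, name: MyApp.TaskSupervisor} is started in the
  # application's supervision tree.

  # What an unconstrained assistant tends to write: fire-and-forget, unsupervised.
  def process_unsupervised(job) do
    spawn(fn -> do_work(job) end)
  end

  # What the spec should force: the same work, started under a supervisor.
  def process_supervised(job) do
    Task.Supervisor.start_child(MyApp.TaskSupervisor, fn -> do_work(job) end)
  end

  # Runaway atoms: never mint atoms from untrusted input.
  def frequency(input) when is_binary(input) do
    # raises on unknown values instead of silently growing the atom table
    String.to_existing_atom(input)
  end

  defp do_work(_job), do: :ok
end
```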

As for prompting, the best success I’ve had is to have Claude break down a complicated set of technical documents into self-contained prompts that each assume no previous context and include all required reading. Even then, I have to review the prompt file to ensure that all of the refs are self-contained. Then it’s just a matter of ensuring all Dialyzer issues, compiler warnings, test failures, and Credo issues are resolved before moving on to the next prompt. But, still, Claude will take shortcuts from time to time, so a skilled human reviewer ensuring Claude stays on track is still worth a lot.

Once the prompt is finished, I either /clear or just restart Claude Code from scratch for the next prompt.

Short prompts are better (in the sense of fewer TODOs per prompt, all else equal). It might seem to take longer because now you’re recontextualizing on relatively smaller changes, but the shorter the context, the better. Less drift.

Another thought I had is that long sets of rules to follow don’t work consistently. Simply saying, “here are my OTP rules, build robust OTP” has mixed results. “Don’t use sleep to solve concurrency issues” works, until it drifts (which happens faster than you’d expect, because you’re fighting against the grain of Claude’s preferences). On my next go, the design will be worked out in advance with tight audits/checks for every prompt to ensure compliance. Just because all of your tests pass doesn’t mean the code is good. Just reiterating this main point: lengthy generalized guidelines will fall apart.
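As a concrete example of the sleep rule (everything here is illustrative; the point is blocking on a message instead of guessing a delay):

```elixir
defmodule BroadcastSketchTest do
  use ExUnit.Case, async: true

  # Illustrative helper: does the work in another process and messages the caller.
  defp save_and_broadcast(%{id: id}) do
    parent = self()
    Task.start(fn -> send(parent, {:saved, id}) end)
  end

  test "brittle: hope the task finished within 50 ms" do
    save_and_broadcast(%{id: 1})
    Process.sleep(50)
    assert_received {:saved, 1}
  end

  test "better: block on the message itself, with an explicit timeout" do
    save_and_broadcast(%{id: 1})
    assert_receive {:saved, 1}, 500
  end
end
```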

5 Likes

I love the idea of using an Obsidian vault for the docs!

4 Likes

I am using Amp Code with a curated AGENT.md file - the results are pretty dramatic - I’m regularly posting examples on X: https://x.com/mikehostetler

Amp allows sharing threads - so you can see my prompt and the entire flow here:

https://x.com/mikehostetler/status/1940794759398006957

Here’s another one with a link to the thread and the PR it produced

https://x.com/mikehostetler/status/1942223175787430245

4 Likes

Is this very different from using Claude Code with explicit subagents? (Noting that Claude Code will sometimes use its own implicit subagents.)

And does Amp Code piggyback on Claude’s Max plans for Claude Sonnet, or do you need to use a Claude API key, which can get crazy expensive?

I believe that as of about two weeks ago you can now connect a Claude Max subscription to Cursor and Roo, so maybe to Amp Code too.

The AmpCode blog posts seem to be all in on Claude Sonnet, like this one on subagents, but are woolly where it matters.

I’m using bare Claude Code whilst working in git worktrees, but I’m open to other approaches that are competitive with Anthropic on price.

2 Likes

I’m doing a head-to-head comparison with Claude Code - the PR is up here:

I personally prefer Amp - I don’t even think about model selection anymore - it just works

1 Like

I’m just a lightweight Cursor user building webapps. I rarely ask the AI to code Elixir for me. I often find myself unhappy with the result, or it simply takes more time to steer the AI in the right direction than it takes to write the code myself. Adding the Tidewave MCP server helped improve the workflow with the AI agent quite a bit, but I still find it lacking.

The hallucinations and stubbornness of the AI are a real problem for me. I have had repeated “arguments” trying to convince it that a function it hallucinated doesn’t exist…just to have it go “Oh you’re right, I’m so sorry…here’s a version that is guaranteed to work:” and then produce similar output with the hallucinated functions still there.

However, I do get real value out of Cursor when it comes to doing the stuff I don’t know that well - especially CSS. An oversimplified example: I select a block of HTML, hit ctrl+k → “center this horizontally”, and Cursor spits out the correct Tailwind CSS to center the content (which varies based on context).
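For what it’s worth, the “varies based on context” part is the real win. A sketch in HEEx (component name and content are illustrative):

```elixir
defmodule MyAppWeb.CenterSketch do
  use Phoenix.Component

  def examples(assigns) do
    ~H"""
    <%!-- a block element: give it a width and let auto margins center it --%>
    <div class="max-w-md mx-auto">card content</div>

    <%!-- children of a flex container: center with justify-content instead --%>
    <div class="flex justify-center">
      <span>badge</span>
    </div>
    """
  end
end
```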

6 Likes

Check out Kiro. It’s free right now with Sonnet 4, and works pretty well with Elixir. Has a very decent planner that creates a design, requirements, and managed tasks.

2 Likes

I’ve been using Claude Code for the past 2ish months. It was really good at the beginning, solving pretty complex problems very well, and the code quality was pretty good.

Over the last few weeks it has been significantly worse: it struggles with anything more than the basics, makes up syntax and functions, straight-up lies that it fixed things, etc.

I’m not sure what changed, but hoping they get it back to where it was.

It’s on my to-do list to give Amp a real try, as well as OpenCode, and evaluate those tools.

2 Likes

To me that’s Gemini 2.5 Pro. Its quality noticeably dropped in the last month. :confused:

2 Likes

The model seems to perform identically to me. Are you sure it’s not just your perspective that moved?

1 Like

Quite certain, but hard to know for sure. I have had cases where it completely made up syntax (elsif) multiple times over the course of two days. It never did that before, and I haven’t seen it since. Nothing else changed - same code base, same CLAUDE.md files, etc. That’s just one example amongst many.

Last night Gemini gave me Java annotations as Elixir module attributes. Never happened before.

And the quite tragic thing is that I’m giving it easier tasks compared to two months ago… and it hallucinates like there’s no tomorrow.

It’s really difficult to tell whether it’s a mix of factors, of course. But the scope of tasks I’m giving it is actually smaller and it still fails.

I’m guessing it’s shenanigans over the free tier. They want the positive PR of their big context window – 1M tokens – but likely want to spend less.

It’s gotten so bad that I started considering subscribing to Claude Pro/Max.

All the major vendors do batching of multiple requests on their backends; maybe they have upped the batching to cope with load and the context is leaking (more) between users?

For those who don’t know: your request goes into the model alongside other requests. I’m actually not quite sure how they do “sandboxing”, or even separate the data enough to get distinct answers, but apparently this is why two requests with temperature set to 0 will still often give different results instead of deterministic ones.

4 Likes

I write almost all my code with Cursor using Sonnet 4 or o3. It’s great.

2 Likes

The more I use AI for a Phoenix LiveView app, the less value I find in it. I need to rewrite nearly everything. Usually the issue is around rescuing exceptions and then not notifying anyone about them, or using the non-raising version of a function and then not doing anything reasonable when it returns :error.
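A sketch of the two patterns I keep rewriting (Date.from_iso8601 stands in for whatever function is actually involved; the module and function names are illustrative):

```elixir
defmodule DateSketch do
  # 1. Calling the non-raising variant and papering over :error, so the failure
  #    surfaces much later as a confusing nil.
  def parse_date_quietly(input) do
    case Date.from_iso8601(input) do
      {:ok, date} -> date
      {:error, _reason} -> nil
    end
  end

  # 2. Rescuing an exception and then telling nobody about it.
  def parse_date_swallowed(input) do
    Date.from_iso8601!(input)
  rescue
    _e -> nil
  end

  # What I usually want: let it fail loudly at the boundary.
  def parse_date!(input), do: Date.from_iso8601!(input)
end
```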

It makes me think that the models were trained on miles and miles of mediocre code. After all, isn’t most code mediocre? I wonder how they appraised the code quality of their training corpora.

1 Like

They very likely did not. “Pour a few metric tons in and the AI will figure it out” is probably the most thinking they’ve ever done.

4 Likes

I’ve ditched LiveView for our app as we aren’t doing consumer-facing web. We’re using Inertia.js with React and Vite, and shadcn for layout. It’s just our admin, but it’s entirely vibe-coded and actually works pretty well. Recommended if you’re doing internal stuff; I wouldn’t be this cavalier for anything user-facing. It’s extremely efficient to develop with.

2 Likes