Good: AI is great at Elixir. It gets better as your codebase grows.
Bad: It defaults to defensive, imperative code. You need to be strict about what good Elixir looks like.
Ugly: It can’t debug concurrent test failures. It doesn’t understand that each test runs in an isolated transaction (see the sketch below), or that processes have independent lifecycles. It spirals until you step in.
Bottom Line: Even with the drawbacks, the productivity gains are off the charts. I expect it will only get better.
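To make the transaction-isolation point concrete: this is roughly the sandbox setup a generated Phoenix DataCase uses, where each test checks out its own DB connection wrapped in a transaction that is rolled back afterwards (module names are placeholders):

```elixir
# test/support/data_case.ex (names are illustrative)
defmodule MyApp.DataCase do
  use ExUnit.CaseTemplate

  setup tags do
    # Each test owns a sandboxed connection wrapped in a transaction;
    # non-async tests share the owner's connection instead.
    pid = Ecto.Adapters.SQL.Sandbox.start_owner!(MyApp.Repo, shared: not tags[:async])
    on_exit(fn -> Ecto.Adapters.SQL.Sandbox.stop_owner(pid) end)
    :ok
  end
end
```

Nothing a test writes is visible to any other test, which is exactly the invariant the models keep tripping over.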
150k LoC doesn’t mean anything if you don’t have a declaration of what it does.
Our Elixir project is at 31k lines (via `cloc --exclude-dir=node_modules,_build,deps .`), and I will remove a few thousand since this is in the middle of a refactor and I haven’t been cleaning things up thoroughly.
It’s a fairly simple app: a full admin, an API for mobile clients, orchestration logic via Oban to process data, spin up servers, etc., and some third-party integrations.
I use Tidewave quite a bit, and I’ve noticed the Anthropic models (I haven’t tried others) are happy to duplicate logic like crazy, which is why I’m wary of LoC as an indication of anything, really (and personally I want to keep it low, not high).
Does it matter how many lines of code there are if nobody reads them? Does it matter if logic is duplicated if nobody ever has to change it?
Everyone trying to apply good software practice to vibecoding (“vibe engineering”, “subagents”, “orchestration”) is going to get wrecked by the bitter lesson as usual. Differentiability is king.
Curious what upmarket means to you. I could see it as higher quality, better designed software from a UX standpoint as Maggie alludes — the underlying code may or may not be “better” — or software that can’t be done by LLMs which I suppose might only include building the models themselves.
We did find, while developing, that it liked to duplicate code.
I touched on this a little bit in the “AI Can’t Organize” section. It’ll happily rewrite code it thinks it needs if it doesn’t know where that code already exists, or where to look for it. So it’s important to keep the codebase architecture consistent, but also to have some form of documentation the AI can read so it can find logic that already exists (a minimal sketch of such a file follows below).
It’s a little bit like onboarding a junior dev with amnesia every day.
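For what it’s worth, that documentation doesn’t have to be elaborate. A sketch of the kind of index that helps (the file name and layout here are hypothetical, not something from the original post): a short, hand-maintained map the agent is told to read before writing anything.

```
# docs/codebase_map.md (hypothetical)
- Accounts:  lib/my_app/accounts/   (users, registration, sessions)
- Billing:   lib/my_app/billing/    (payment provider integration, invoices)
- Workers:   lib/my_app/workers/    (Oban jobs, one module per job)
- Shared validation lives in lib/my_app/validators/; check there before
  writing a new changeset validator.
```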
It means expensive enough to justify the up-front cost of human development when models are orders of magnitude cheaper. The quality is downstream of that because nobody will pay for junk when they can get junk for free.
If you want to keep programming, go where the models cannot. Models won’t be able to replicate a level of quality outside their training set for a while, and while it’s easy to RL tests that pass, it is very hard to RL UX. So yes, upmarket means quality, but it’s not a meaningless heuristic; you have to understand why the models can’t get there.
I have followed “AI” closely for a long time and I always believed this was coming, so I am perhaps not as shocked by the progress as some. To be honest, what did surprise me was just how hard it is to hit the level of quality I believe is needed to succeed in the new market. This was partly a skill issue (I have gotten better!), but I was also spending a lot of time swimming upstream with tooling.
You are aware of my solution there.
This is wrong, for a subtle reason. Bad code leads to bugs, and bugs destroy UX because they harm user confidence in the software.
All good if the software is cheap, but nobody will pay for junk.
This is a bit surprising to read. Opus 4.5 helped me do a comprehensive refactor of my app’s test suite, and concurrency was a big part of that. It did deep dives into each domain and identified which tests are safe to run concurrently, and we were able to make the suite run 40% faster overall. It also fixed several intermittent test failures caused by concurrency, as well as the dreaded DBConnection errors that tend to clutter the test outputs.
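For anyone doing a similar pass, the mechanical half of the work is deciding the `async` flag per test module once you know which ones only touch shared state through the sandboxed Repo (module names below are invented):

```elixir
# Safe to run concurrently: all state goes through the sandboxed Repo.
defmodule MyApp.AccountsTest do
  use MyApp.DataCase, async: true
  # ...
end

# Must stay serial: mutates global state shared across tests,
# e.g. Application.put_env/3, named GenServers, the filesystem.
defmodule MyApp.SettingsTest do
  use MyApp.DataCase, async: false
  # ...
end
```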
Yes. I use the supertester package from Hex for building robust concurrent tests. Your prompt might say something like: “Add the latest version of supertester from Hex to mix.exs and get deps. Use supertester to refactor the entire test suite per these docs: <ctx>{docs text}</ctx>”, with the content of these three documents as the {docs text}:
The library arose from the need to address the same issues experienced by @John-BoothIQ. Hope this helps someone. I use it on many of my projects, and it works well.
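If you’d rather make the dependency change yourself before prompting, it’s the usual one-line addition (the version requirement below is a placeholder; check Hex for the current release):

```elixir
# mix.exs
defp deps do
  [
    # version is a placeholder; use the latest from hex.pm
    {:supertester, "~> 0.1", only: :test}
  ]
end
```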
Does it matter how many lines of code there are if nobody reads them? Does it matter if logic is duplicated if nobody ever has to change it?
Duplicated logic, at least when it amounts to duplicated knowledge, is a chance for inconsistencies, which means bugs. Simple example: the software lets you create a username with format X, but then it won’t let you log in unless it’s format Y.
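In code, that failure mode looks something like this (a contrived sketch; the module names and formats are invented):

```elixir
# Registration validates one format...
defmodule MyApp.Accounts do
  @username_format ~r/^[a-z0-9_]{3,20}$/
  def valid_username?(name), do: Regex.match?(@username_format, name)
end

# ...but a duplicated copy of the "same" rule drifted in login.
defmodule MyApp.Auth do
  @username_format ~r/^[a-z0-9]{3,16}$/ # no underscores, shorter max
  def valid_login?(name), do: Regex.match?(@username_format, name)
end

# "jane_doe_2024" can register but can never log in.
```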
Also, in my experience, LLMs perform way better on small codebases (or targeted sections of them) than when they must read a large amount of code. And they do way better at “add another component/test that follows the pattern of existing ones” than at starting from scratch.
So if a codebase becomes a giant pile of duplicated mess, then yes, I suspect it matters a lot for how well LLMs can continue to work on it, just like it does for humans.
Afaik this is due to the context window, which may be too small to hold the whole codebase ‘in memory’. Hence the duplication: some functions are simply never seen.
AI workflows are getting better at this: a preprocessor can pick the relevant parts of the code for the context and work with just that subset.
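A toy version of that preprocessor, just to illustrate the idea (nothing here is from an actual tool; real systems use embeddings or the language server rather than substring matching):

```elixir
defmodule ContextPicker do
  @moduledoc "Toy sketch: select only source files relevant to a task."

  # Returns the .ex files under `root` that mention any of the given
  # identifiers, so the model sees a subset instead of the whole codebase.
  def relevant_files(root, identifiers) do
    root
    |> Path.join("**/*.ex")
    |> Path.wildcard()
    |> Enum.filter(fn file ->
      source = File.read!(file)
      Enum.any?(identifiers, &String.contains?(source, &1))
    end)
  end
end

# ContextPicker.relevant_files("lib", ["valid_username?", "Accounts"])
```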
If you care about security, correctness, quality, or craft I don’t think you should be letting these models anywhere near your codebase. I certainly won’t.
But realistically, most of the industry stopped caring about those things long before LLMs, and these models have probably already surpassed the security skillset of the average webdev anyway.
Implicit in this point, though, is that I expect those things to change. Security and correctness will come easily, probably faster than you think. Honestly probably this year. Models are already good at finding security bugs, so a little adversarial RL will go a long way.
Quality is a lot harder, in part because it’s a matter of where you put the goalposts. My bet is the models will be outputting mediocrity for a while. Because their training set is mediocre, and because it will be hard to RL UX as it’s essentially AI-complete. But we’ve been putting up with mediocre software for decades now, so honestly I don’t expect things to get dramatically worse. If anything it will free up the real programmers to actually spend their time making good things again. Glass half full.
And that gets into the last one. I’m not sure if the models can ever replace “craft”, because to me that’s in the eye of the artisan. And there are still artisans all over the world making things exactly as they have for hundreds of years, industrial revolution be damned. But do not let that analogy distract you from the fact that, unlike physical goods, computer programs can be infinitely copied. The playing field is much more level.