AI is getting ridiculously productive

This morning I set out to build an Elixir library for internal use, implementing a specific statistical method that only seemed to exist in some Python code. The method is processing-intensive, and I'd rather not use Python, so I aimed for an Elixir library instead, with Rust for the processing-heavy parts.

So the prompt went something like: I want a best-practice Elixir library for XXX, with Rust doing the heavy processing and a Rustler NIF in between. Make a plan to research this method and its algorithms, write tests including end-to-end mocking with known expected results, and use relevant skills for both planning and implementation. After the tests pass, run code quality checks set to strict and fix any issues. Then write an API user guide.
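The "known expected results" part of that prompt can be sketched as a golden-value test: precompute a few outputs with the reference Python code, hardcode them, and assert the new implementation matches within a float tolerance. A minimal sketch; the function and the numbers are invented for illustration, not taken from the actual library:

```rust
// Hypothetical statistical function, standing in for the real Rust core.
fn sample_mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn main() {
    // Expected value precomputed with the reference Python implementation
    // (e.g. numpy.mean on the same input). Illustrative numbers only.
    let input = [1.5, 2.5, 3.5, 4.5];
    let expected = 3.0;
    let got = sample_mean(&input);
    assert!((got - expected).abs() < 1e-12, "mismatch: {got} vs {expected}");
    println!("ok: {got}");
}
```

The tolerance matters: the Rust and Python versions will rarely be bit-identical on floating-point work, so "matches" should mean "agrees within an epsilon chosen for the method".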

A bit over an hour later I had it complete, with passing tests, clean code quality checks, and results that matched the Python version across all relevant tests - just faster, and now behind an Elixir API.

Then I figured it might be even faster using the GPU for some parts. So I asked Claude to use OpenCL to move the best-suited processing-heavy parts over there. Less than an hour later those parts were running correctly on the GPU instead. (And a lot of that time went into getting an OpenCL setup working for Intel's built-in laptop graphics.)

I’ve seen talk about reports saying we merely feel more productive using AI coding while actually being less so. That is not my experience, and I don’t think I’ve fooled myself either. For instance, I just can’t see any part of this where I would have done better without AI.

This was only about 4,000 lines of code across Elixir, Rust and OpenCL. Larger codebases that don’t fit properly in the context have more issues. But still - I’m pretty sure I’m more productive, and by a fairly big leap.

Edit: Fixed some typos. And after benchmarking, the Elixir/Rust version is between 10x and 30x faster, and the GPU version ever so slightly faster than that again. (A discrete GPU would make a real difference.)

9 Likes

Yeah, that very much matches what I’ve been doing lately. Personal projects that I wanted to do for years but was always too busy or tired for now take days, and those are the complex ones. It took Claude a short one-hour back-and-forth session with me to port my AstroNvim editor config straight to LazyVim; almost a one-shot. We cleared up a few visual preferences, added a plugin or three, I then tested the setup in a brand-new container, corrected just one small mistake - done. One hour.

So many things that were viewed as arcane knowledge or hard blockers can be done now, and in a way that does not rely on the AI agents afterwards. Another thing I want to do is have a Manjaro (Arch-based) dev machine - which I already do - and a script that syncs all its packages to Debian VM(s). I’m still working on it, and it’s not done only because work is way too busy and I have too many other things going on; when I have two free hours in the entire day, I’ll prefer spending them with family, of course.

But yeah, we can and should move forward without too much fear. Obviously vibe-coding can and does make stupid and even harmful mistakes. “Git gud at prompting” is my response, though that’s obviously not a panacea either.

Next step is for all of us to overcome our overprotectiveness of our own software and start properly converging, because right now efforts are scattered across many areas. Historically speaking, 500k programmers each getting 92% of the way through the same tasks is not good enough. We should start nailing stuff soon-ish.

But I don’t want to roleplay a guardian of the planet. Somebody else can initiate it. I am saying what I think should be the next leap. Convergence.

3 Likes

Imo this is the important portion here. Where you have such guardrails and clear scope, AI usage is really useful and productive. The less clear cases are the ones where you don’t.

4 Likes

I find myself just testing more directions now that they cost less. I can work with many parallel versions or even projects at once. Sometimes I find unexpected improvements that I would have passed right by otherwise. That results in real quality improvement.

Convergence within the computing world tends to come and go, as I see it. For both better and worse. I still remember with anguish the time everything was converging on Java and C++.

I think we are still in the very baby stages of AI, so I expect this to stay in the rattling development tumbler for some time yet before it starts settling down. The daily experience of using various sides of AI is certainly getting better year by year. (Well, maybe with the exception of others’ invasive AI in places you don’t want it.)

Official technical convergence can be a real innovation killer. There are ISO standards out there that have literally frozen technology at what was the best available in the early 1900s.

1 Like

That is true. This was a very testable project.

A bit philosophically, I’m not sure less clear always translates into worse, though. It certainly does translate into more token use and time, so there is that. Maybe I’ll call that an exploration phase.

I am making things work that I could not tackle myself. I can’t port algorithms to Nx, but I can verify the outcomes. I would not put a week into figuring out how to build a vendor toolchain for some obscure GPU, but it takes Claude an hour or two.

I have reworked a mmWave firmware project where I was just a bottleneck. Eventually I tooled up the build-upload-reconnect-evaluate loop so Claude could drive it over MCP; it probably shipped 12 iterations before it nailed the issues, but now I have the data flowing. Next we’ll start implementing more of the processing in Elixir.

Some of this needs a lot of verification and QA on the results, and it is proof-of-concept. But it also would not have been worth doing otherwise, since it’d have taken me months to learn enough background. Now I have spent a few days pushing three relatively ambitious efforts forward along the same lines.

It is wild. As long as they figure out the price/performance of these models, I don’t see how our industry doesn’t change massively and weirdly. I am not necessarily enthusiastic about that, but for the type of contract work I do, disregarding this would be unwise.

5 Likes

That is exactly what I was thinking when starting this very project. I’m clueless and lost at cutting-edge statistics, but I know for sure what it should output, and there are few sharp edges.

The verification stories seem to often lag behind the AI though.

I’ve been thinking about this article a lot lately. I think it’s correct.

4 Likes

I’m not surprised Elixir does well, and there are good reasons for that. But I am very surprised by the Rust results. Rust has been working very well for me, with minimal need to churn over things. The first run is usually working or very close and easily fixed. Maybe their tests were much larger or just unluckily different.

1 Like

Strongly agree with these and similar sentiments in thread. This is my experience lately with Elixir, Rust, TypeScript, Python, D3.js… many examples of outcomes that would have been impractical for me to do on my own. Even counting the additional time I spend in planning and cleanup, I can do things in hours/days that would have taken weeks (or been skipped entirely).

3 Likes

Same here. I’ve been porting interferometry software from C++ to Elixir + Nx because, as unable as I am to write those Nx parts myself, I have the optics knowledge, the Elixir practice, and the ability to use the original software to produce a golden-master testing harness before tackling the extraction, which makes guiding and verifying a robot’s work easy.

But as @dimitarvp raised, now that I have my alternative build (well, it’s concurrent and parallel, whereas the original used a singleton Qt mode, i.e. computer-wide single-instance mode), I am trying to see how I could contribute back to the original project with architecture or even UX ideas that my port demonstrates. Otherwise I am creating partial advancement and dispersion away from the original, despite having actual people wanting to use my build (which covers the 10% of the original software that I use).

2 Likes

I have generally been a generalist in my career. I am not nearly as wide a generalist as Claude.

I do get good mileage out of my creative brain parts and hubris.

1 Like

I’ve observed a massive shift in the capabilities of AI, Claude at least, in the past few months. Last fall it felt impressive but still like a toy, and I shuddered to think of Claude-authored production code being shipped. Over the last month we have ramped up Claude integration with our development process, and my productivity is up 10x at least. I will say code quality is dropping. We are shipping more code that is not as well-tuned, despite my attempts to add as many guardrails as possible, because my arguments only carry so much weight when the cost of rewriting is so low. The CEO, who hasn’t touched the code in years, is pushing up huge PRs with major features we never had the resources to tackle, and is only so receptive to my warnings. My biggest fear at this point is getting dragged into a “black-box” model of application development where there is simply too much code churn for me to have any ownership over anything but the prompts. I can’t imagine the pleasure and satisfaction of the craft remaining the same, if it remains at all.

edit: I realize this just makes me sound like every old dude who came before, and AI is in some respects no different from any other technology, but I’m not so sure. Time will tell.

4 Likes

The black-box model is scary, but I feel less so about it for functional programming with pure functions. If the inputs always give the right outputs, do I really need to know the details in between? (For speed and resource use, yes, but being able to know it is a correct black box makes it less scary.) If I can test those, I sleep well, I think. And then there is the real world and side effects, but at least there’s less surface area to really worry about.
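A sketch of that "correct black box" idea: a pure function can be checked against a trusted, obviously-correct reference over many inputs, after which the internals matter less. Both implementations and the input stream below are illustrative, not from any project in this thread:

```rust
// Naive two-pass variance: the "obviously correct" reference implementation.
fn variance_two_pass(xs: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n
}

// Welford's single-pass algorithm: the "black box" under test.
fn variance_welford(xs: &[f64]) -> f64 {
    let (mut mean, mut m2) = (0.0, 0.0);
    for (i, &x) in xs.iter().enumerate() {
        let delta = x - mean;
        mean += delta / (i as f64 + 1.0);
        m2 += delta * (x - mean);
    }
    m2 / xs.len() as f64
}

fn main() {
    // A crude pseudo-random input stream (LCG), so no crates are needed.
    let mut seed: u64 = 42;
    let xs: Vec<f64> = (0..1000)
        .map(|_| {
            seed = seed
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            (seed >> 11) as f64 / (1u64 << 53) as f64
        })
        .collect();
    let (a, b) = (variance_two_pass(&xs), variance_welford(&xs));
    assert!((a - b).abs() < 1e-9, "black box disagrees: {a} vs {b}");
    println!("agree: {a:.6} ~= {b:.6}");
}
```

This is exactly the shape of test an LLM-authored fast path can be held to: the reference stays small and reviewable, and the clever version only has to agree with it.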

I also think that maybe Claude and co. at some point will be able not just to make it work, but also to explain, or better yet prove, that it works. We are not there yet, but the future is a long time.

(I faintly remember there used to be something called “formal verification”, which seems largely forgotten today. Maybe it will make a comeback. If a component can be proven to work, I will happily take it as a black box.)
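For what it’s worth, formal verification is still alive in proof assistants such as Lean, where a property of a pure function can be machine-checked once and then trusted as a black box. A trivial illustration (this particular lemma ships with Lean’s standard library):

```lean
-- A machine-checked guarantee: reversing a list twice is the identity.
-- A component proven equal to a trusted specification is a safe black box.
theorem double_reverse (l : List Nat) : l.reverse.reverse = l :=
  List.reverse_reverse l
```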

I think there is also a case to be made for diversification according to risk. For critical systems or life support I would not use LLMs today. But for most other things I’m beginning to open up to the thought. Indeed, my main project these days is mostly Claude-coded, with me doing the requirements, direction guidance and big-picture decisions. And chatting in big letters as needed. I might even have taught Claude some new words on occasion.

For me the ability to make, prototype and test with such ease is very refreshing and satisfying.

(I’ve sure had my share of Claude frustrations so there is that too. But I’m thinking give it another year and we will really be cooking).

1 Like

4 posts were split to a new topic: When will running LLMs locally be realistic/achievable

This is the nuanced take I was waiting for in this thread. Yes, an LLM will spit out 1000 lines of code without a hiccup, but the actual developers who need to work on it will spend days refactoring. The effects look really similar to outsourcing.

1 Like

Not my experience. If anything, Claude is better than I am at planning layers, contexts and boundaries for easier extension and refactoring. (Which is not very hard to achieve, but nevertheless.) The real trick is that Claude is pretty good at refactoring when needed. That is my story and I’m sticking to it!

Just another datapoint, but for me LLMs are better than the outsourced workers were (and also better than a big chunk of my company’s workforce).

I guess the big thing is that someone needs to keep the high-level architecture in their head at all times, because without it commanding AI is just less effective. It also helps to have high code standards. Many of my coworkers didn’t care about code quality one bit, and I guess that will be reflected in their AI work too, as they won’t guide the models as much (though even without guiding, the quality of AI output is higher than their human output, so I still count that as a win).

I wouldn’t say that using LLMs is similar to outsourcing. Even just the sheer speed makes them incomparable (as in: even if the effect looks similar at first glance, the speed of delivery gives you new possibilities, like deeply exploring the design space, making five variants of something and picking the best one).

1 Like

After Opus 4.5 came out, I migrated my 180k LoC project (generating real revenue from real customers) from a one-Postgres-schema-per-tenant topology to a shared-schema architecture where everything lives under public and is scoped by tenant_id. As one can imagine, this was a massive undertaking and required:

  • a complex series of data migrations for 150+ tenant tables, grouped into levels, with each level ordered based on interdependencies
  • post-migration data integrity checks and testing (making sure counts match, FKs match, etc.)
  • post-migration performance benchmarking scripts
  • rewriting all the context modules and their queries to take and use tenant_id instead of a string tenant prefix
  • modifying the controller contracts
  • essentially rewriting the test suite, including factories and mocks
  • a huge amount of manual testing
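The "grouped into levels, with each level ordered based on interdependencies" step is essentially a layered topological sort over the FK graph: a table can only migrate once everything it references has migrated. A minimal sketch of how such levels could be computed; the table names and dependencies are invented, not the actual schema:

```rust
use std::collections::HashMap;

// Group tables into migration levels: a table may only migrate after every
// table it references via FK has migrated (a layered topological sort).
fn migration_levels(deps: &HashMap<&str, Vec<&str>>) -> Vec<Vec<String>> {
    let mut remaining = deps.clone();
    let mut levels = Vec::new();
    while !remaining.is_empty() {
        // Tables whose dependencies have all been migrated already.
        let mut ready: Vec<String> = remaining
            .iter()
            .filter(|(_, ds)| ds.iter().all(|d| !remaining.contains_key(d)))
            .map(|(t, _)| t.to_string())
            .collect();
        assert!(!ready.is_empty(), "cyclic FK dependencies");
        ready.sort(); // deterministic ordering within a level
        for t in &ready {
            remaining.remove(t.as_str());
        }
        levels.push(ready);
    }
    levels
}

fn main() {
    // Invented example: invoices reference customers, line_items reference invoices.
    let mut deps = HashMap::new();
    deps.insert("customers", vec![]);
    deps.insert("invoices", vec!["customers"]);
    deps.insert("line_items", vec!["invoices"]);
    for (i, level) in migration_levels(&deps).iter().enumerate() {
        println!("level {i}: {level:?}");
    }
}
```

With 150+ real tables the graph would come from pg_constraint rather than being hand-written, but the leveling logic is the same.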

The migration also required changing integer PKs/FKs to UUIDs. This had to be done because we were previously using IDs as user-facing display numbers (e.g. Invoice #3829), and in the new system these had to be preserved. So the work also involved adding new tenant-scoped display_number columns containing the old schema-scoped integer IDs, making sure they auto-increment correctly going forward, and rewiring everything accordingly.

The final diff was +50k -18k. You read that correctly: one branch, fifty thousand lines added, eighteen thousand removed. About 15k of the new lines were documentation, plans and Ecto migration files. The rest were new code and new tests.

Without AI, this would have taken me multiple months and there’s a good chance the sheer scale and tediousness of it (as well as the risks) would have burned me out. I also would not have been able to add new features easily. Especially because I have a full-time job.

With Opus 4.5, it took two and a half weeks (mostly evenings and weekends). I read every line of documentation/plan it wrote, every line of code change that it made and tested everything thoroughly, both manually and also using ancillary AI chat sessions where I intentionally kept context limited. Then I deployed the branch to staging and had several users do UAT using their own (migrated) data. They found:

  • Two bugs related to the data migration itself (I had some {:array, :integer} columns where the integers were FKs and the AI didn’t catch that, and neither did I because I had completely forgotten about them)
  • Some cosmetic issues where the UI was still showing ids (now UUIDs) instead of display_numbers
  • Four HTTP 500 errors in some rarely used features, caused by modified API contracts where the FE was still calling the endpoint with the original payload shape. Easy fix.

So yeah. An eight-year-old, medium-sized Elixir project went through a titanic architecture change in a short timespan. AI planned out the entire thing, wrote all the code for it, wrote all the post-migration data integrity checks and performance benchmarking scripts, rewrote most of the test suite, and modified the FE to match the updated API contracts. It also wrote the operational step-by-step for the rollout. We deployed it in a “big bang” fashion and it has been fully stable.

Here’s the fun bit: I actually got a few quotes for this project, and they ranged from $100k to $175k, with estimated timelines of four to six months. In contrast, my total AI spend during the 2.5 weeks was less than $2k. Make of that what you will.

3 Likes

And there I was, amazed at the ease with which Claude implemented some 4,000-line statistics library.

That is on another level entirely!