Autonomous AI Dev Workflows

My experience with all three of your questions is yes, absolutely, without question. For me, Claude builds very high quality code (verified by static code analysis) that is well tested. Many others have had similar success.

For the PRD: are you trying to create a document for humans to read, or a good document for Claude? They are not necessarily the same thing. Human-targeted documents tend to leave out a lot of implementation details and therefore leave the agent to make a lot of decisions. I wrote about this here → What is a task? | Cheezy's Blog

Another challenge you might have is speed. Claude will implement a change much faster than a human can, so to keep Claude busy you will need to create a lot more requirements, and fast.

Finally, the prd.md file is simply a document that is added to context (if I understand how you are using it). Think of it as a suggestion; there is no enforcement from that perspective. You will be prompting Claude repeatedly to have it follow the rules you have created precisely. Making this a Skill will remove it from the always-on context and cause Claude to follow it more closely. That might be a good step for you.
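If it helps, a Skill is just a directory containing a SKILL.md file whose frontmatter tells Claude when to load it. This is a minimal sketch; the skill name, description, and rules below are hypothetical placeholders, so adapt them to whatever your prd.md actually says:

```markdown
---
name: prd-rules
description: Rules for writing and implementing PRDs. Use when creating, reviewing, or implementing a product requirements document.
---

# PRD Rules

- Every requirement includes explicit acceptance criteria.
- Implementation details (module names, schemas, error handling) are spelled out rather than left to the agent.
- Anything genuinely undecided goes in an "Open Questions" section instead of being guessed at.
```

The description field matters most: it is what Claude uses to decide when the skill applies.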

My process is to use the superpowers brainstorming skill ( superpowers/skills/brainstorming/SKILL.md at main · obra/superpowers · GitHub ) to create a document that describes the next feature the team plans to implement. I then ask Claude to break this document down into goals and tasks using Skills that are part of Stride. When this is finished I end up with a list of very detailed tasks that the agents can consume.

Hope this helps.

3 Likes

One more thing. Trying to introduce AI into a standard non-AI SDLC will create a lot of problems. I wrote about it here (Tensions with AI | Cheezy's Blog) and here ( Adopting AI Driven Development | Cheezy's Blog ). IME, if you want to see success you should be ready to throw out or significantly change your software development methodology. Optimize around what AI does best and what humans do best.

I would say this was a much bigger problem until about Nov 2025, but Opus 4.5 improved things in a major way. The code it writes is way less verbose than its predecessor models, at least on the Claude front. I can’t comment on the other models as I don’t use them a lot except for simple bug fixes.

That’s not to say that we’re “there” in terms of autonomous coding, but that the recent progress has been quite promising.

2 Likes

Yep, can confirm. I just started using Opus a few days ago and I am blown away. It still has the occasional stupid lapse here and there, but if you keep it in line it does absolutely amazing work.

The way I see it, being a good operator is now a full-blown marketable skill.

2 Likes

Hey @Cheezy , thanks for sharing.

Could you elaborate a bit on the testing phase?

For example, are you using TDD? Unit testing, integration, e2e, property-based testing, etc.? Is Claude writing the tests for you? Before or after writing the code that should pass tests? And how do you handle edge cases?

Apologies for the litany of questions — I’m just getting started with AI + TDD and it’s at the forefront of my mind. I’ve seen a lot of posts about writing tests with AI that make sense on the surface, but they rarely go deeper than, “it’s a best practice that you can speed up with AI so you should definitely do it.”

I would be grateful for some concrete, evidence-based info, even if it’s based on your anecdotal experience.

Hello @SyntaxSorcerer ,

Great question that might warrant its own thread. I'll answer here and let the moderators move it if they see fit.

I am a huge fan of TDD. In fact I have been teaching TDD to developers since the early 2000s. I know there are people who are trying to get a TDD workflow in place with AI, but that has always felt like forcing AI to do something that makes us feel good instead of something that would actually drive quality. I blogged about that last September → TDD With AI | Cheezy's Blog

I find that Claude / Tidewave writes well-factored code with excellent tests. With Tidewave, the agent also tests the changes it just made through the browser without being prompted. We have AGENT.md / CLAUDE.md files, Claude Skills / Hooks, and Subagents to enforce our quality standards, as well as credo, sobelow, etc. to ensure code quality and safety. These things work amazingly well.
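As one concrete way to enforce this, a Claude Code hook can run the static analysis automatically after every edit. This is a minimal sketch for a project-level .claude/settings.json, assuming an Elixir project that already has credo and sobelow as mix dependencies; treat the matcher and command as a starting point, not a prescription:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "mix credo --strict && mix sobelow --exit" }
        ]
      }
    ]
  }
}
```

Because the hook exits non-zero when credo or sobelow finds an issue, Claude sees the failure and fixes it immediately instead of you catching it at review time.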

My personal workflow: once I create my backlog (in Stride) I quickly review the testing_strategy and verification_steps for the more complex tasks. Once I am happy with those I ask Tidewave to implement. Given my configuration, I feel confident that it is writing good tests and testing the changes. At the end of a feature (usually 10-40 minutes) I do a quick manual test. If I find an issue I ask Tidewave to fix it right away. When I'm happy I push to production.
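To make that concrete, a generated task with those fields might look something like this. The field names match what I review (testing_strategy, verification_steps), but the schema, task id, and controller name below are a hypothetical sketch, not Stride's actual format:

```yaml
id: task-042
title: Add rate limiting to the login endpoint
context: |
  Login lives in AuthController.create/2 (hypothetical name).
  Limits are per-IP, configurable, and over-limit requests
  return 429 with a Retry-After header.
testing_strategy: |
  Unit tests for the limiter module; an integration test that
  hits the endpoint past the limit and asserts the 429 response.
verification_steps:
  - mix test passes with the new tests included
  - manual check in the browser: exceed the limit, see the error response
```

Notice how much implementation detail is spelled out; that is exactly what keeps the agent from making decisions on its own.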

I think there are two things that make this work. First, my tasks are very small and therefore easy for the agent to tackle and test. Second, I (with Claude's help) try to provide all of the necessary context to the agent so it is more likely to get the requirement right.

To summarize: Claude writes unit tests for everything (the main form of testing), Tidewave performs some integration testing, and I manually test just before deployment.

For a completely different context, at my last client (Java and TypeScript) we had a dedicated tester on each team (3 teams). The teams followed a two-week cycle, but it was typical that the developers would finish development in two or three days and the tester would spend the rest of the two weeks testing everything. It felt like a huge constraint. At first the testers would find small things, but over time they rarely discovered any issue of significance. I am no longer with that client, but I am sure they were heading toward no longer using the testers and instead relying on the testing from Claude. Again, we had good AGENT files and custom Claude Hooks and Plugins that we built out over several weeks.

Hope this helps.

1 Like

This is fantastic — thank you for your detailed answer, @Cheezy !

I hope the mods decide to move this because I agree it might warrant its own thread.

After spending a few months reading Elixir (and related) books and taking online courses, I just got started building. Testing is at the heart of my workflow. You’ve given me some good ideas and I’d also like to share parts of my testing workflow in case it’s helpful for others, and maybe to get some feedback.

We’ve got a pretty active group working on autonomous SDLC in the Jido Discord (not gonna link, you can find it).

My wreckit project was a prototype in TypeScript; I’m slowly porting it into Jido.

Jido will have a coding tool - we can already orchestrate Claude Code and generate code in sprites - but the entire workflow is still coming together.

As far as “Why?” or “Is this really possible?” goes, I have my doubts 🙂. I view it as a learning exercise that will prove the value of software engineering over the long run.

2 Likes

Looks like Anthropic themselves are starting to include autonomous agent supervision trees in Claude Code. The latest release announcement says that a session can spawn others (which you can interact with), and that sessions can message their spawner and also message each other.

I can’t help but wonder if Greenspun’s tenth rule (“Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.”) should be complemented by another rule for our very connected age, since every app that grows for a while seems to develop an ad-hoc actor model.

Since they sell tokens, it only seems logical to encourage bigger agent structures. Too bad it’s not built on the platform that supports this by design; it would have been quite a spotlight for the BEAM.

1 Like

Alright I’ll join that group.

Agreed on the last part. Enjoying trying things out, though. And as @Lucassifoni mentioned, I suspect what I’m trying to solve will be built into most of the providers in short order.

1 Like