Is anyone here using any AI agent frameworks/loop programs/scripts?
I’ve been using Claude Code/Gemini/Codex pretty heavily, all manually, i.e. I create a new Git worktree, open a new terminal, start claude, and go into plan mode to explain what I want. This works well, but of course it is not automated in any fashion.
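For anyone wanting to script the worktree step of that manual flow, here is a minimal sketch. The `feature/` branch prefix and the `../worktrees` directory are assumptions made up for illustration; only the `git worktree add -b <branch> <path>` form is standard Git. It builds the command rather than running it.

```python
from pathlib import PurePosixPath


def worktree_command(feature: str, root: str = "../worktrees") -> list[str]:
    """Build the argv that creates a new branch plus worktree for one feature."""
    branch = f"feature/{feature}"          # hypothetical branch naming scheme
    path = PurePosixPath(root) / feature   # hypothetical directory layout
    return ["git", "worktree", "add", "-b", branch, str(path)]


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch has no side effects.
    print(" ".join(worktree_command("login-form")))
```

Passing the resulting list to `subprocess.run(..., check=True)` would actually create the worktree; starting the agent in that directory is left out on purpose.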
What I want to do is have a CLI program/script that continuously pulls from a backlog of issues and spins up agents for different stages - ultimately creating a PR for review.
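A rough sketch of the loop described above, under heavy assumptions: the stage names (`plan`, `implement`, `test`, `open_pr`) are invented, and each stage is a stub that only records that it ran. A real version would replace the stubs with calls to an agent and would poll an issue tracker instead of draining an in-memory queue.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Issue:
    id: int
    title: str
    log: list[str] = field(default_factory=list)  # stages that ran for this issue


# A stage takes an issue and returns True on success, False on failure.
Stage = Callable[[Issue], bool]


def make_stub_stage(name: str) -> Stage:
    """Create a placeholder stage that records its name and always succeeds."""
    def run(issue: Issue) -> bool:
        issue.log.append(name)
        return True
    return run


# Hypothetical stage names; not taken from any existing tool.
STAGES: list[Stage] = [make_stub_stage(n) for n in ("plan", "implement", "test", "open_pr")]


def process(issue: Issue, stages: list[Stage]) -> bool:
    """Run the stages in order, stopping at the first failure."""
    for stage in stages:
        if not stage(issue):
            return False
    return True


def drain(backlog: deque[Issue], stages: list[Stage]) -> tuple[list[Issue], list[Issue]]:
    """Pull issues from the backlog until it is empty; return (ready, failed)."""
    ready: list[Issue] = []
    failed: list[Issue] = []
    while backlog:
        issue = backlog.popleft()
        (ready if process(issue, stages) else failed).append(issue)
    return ready, failed


if __name__ == "__main__":
    ready, failed = drain(deque([Issue(1, "Fix login bug")]), STAGES)
    print(len(ready), "ready for review,", len(failed), "failed")
```

Keeping failed issues in a separate list, rather than retrying automatically, leaves the decision about what to do next with a human reviewer.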
There are multiple tools that do this. Some seem very complex, and it is hard to understand what they are doing or how to use them; others seem pretty close to what I want:
Wow, I predicted in another post yesterday that this would soon exist. I guess it’s impossible to keep up. My next prediction, then, is a SimCity-like GUI to manage all of that.
I’d be curious to hear from people really using this. No judgment from me here, I just can’t really picture it.
The problem (at least the problem I am trying to solve) is: why don’t I have an agent (or agents) running 24/7, building features, fixing bugs, etc., autonomously?
Whether or not this can work well enough to be useful remains to be seen. I’m happy with my current flow, but it still requires quite a bit of manual intervention, and it sits idle many hours a day.
It’s the “do it well” thing that’s still under active debate. It’s very far from a given. It’s far from being proven as well. It’s very, very much in the air still.
I disagree; the worst case is that it balloons your line count by 15x and you are left holding the bag and burning through even more tokens (and through your wallet) trying to fix the mess.
In my experience during 2025, that risk is under-represented and downplayed by a lot of folk. I will not go as far as to claim it’s a conspiracy by the LLM vendors. Maybe people caught in the sunk cost fallacy did not want to look bad, so they either avoided the topic or downplayed the negative outcomes.
In any case, fully autonomous agentic coding is still high-risk. That’s how it looks from where I am standing. It’s not a win-win at all, careful curation is still very much needed. And that part is still extremely difficult to automate or outsource to other agents (though that latter part might be already changing).
@Dmk I have not. I don’t have the personal budget to burn several hundred dollars a day, or the bandwidth of attention. I have a very small brain, and very little attention. There isn’t anything new or particularly novel about the idea; see prior work such as MetaGPT. Honestly, Gas Town seems like a troll post with sparkle emojis.
Additionally, the security implications are kinda insane, and as for reviewing the AI reviewers of the AI reviewers: when does it end? A never-ending firehose of PRs to sign off on? No thanks.
I think more lines of code is not the only metric for quality of life improvements. Maybe if the Gas Town idea were more efficient and usable by regular people, I would consider it. As of right now it’s a dead end for general computing, imo.
The best features are no features at all.
It’s like at a certain point you can only scale so far vertically and horizontally before looking inwards and treating optimization as the biggest win. Additionally, in the sense of visual programming, UML, and BPM: do we care what is inside the black box? There is a balance between abstraction and introspection, and then there is wholesale token generation to replace a 50MB black box whose input is (thing: str) and whose output is → void print thing, which is literally just a print hello world.
Agreed. I want to explore it though, hence asking if anyone has tried any of the existing tools, and what their results have been.
Also agree with this. But worst case is that you try it, it yields this result, and you say no thanks. You don’t have to accept it or use it. Worst case isn’t blindly accepting the output.
First, I’d like to put this into a real-world context, where the following applies:
In my own experience, most of the time spent today (say, when building a full-featured multi-tenant Phoenix LiveView app) goes not to coding or even designing the system, but (in no particular order) to brainstorming, designing UX, defining visual standards, sensing user needs, live-testing, capturing and reasoning about user feedback, and yes, marketing the product, all of which are workflows that are not even technical.
Since it would be borderline insane to delegate these stakeholder responsibilities to an LLM, and since the technical part is definitely not the bottleneck (but could easily become one as a result of artificial “creativity”/automatic code generation), why would one even invest in optimizing what’s not the problem in the first place?
The sheer notion that there are enough (human) resources to a) devise a large enough number of business use cases/opportunities and b) be capable of filling up an enormously parallel agentic enterprise with articulated requirements is laughable.
In short, if the idea is to produce something that generates revenue once deployed, developing it is the least of all concerns.
I was very serious about the next-gen being a stateful SimCity-like environment, given what seems to be starting to exist today. I think this will be a simple consequence of the ballooning swarms running even when you are not there (and I guess I will try it out of curiosity if it exists). Today this seems very wasteful though.
I think using something like Gas Town with frontier models could get very expensive very quickly. If you have the compute, it could be fun and interesting to experiment with a local model.
This is very near the kind of stuff I thought about. I explored the underlying tech marketing sites and felt old.
I guess an organisation could have a stateful “world” that represents all their ongoing projects, with swarms of agents (“cities”) that allocate resources and/or collaborate through global “institutions” to help factor projects in the light of business goals, while cities have “districts” or other things working on the actual details of each project. The next-gen developer would then log into this world and see how things are going by questioning “officials”.
Thinking about it more, I might have gotten this idea from Cixin Liu’s The Supernova Era, which is a sci-fi port of William Golding’s Lord of the Flies. Well, Gas Town already looks like that, but the next step might be realizing that interacting with this through text and terminals makes the management too one-dimensional, whereas two-dimensional models (infinite canvases) or three-dimensional ones with a notion of permanence (RPG-like worlds) would be more apt.
I’ll watch this from very, very far away if it happens! But as we seem to collectively love stacking complexity in layers, I don’t see why it wouldn’t happen, except if it is a bit too much complexity to build with Claude Code?
...and every transaction gets permanently written on the blockchain, I imagine.
But how does the fundamental workflow of the business model fit into all this - the part where top RE developers cut deals with / bribe the city officials into keeping the zoning laws restrictive so they can keep on milking the buyers and renters dry with artificially overpriced housing units?
That’s what I’ve been thinking when it comes to deriving tangible value out of the token-spending loop.
A business (typically?) involves way more than just software. The hard problem is not writing more code.
And then humans become the bottleneck… we still need human pace to operate the rest of the business.
Questions
Are some people exploiting source-code-intensive businesses for which they’ve already solved distribution/sales?
Maybe all the tokens and dollars spent go towards negative ROI?
Are agent loops successfully operating consequential real-life things with better-than-random success rates? You can throw away software and start over, but you can’t e.g. undo bank transactions and try again.
I built my own tool called Stride (https://www.stridelikeaboss.com) to facilitate an AI-driven workflow that lets humans decide to inject themselves at different points in the workflow. There are several companies in Canada that are using it right now and interest is growing. It is a Phoenix/LiveView app. I have blogged about it a lot at https://cheezyworld.ca.
These 3 links seem to offer good Elixir baselines, with tips and practices for the AI to follow.
My main concerns with an AI workflow are:
Can it follow existing patterns reliably?
Can it find existing modules/functions to avoid duplicating code?
Is the code it’s writing testable?
I still haven’t found a good way to do this, but it’s getting there. I have a prd.md file that explains how to write and plan a PRD so Claude can ask me the right questions and end up with a good document to actually begin building the thing. The last line of prd.md is:
When finalized, write the PRD to: `docs-specs/[feature-name].md`
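If a script (rather than the model) had to resolve that `docs-specs/[feature-name].md` placeholder, a minimal sketch could look like the following. The slug rule (lowercase, runs of non-alphanumeric characters collapsed to a hyphen) is an assumption; the post does not say how feature names are normalized.

```python
import re
from pathlib import PurePosixPath


def prd_path(feature_name: str) -> PurePosixPath:
    """Map a human-readable feature name to docs-specs/<slug>.md."""
    # Assumed normalization: lowercase, non-alphanumeric runs become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", feature_name.lower()).strip("-")
    if not slug:
        raise ValueError("feature name must contain at least one letter or digit")
    return PurePosixPath("docs-specs") / f"{slug}.md"


if __name__ == "__main__":
    print(prd_path("User Login (v2)"))  # docs-specs/user-login-v2.md
```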