I have been doing some pretty intensive research over the last few days around agents/vibe coding. I tried some commercial offerings like Windsurf (with different models) but also self-hosted models. I have two MCP servers correctly hooked up (Context7 for docs and Tidewave).
I tried both agentic coding and inline code suggestions. LLM-based autocomplete works really well and saves me a lot of time. What I liked the most was VS Code + Continue pointed at a self-hosted instance of Qwen2.5-Coder-Instruct (running on my gaming rig).
However, after trying to get agents to work nicely, my feeling is a resounding "meh". There are moments where I'm surprised that it manages to do something correctly, but a lot of the time it needs so much handholding that I'm simply MUCH faster doing it myself. In particular, since I'm using Ash I can achieve so much in so few keystrokes that explaining it to the model and then waiting 3 minutes for it to do its thing is much slower.
Often, Claude gets confused and starts going in circles, wasting so much time.
Similar experience here, but with a much less involved setup. I have asked multiple models to do things with Ecto and they will hallucinate functions.
I have switched to improving the Phoenix generators for myself, and that has saved me a lot of time and repetitive work. Now I know that a certain style will be adhered to, and the source of bugs is inspectable and fixable.
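For anyone wanting to try the same: Phoenix lets you override the generator templates by copying them from the phoenix dep into your app's priv/templates directory, where the mix tasks pick them up before falling back to the bundled ones. A minimal sketch, assuming a standard deps checkout:

```sh
# Copy the stock phx.gen.html templates into the project, then edit freely;
# mix phx.gen.html will use these instead of the bundled ones.
mkdir -p priv/templates/phx.gen.html
cp -r deps/phoenix/priv/templates/phx.gen.html/. priv/templates/phx.gen.html/
```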
Claude started off well, but I noticed it's quite bad lately indeed. It outright hallucinates functions that don't exist. For example, when I casually asked it if there is a way to clamp a number between two values when we make a function that adds (or subtracts) a number to it, it said "use Enum.clamp", lol. I only ask it superficial and short questions now.
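For the record, there is indeed no Enum.clamp in Elixir's standard library; the hand-rolled version is a two-liner over Kernel.min/2 and Kernel.max/2. A minimal sketch:

```elixir
defmodule Num do
  # Clamp n into the closed interval [lo, hi].
  # There is no Enum.clamp/3 in the standard library.
  def clamp(n, lo, hi) when lo <= hi do
    n |> max(lo) |> min(hi)
  end
end

Num.clamp(42, 0, 10)  #=> 10
Num.clamp(-5, 0, 10)  #=> 0
```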
"Vibe coding" is basically you giving instructions to an LLM to gradually evolve/create code, quickly checking the result (with almost no effort put into it), running some tests and accepting the change, then doing the same in a loop. The idea behind the word "vibe" here is: you are not working very hard; in fact, you are almost not working.
This remains an issue. It is partly solved by prompting the model to check every output against documentation, but even then the AI comes up with bogus results.
Sometimes it's very good though. Let's hope the ratio will improve.
Vibe coding is like swipe typing. If it works, it's great; if it doesn't, it slows you down more than you imagine, and you always have to check the generated code yourself.
Oof, too close to home. To this day I refuse to type normally on a phone and insist on swipe typing. But I am already extremely sick of it... and of typing on phones. Not like we have any actual competition on smartphones, mind you, but I would long ago have gone there if we did.
I have had several successes with LLMs enabling and boosting me, and yes, that requires you to be a pedantic sonovabeech. I don't mind though. I made very good money a while ago by being pedantic, and I keep winning in many situations by being so as well.
We, the good programmers, in this "AI" era, have only two things to do to not only remain relevant but also start commanding even higher pay several years down the road:
Don't give in to the temptation to outsource thinking to the really good models (DeepSeek, Gemini Pro, Qwen3). Only use them as an educator, a generator of boilerplate (the kind that will not make you a better programmer if you spend 3 hours painstakingly typing it out), and a discussion partner (tradeoffs, common wisdom and the like).
Not die. That's right: "AI" makes the really good programmers even better, and the bad ones even worse. We'll win by merely staying alive.
Everybody is celebrating the death of programmers, but what I am seeing is that there will be even more need for them in a couple of years, when the world wakes up from its sweet dreamy slumber and discovers how many "vibe coded" apps are in super crucial positions in society and even power.
And let's not even mention the EU's re-militarization and its desire to stop being dependent on US infrastructure. How many jobs for highly qualified professionals will that create?
I think this also comes down to selling shovels during a gold rush. Anthropic et al. don't care about the stuff that comes out of their models as long as they can keep selling tokens.
Oh, absolutely. I am pretty sure every LLM company has some pretty smart folks who understand how detrimental their products can be for society and even economics/politics at large, but they don't care.
"It is difficult to get a man to understand something, when his salary depends upon his not understanding it."
I tend to agree completely with this, and it is a warning I always issue to beginners. If your plan is to master both the technology and the process of delivering reliable software in reasonable time, the only way to do that is by getting your hands dirty. LLMs will boost your output significantly when you are just starting out; however, that is more or less the final ceiling of your performance. In 1-2 years you will be left in the dust by the folk who decide to understand the problem fully before solving it.
Some time ago (pretty much right when ChatGPT was released) I worked with a few developers who had about 2 years less experience than me, and I was horrified at the end of the project. They were relying solely on prompts to do everything, and while their performance was in some places better, the lack of direction and understanding of how the problem at hand could be approached made the project take longer than it would have if they had just not used the LLM. The main reason is that they didn't try to solve the problem themselves but wanted the LLM to do it, and the effect was that they introduced a lot of unnecessary external noise, which made the process much longer and unnecessarily complex.
Like anything else, this is just a tool. Understanding when and where to use it is crucial, and most developers are simply horrible at assessing this. That dates from long ago, and it's no different from the tools that came before.
While I completely agree that we need to put food on the table, I would also say that the only way to win this is by not playing the game. I think we overestimate how important the skill of writing code fast is; code should be considered a liability, not an asset. A well-solved problem will beat any kind of artificial boost, be it code generation or something else, by at least a factor of 10x. I can say that this is certainly the case when it comes to Elixir: there are countless small companies that have 3-10 engineers over the lifetime of the company and are beating the competition to a pulp, and trust me, those engineers spend less than 20% of their time writing code.
There is and always will be, but not for factory workers who are interested in typing code all day long. There is an extreme shortage of folk who can solve business problems and deliver a working solution, which is the hardest skill to learn and master (and will never be replaced by an LLM, as this is the tipping point where actual intelligence is required). I've personally had the luck to be thrown into the fire in starting and developing some amazing and complex products, and I can say that I've failed miserably time after time to come up with good solutions. However, that taught me some of the most valuable lessons and made me see the profession of a developer as something different.
I think the early game development industry symbolizes how being capable of solving problems is more important than anything else. I would highly recommend watching the Half-Life: 25th Anniversary Documentary; there is a point in that video where everyone notes that most of the folk who worked on making the game came from other trades, and they learned, created new tools, and managed to deliver an amazing game.
Oh, a lot of business people are about to crash head-first into this reality, very soon, on a much bigger scale. Just you wait a few more months. I'll be in the corner with...
I do agree, but we have to bring in some nuance here: if I became a full-blown businessman who can also code... why would I need employers again? I'll blow them out of the water at their own business game. I will have all their abilities and connections and can code on top of that, and they can't. They will have zero chance as my competition.
But that's not what I am aiming at. I figured I'll mostly stay in my tech area "corner" (a fairly huge corner, though, overlapping many other "corners").
It's absolutely crucial for a senior dev to be half-businessman, half-customer-support-agent, half-salesman, and a few other roles as well. When you talk with people in their own language, they warm up to you and give you more info, and the whole process becomes smoother, with better outcomes for everyone.
But again... if I wanted to be an entrepreneur, then I would become one. I like coding. I agree we must never lose sight of the problems we are solving (and I don't). But I still want most of my work to be coding/programming, integrating with other systems, helping with Ops concerns, etc. I have been a CTO and a VP of Eng at 3 places in total. Hated it like I have hated very few other things in my life.
My 2 cents: I think there's something to this vibe coding. There are many levels of what people mean by it, and each level has its own probability of success.
Many folks understand vibe coding as: "I don't need to know anything about programming and I'll just one-shot my project with a single prompt." Success probability? Roughly the same as hitting production on your first mix phx.new. Even if you use something mainstream for vibe coding, like Node.js.
Then there is: "I don't need to know anything about programming and I'll just iterate with AI till I get it right." This one is interesting. I think AI is capable of producing something good enough to test the idea.
Will it scale? No.
Will it be maintainable? Also no.
Will it impress investors if the idea is hot? Absolutely.
If your idea proves itself, you'll either have investments or money to hire devs to build something decent and maintainable.
I don't know much about the market for building MVPs; I imagine it involves outsourcing to countries with cheaper labor. But it's still not cheap, especially if, as a founder, you are investing your own money. Now you can do it yourself. That's a serious shift in MVP economics. I imagine this will absolutely disrupt the MVP-for-hire industry.
And then there are people who say: "I do know programming, and want to build something either for work or for myself." I fall into this category.
At work I use AI for a couple of things:
Help write documentation: code, tickets, internal documents.
Help write boilerplate code that is easily repeatable, based on some other parts of the existing code.
Build a throw-away prototype to demonstrate the concept.
All works pretty well.
And for my own projects I'll add some more things:
Write HEEX markup with Tailwind. I can do it myself, but it will take me longer, plus it's a bit of a boring task for me. Claude does this exceptionally well and it's super fast. The other day I asked it to fix dark mode; it worked on the first try (see the sketch after this list).
Generate assets like icons, logos, and photos. It's amazing how much money this saves; it's basically free now. It used to cost me $$$ on stock photo sites, or my soul begging designer friends to help me.
Use AI to build an almost fully featured demo, just the frontend part of it, in order to better understand myself how it will behave. Before, I needed to build the thing fully only to get disappointed that I didn't like it. Now I vibe with a fake frontend on v0.dev or Lovable, click around, and change my mind guilt-free.
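To give a flavor of the Tailwind/HEEX work mentioned in the first item above: dark mode fixes are mostly a matter of sprinkling dark: variants over the existing classes, which is exactly the kind of mechanical edit Claude is good at. A hypothetical sketch (component and class names are illustrative, not from my actual project):

```heex
<div class="rounded-lg p-4 shadow bg-white text-gray-900 dark:bg-gray-900 dark:text-gray-100">
  <h2 class="text-lg font-semibold"><%= @card.title %></h2>
  <p class="text-sm text-gray-600 dark:text-gray-400"><%= @card.body %></p>
</div>
```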
I think that if you are in this camp, AI will only make you faster and better. You still get to code the parts that are important and interesting, the boring repetitive parts will not slow you down, and on top of it you get extra things for free, like designs and concept validation.
Another risk of going too far generating code with AI is ending up with a "system" that is surprising and hard to understand, and which a human then has to change, extend, or assess for security soundness.
I think a lot of people just use it to get boilerplate out of the way; as of right now, tokens aren't cheap enough to iterate over and over again. I cannot recall where I read this, but it compared some programmers to sculptors chiselling down imperfections, with pretty much unlimited granite that you can roll back if you don't like the result.
For me, right now, I only use it for frontend stuff. I don't think you can go wrong with it for generating CSS and HTML, things I find really tedious to write but can easily read and change if need be.
I mostly wish that the term "vibe coding" would die. In my ~35 years of using a computer, it's the cringiest term I've ever heard. I'm probably just old, though. But like, I can't be that old, because I used "cringiest"... right? But it really gives me a visceral reaction every time I hear it, and it doesn't seem to be getting any better.
To add to what's been said, I'm honestly not sure that inheriting a "vibe coded" codebase is going to be any worse than inheriting a prototype-turned-production application from a tech-savvy entrepreneur whose sole focus was to ship, Ship, SHIP. So I'm not too worried about this.
I still haven't used an agent and only just started using Copilot a few weeks ago (lol?). It's been very good and sometimes a little bad. A lot of what @egze says resonates with the times it's good: it has written 20-line handle_events from just a few characters of input, and just 10 minutes ago it inferred a Calendar date format for me, which was just the bee's knees. The times it's bad have been what @D4no0 touched on: in just a couple of weeks I've gotten lazy and just hit "accept" for a problem I didn't quite understand (because it was there), only to later realize it wrote a 15-line function that could have been 3. I'm already working on training myself out of that. I've gone back and forth between auto-suggest vs. needing to manually request suggestions.
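For context, the Calendar bit ends up as a one-liner with Calendar.strftime/2, with Copilot inferring the format directives from the surrounding code. A sketch of the shape of it (the exact format string here is illustrative):

```elixir
# Format a date for display; Copilot filled in the strftime directives.
Calendar.strftime(~D[2025-06-13], "%A, %B %d, %Y")
#=> "Friday, June 13, 2025"
```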
I've also had the experience of writing a simple algorithm-y function and asking if it could improve it, and it couldn't, so I felt good, lol. I then asked it to improve a slightly more complex one and it choked. Among other similar things it said: "I removed the Enum.reverses because it doesn't seem like they are needed, but if they are, just put them back", which is pretty funny (they were most certainly needed).
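For the curious: the Enum.reverse calls it wanted to delete are almost certainly the classic accumulator pattern, where prepending is cheap but leaves the result backwards. A hypothetical reconstruction (not my actual function) of why they are load-bearing:

```elixir
defmodule Demo do
  # Collect doubled even numbers. Prepending to the accumulator is O(1),
  # but it produces the result in reverse order, so the trailing
  # Enum.reverse/1 is required for the output to be correct.
  def double_evens(numbers) do
    numbers
    |> Enum.reduce([], fn
      n, acc when rem(n, 2) == 0 -> [n * 2 | acc]
      _n, acc -> acc
    end)
    |> Enum.reverse()
  end
end

Demo.double_evens([1, 2, 3, 4])  #=> [4, 8]; without the reverse: [8, 4]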
EDIT: It also often writes tests pretty much exactly as I would have written them (which is just great).
Same, by the way! To the point that I have lately, rather shamefully, almost completely given control to the LLM to write my tests. Mind you, I am still inspecting them closely, but I find myself surprised at how few critical remarks I can give it. One example: needlessly using _error_code_ignored_here in an error tuple. I told it "Nope, the underlying API we use is quite stable and makes it a point to never break backwards compatibility; please pattern-match directly on the raw value. It is safe for us, the test will not become brittle, and we'll be stricter", and it happily obliged. But there were not many such occurrences, just 2-3 for several days of work.
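A minimal sketch of the kind of change described above, with Tokens.verify/1 and the token fixture as hypothetical stand-ins for the real code:

```elixir
defmodule TokensTest do
  use ExUnit.Case, async: true

  # Illustrative stand-in for a real fixture helper.
  defp expired_token, do: "stale.signed.token"

  test "rejects an expired token" do
    # Too loose: {:error, _error_code_ignored_here} would hide regressions
    # in the error value. The underlying API is stable, so match it directly:
    assert {:error, :expired} = Tokens.verify(expired_token())
  end
end
```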
They can save a great deal of time, but only if used on a tight leash. You have to know what you are doing in order to give it the instructions and not let it either make guesses or get out of control. For the first part of that, I wrote about it here. In short, flip the script and have it let you know about any ambiguities. As for the control aspect, it can easily lose track of context and revert to default behavior, which is often not what you want, so you have to constrain it. For example, you may want to add this to a prompt: "Perform analysis only. Do not make any changes yet." Hopefully, this lets you review and correct what it may be planning to do before it does it. Once you're ready, add a bit more to the prompt: "Proceed with just this set of changes. Do not add, change, or delete any other code." Further, as the set of changes grows in complexity, break it down and have it work on and track each set of changes as it goes.
Another critical aspect: keep your repo commits up to date. Commit at each small point of stability. Then, when the agent does lose control, it is easier to revert and start that set of changes over again. Some may recall the Dilbert cartoon about "Write a minivan". Sometimes the agent may want to do this too: you may pay it to write something and, after ignoring it for a while, pay it again to fix what it just did. So, when you run agent prompt instructions, keep an eye on it and abort if it gets off track.
I wrote about using AI/agents for greenfield work here. In this initial Elixir/Phoenix/LiveView project, more than 90% of the codebase was generated. It has since been through two AI/agent-assisted refactors, one for adding full Stripe product catalog sync and another for adding user usage metrics and cost tracking. The current state of the project is a mostly working app. But, as a solopreneur, generating code is the easy part. Not all tests can be automated; you still have to manually test functionality to ensure it is working correctly and not just passing tests, which may or may not be correct. That human time is still one of the slow parts of any sufficiently complex project.
To the flow in the post, I have since added the use of Cursor for smaller sets of refinements. Those 500 premium requests per month help to reduce the Claude Code costs.
But the thing is, in the time it would take me to engineer the perfect prompt to hopefully get it to do what I want, I would probably have already written it myself.
The big question is: is it cheaper and more reliable than hiring a professional (or several) to develop the product? Because at the end of the day, cutting cost and time is just one part of the equation and not always what makes a product successful.
In the case you are describing, your time and focus are tied to the development process (and subsequently your money, as you are paying for the service). It might be the case that your time is better spent on solving other problems.
That could be true. There is another benefit, though. Here is a different scenario to illustrate: when we have an issue, we may tell it to a friend and, before the friend can even reply, the act of telling it has helped us figure it out. My working with ChatGPT was similar. Working through the process helped me figure out what I wanted it to do, which was not fully fleshed out at the start. Of course, anything will take time, whether you figure lots of it out up front or as you implement. In my example, I had two refactors for things I had not considered until after the initial implementation.
As an example, it may cost the same to hire either of two consultants, but equal cost does not guarantee equal results.
Whoever is using the agent needs the ability to interact with it in a way that gets the most out of it. Professional ability differs whether using an agent or not, which is one of the reasons I suggest the agent be used as the pair partner to the professional. Even if I'm the one making all of the changes, it can often be faster to ask the agent "Where do I find the code that does X?" than to do that research myself. This is especially true if you are not yet familiar with the codebase. I appreciate that time savings. As a further example, take a look at deepwiki.com and what it has done documenting some public repos.
How time and/or money are spent is, of course, a personal choice, whether for an individual or a business. My usage was two-fold. I wanted to learn more about AI, agents, and the use of them, and so far I've gotten a good start on that. The output of that learning was also the time saved in starting my side project. This was not something I would hire a team to create; perhaps a startup might. But it did help me learn how to use both AI and agents more effectively. Whether that potential improved efficiency matters is up to the person or company paying for it. My only regret so far is not making that one extra commit before an agent went off the rails (see the "write a minivan" comment from before). Overall, I like the results I have achieved so far and will continue to use it.