Engineering leads, what are you doing to stop the slop?


Ha, I love how this convo keeps going! Gives me a lot to procrastinate with :stuck_out_tongue:

This is what I worry about most (well, the technical thing I worry about most; this discussion hasn’t gone deep into any politics around this stuff, which I think is for the better). I just published my first heavily LLM-assisted library and it’s been a great experience. It is something I would never have taken on (or at least never gotten this far with) if it wasn’t for LLMs, but I certainly still needed to review the code. Embarrassingly, even though I was reviewing code, I started to get sloppy as the addiction kicked in, and I ended up committing (and pushing) a generated Erlang file which not only should have been ignored, but contained a hardcoded local path from my machine! I know this is largely because I’m new to this and haven’t built up a library of “skills” and whatnot, but still, I don’t think we are anywhere near the point where we can vibe production code. My problem with the whole “who cares so long as it works, people already ship horrible code all the time” is exactly that: people are already shipping horrible code all the time! We already have a problem, and this is making it worse.

This brings me to:

I don’t disagree that coming up with mass-produced stuff is art, but in an academic debate it is certainly much lower art than hand-crafting furniture, and the product itself is often kitsch. One of my favourite things I’ve ever heard is: “I’m too broke to buy cheap stuff.” All mass-producing stuff really does is create a race to the bottom. If it didn’t exist, hand-crafted stuff would still be pricey, but it would be way more affordable if it were all that was available and woodworkers had constant business. So, uh, yes, love your comment, though that’s the part where it fell down a little for me (other than that you are just describing the reality we’re in, so can’t fault you there).

Sorry to keep picking on you @egeersoz but you are obviously the biggest booster here! I do have a problem with these examples because a) as I’ve already pointed out, DHH says he still hand-codes, b) Chris McCord has that small but ever-potent quip: “for better or for worse,” c) Simon Willison is paid to boost AI, and d) (and of course I’m trolling a bit here) Ryan Dahl not only brought us JavaScript on the server but was proud of it! The man is obviously a sociopath that no one should listen to :wink:

I do realize you are responding to “great engineers who use it” but this is the second time you’ve shared this list and I just wanted to get that off my chest, as it reads more as proof that “LLM use is inevitable” than that “LLMs are a good idea.”

1 Like

I replied to this partially but want to elaborate on the larger point.

The argument you’re making is self-defeating in a specific way: you use introspection skepticism as the weapon (humans rationalize, so self-reports about AI helping are suspect) and then your evidence is entirely your own introspection, i.e. you cannot think of a great engineer using Claude who impresses you. If the mechanism is real then it applies just as much to the skeptic watching their craft get automated as it does to the enthusiast. You don’t get to deploy it in one direction.

Second, you characterize the freed capacity as fake work. You call it “galaxy brain concepts” and “drawing boxes in Figma”. But the work hasn’t vanished, it has simply moved. Someone who uses Claude well does their thinking in the planning conversations rather than the diff reviews. The mental model of the system gets built up-front by directing rather than back-loaded by inspecting. It is not much different than reading an engineering design document written by a coworker and doing review/feedback cycles until your own understanding of what will get built is rock solid. That is how system knowledge gets spread in every team I’ve worked in and it works very well so long as everyone is doing their due diligence (which one absolutely must with AI in order to get good results).

Lastly, I will say this: the meta-pattern of “real programmers don’t need X” has been deployed at every stage of our field’s history, usually by senior practitioners. It has been deployed against compilers, garbage collection, IDEs, autocomplete, Stack Overflow, type systems, auto-formatters. In every wave, the skeptics were partly right about skill atrophy but decisively wrong about the net effect. Incidentally, this is why Linus Torvalds recently likened the productivity benefits of AI to those of compilers: he himself has been using AI more and more and says he wished people treated it like any other tool.

4 Likes

DHH still hand-codes, but less and less, by his own account. He actually has a pretty great interview that you should watch if you’re curious about the details: https://youtu.be/JiWgKRgdgpI?si=xSoMivFBZOVX86pH&t=1912

I do not think Simon gets paid to boost AI. Would love to see some evidence of that, if you know otherwise. He actually recently said he makes no money from it, which is why he added a donation/subscription feature to his blog, so that he can justify the time spent, and also that he needs to start charging consulting fees because everyone keeps pestering him.

I don’t have objections to your other points, especially about Dahl. I dislike Node.js myself. But it does power millions of websites. I may be off by an order of magnitude.

To be perfectly honest, the main reason I responded was that I wanted to make that Ryan Dahl joke :stuck_out_tongue: And is Simon not paid to write his blog? It’s highly possible I’m just taking to heart hearsay that I hoped was true :grimacing: But… but… but… Steve Yegge is paid by AI! So there! lol.

Watching! And thanks for timestamping it!

I knew that writing art instead of skill would provoke someone a bit. :smiley:

Making something really nice within the additional limits created by adapting to specific mass-production processes, logistics and economic minimalism is technically harder than making a one-off with little or no concern about any of that. So, for me personally, anyone who can do that well is very skilled technically, and if the product is nice and functional, also a good designer. From there, whether it is art or not should likely be evaluated by the end product itself, and not by how it was made? Being a one-off is not a requirement for a work to be art.

Judging all mass-produced as bad and all hand-crafted as good is too coarse a categorization for me. I am living proof, as my hand-crafted stuff certainly isn’t art to anyone but my mother (at best). There are those who maintain a definition that art has to be useless, and by that standard anything with a functional purpose would fail regardless of how it was made.

But to return to something more in line with the thread itself: while making skills, I have sometimes wished for calibrated example applications. For Elixir that could, for instance, be some applications covering key areas, with very well-defined functional requirements and strictly idiomatic, best-practices architecture and code. That would make it easier to optimize skills by having clear functionality to prompt for and a clear architecture and code goal to achieve and measure success against. Multiply that over a few key areas, and any skill that successfully recreated all of them would be that elusive thing: perfect. Anything close to perfection would likely still be very good - and confirmed to be so. (Edit: Oh, and that would help fight the slop! I should make a t-shirt like that. “Fight the slop!”)

This triggers me more than anything (in the sense that I recognize it’s happening). Developers churning out bad code, cutting corners, and focusing on “making it work” are given tools to do more damage faster. Even worse: they are praised for their massive output, while their colleagues can’t keep up with the reviews (“But everything works, right? I even prompted the thing to generate tests!”).

We’re in an era where a lot of itches are being scratched. Many libraries are being published, often carried by solo developers finally able to act on their “someday-maybe” list. I genuinely think it’s great, and a lot of creativity is unlocked. Heck, my father-in-law is killing it with Claude, at the age of almost 80, on some hobby projects. Dreams are coming true.

But those are mostly silos: single devs acting on their ideas. I would be hesitant to pull in a dependency on a vibe-coded library when it was created over a few weekends, without much back and forth with a community that pushes back and gives feedback.
I wildly agree with Bob Nystrom here that “spending time” is a great way to create value. There is a sweet spot to be had here. But LLMs are not pushing us towards that sweet spot (I want it now!).

I have yet to see how LLMs foster more communication and collaboration. I see the opposite happening: “let me code that for you”. The hero-type programmer, daring to take on even more risks, and larger endeavors. Alone that is, without much consultation. Asking their LLM, rather than their users, about UX, DX or whatnot. Unchecked plausible assumptions all over the place. An LLM as an exoskeleton.

I hope this will change too, when all the itches are scratched. That we will build the right tools, which inevitably requires us to work together again.

8 Likes

Ah ya, I am very easily baited :sweat_smile:

Yes this is fair. I do own some very high quality and beautiful items that were mass-produced, such as kitchen knives, that would have taken a lot of skill to design.

I feel dumb here: are you talking about LLM “skills”? I’m still new, lol.

I was going to talk a bit about the hastiness earlier, but I still feel very new as an LLM convert so I didn’t. But ya, as I said above, I’m on the verge of announcing a library that is heavily LLM-assisted. There are a few reasons I’m feeling OK about it, but mainly it’s that it’s essentially a port of PhoenixTest for Hologram. So a) it’s already a mostly-understood API and b) it only lives as a test dependency and hopefully, if you were going to use this, you also have true e2e tests. In any event, not having something like this was hurting my desire to keep going with Hologram because I’m a BDD’er and in LiveView I almost always start with a LiveView test, and ya, it’s undeniable that being able to make this happen in days instead of weeks (or months) is a really good feeling.

ANYWAY, I’m also working on a form abstraction library and this one I am not leaning on the LLM much and asking questions to (what I hope are) actual people on Discord :sweat_smile: because yes: I think it’s important to get early community feedback and collaborate with humans, as you said. Otherwise, I’m actually losing my point here a bit because since I started typing this my dog started staring at me and is now intermittently barking at me and I should probably feed him and finish this later buuuuut I think I’m just going to hit send :sweat_smile:

2 Likes

Yes.

Skills can be considered text added to the context of the AI, either when asked for or when certain keywords or file extensions trigger loading it. The text itself can be made to give the AI actionable guidance, reference and examples of how to do things. It is not deterministic though, as the AI can vary its interpretation, or ignore or forget parts, just like with any other text given to it. It still helps to direct the choices the AI makes.
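For a concrete picture, here is a minimal sketch of what a skill can look like, roughly following the SKILL.md convention Claude Code uses (the skill name, trigger wording, guidance and the embedded module names are all invented for illustration):

```markdown
---
name: elixir-otp
description: Idiomatic Elixir/OTP guidance. Load when working on .ex or .exs files.
---

# Elixir/OTP guidance

- Prefer pattern matching in function heads over conditionals in the body.
- Use `with` for happy paths that can fail at several steps:

      def fetch_user(id) do
        with {:ok, user} <- Users.fetch(id),
             :ok <- Accounts.authorize(user) do
          {:ok, user}
        end
      end
```

As I understand it, only the frontmatter sits in context up front; the body gets pulled in when the skill is triggered, which is why it can afford to carry guidance and examples.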

One of the things I do which is kind of different from most is making and using very fat skills. Most skills will be tiny skeletons in comparison. My logic for that is twofold: 1. More guidance and examples tend to give better results. 2. By providing validated reference code examples and guidance, I have more control over what sources the AI takes inspiration from. I prefer that over the AI running off to some random web page or repo, picking up a bit here and there, or using just its own naked training, which may or may not be close to the mark. The latter will of course still heavily be there. (The downside of fat skills is a less sharp focus overall, and more context use. But with Claude having 1 million tokens, that became less of an issue.)
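To make “validated reference code” concrete, this is the kind of vetted snippet one might embed in a fat skill - a sketch with an invented module, but it compiles and pins down the exact shape you want the AI to imitate rather than improvise:

```elixir
defmodule RateLimiter do
  @moduledoc """
  Reference example embedded in the skill: a minimal GenServer that
  counts hits per key. Validated once by a human, then reused by the
  AI as its template instead of whatever a random repo suggests.
  """
  use GenServer

  # Client API
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  def hit(key), do: GenServer.call(__MODULE__, {:hit, key})

  # Server callbacks
  @impl true
  def init(_opts), do: {:ok, %{}}

  @impl true
  def handle_call({:hit, key}, _from, counts) do
    counts = Map.update(counts, key, 1, &(&1 + 1))
    {:reply, Map.fetch!(counts, key), counts}
  end
end
```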

For some skills I’m at the third iteration now. It turns out that making skills is a skill in itself, and I’m gradually improving and getting ever better results. Also, every time I hit a snag I try to adjust the skill so that it can be avoided in the future. Small granular improvements like that add up over time. (The current Elixir skill I have linked elsewhere is still on iteration 2 and will be replaced eventually.)

Anyway, if there were calibrated true targets for what a skill, some other means, or a naked AI should ideally achieve when given a specific requirement or prompt, that would provide a definite target. It would make it possible to complete a proper feedback loop, and thus make it easier to actually achieve that level of output.
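To sketch that feedback loop (everything here is hypothetical: `Shortener` stands in for whatever module the AI produced from a fixed prompt, and this frozen suite is the calibrated target it is measured against):

```elixir
defmodule CalibrationTest do
  use ExUnit.Case, async: true

  # The requirement never changes, so each skill iteration can be
  # scored by how much of this suite the generated code passes.
  test "shortening is deterministic" do
    assert Shortener.shorten("https://example.com/a") ==
             Shortener.shorten("https://example.com/a")
  end

  test "round-trips the original URL" do
    slug = Shortener.shorten("https://example.com/a")
    assert Shortener.expand(slug) == {:ok, "https://example.com/a"}
  end

  test "rejects non-URLs" do
    assert Shortener.shorten("not a url") == {:error, :invalid_url}
  end
end
```

Regenerate with skill v2, rerun the suite, and “getting ever better results” stops being a feeling and becomes a pass rate.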

2 Likes

I gave the whole thread another read, and after some reflection I finally came up with the following analogy.

COBOL, 1959. Any warehouse worker can now write code.
SQL, 1986. Any accountant can now write code.
AI, 2025. Any idiot can now write code.

It might sound a bit rude, but it captures the essence anyway. Being a great developer means being more performant/productive with COBOL instead of machine code, with SQL instead of direct DB calls, and with an AI assistant instead of code completion in an LSP.

A mediocre warehouse worker, a brilliant accountant, and any other person for that matter, still cannot produce robust, comprehensive code.

1 Like

You forgot:

  • C++, 1983. But only autistic masochists actually write code
1 Like

Well, my mother-in-law was an accountant and worked with RPG II/III and then COBOL on midrange IBMs, which was quite standard at the time and seen as accountant work, not programming work.

It was common for French small/medium businesses to have their accounting and payroll tooling built bespoke, in-house, by the accountants, before off-the-shelf software started to become more common. They saw “actual” computer engineers as another, fully separate trade who developed the systems and infra this ran on.

Sure, they didn’t develop robust, comprehensive code, but they filled some of the software needs of those companies.

But this wasn’t seen as software development. Today, vibe-shipping a POC without knowing the underlying programming isn’t fully seen as software development either, but that doesn’t stop the economic narrative.

Heck, they even already gave out mugs at the time :stuck_out_tongue: (fully anachronistic relative to the above story; found at a flea market last year)

6 Likes

I will argue that a great programmer can be a mediocre warehouse worker and yet be able to produce robust comprehensive code.

Here is where I think we will be within 3 to 10 years’ time, let’s say AI 2030:

Human-generated code will be considered risky and error-prone, while AI-generated code will be low cost and perfectly executed. We will likely have programming languages adapting to, and optimized for, AI use, or maybe made just for AI use, period. To me, Elixir seems to be in a good position for this.

The art of the programmer will become understanding problem domains - the possibilities, limits, costs and risks of different solutions - and, within that framework, making good solutions: encoding high-level specifications, adherence to regulations and standards, safety, and so on. I do not believe everyone and anyone will suddenly become great programmers just because the implementation details become more or fully automated.

Knowing what to solve and how to best solve it is more than the code. The purpose is the same as it ever was. The purpose was never to make code for code’s own sake.

5 Likes

I think I’ve identified the core reason I instinctively dislike LLMs.

I strongly value competence. This applies institutionally as well as societally, not just individually. In a competent society, things just work.

You can take a hot shower because there is clean water and flammable gas piped into your home, a device on premise that burns the gas in a safe, controlled way to heat the water, and a sewerage system to carry the dirty water away to be disposed of cleanly. Not to mention the electricity being used to light your home and power the water heater, and the technology that went into building the home in the first place.

Almost none of this you could achieve alone. But you enjoy the benefits of a shared competence. There is an institutional knowledge at work that ensures you can take the benefits of our technological inventions for granted as part of your daily life.

LLMs replace this hard-earned institutional competence with statistical guesses - a literally incompetent imitation.

Of course, nobody can use an LLM to build a hot shower. But what if the people at the water company stopped knowing authoritatively how to purify water, and instead relied on an LLM? What if they stopped caring about fluid mechanics to get the water safely pumped to your house? What if the producers of the water heater stopped caring about controlled combustion, the sewage company about not dumping sewage into streets and rivers, the electricity company about electrical fundamentals or the builders of your home about solid architectural principles and following through with proven construction practices?

These are the institutional competencies we rely on. LLMs incite individuals to turn not to the institutional knowledge that we have collectively built, but instead to settle for a statistical approximation, eroding the forces that built the shoulders we stand on.

(also blogged a few hours ago)

13 Likes

Good points.

One nuance, I think, is that the responsible gas company, electricity company and water company specifically hire people they know are competent, with formal education and/or the certifications to prove it. These days, however, it seems AI gets “hired” for all kinds of jobs without any proof of being competent at all aspects of the job. That is a real problem.

I imagine the companies in the businesses above are responsible for what they do regardless of how they do it, though. So I would assume those particular ones would be quite hesitant to hire someone or something that does great 95% of the time but burns the house down the other 5%.

I do expect that ratio to improve beyond humans, though. And to the extent AI is used in the mentioned applications, I do know AI is used by some to detect and warn about failures before they actually happen (based on establishing a baseline of normal operations, continuous observation, and warning about changes).

I would not use AI coding for any safety-critical or life-support application. But for the most part I’m not there, and I like to think those who are either do not use AI or have perfectly reliable testing. My work would be more on the level of hanging up a picture or changing a light bulb.

Then I think we have the angle of LLMs being used by competent people as a multiplier rather than a crutch for incompetent people. I do share your worry about AI just being put to use uncritically in sectors and applications it is not suitable or qualified for. The hype is too real.

1 Like

At last, it’s my time to shine!

3 Likes

I think this argument rests on a dichotomy that does not hold up to critical scrutiny. On the one side, you have institutional competence, which is rigorous and authoritative. On the other, you have LLMs, which are “statistical guesses” and “incompetent imitators.”

But what is institutional competence, if not consolidated statistical knowledge? Building codes come from observed failure patterns, updated after every collapse. Water treatment standards come from epidemiological data on outbreaks. Medical practice comes from clinical trials, which are literally statistical inference at industrial scale. The shoulders we stand on today are built from statistically-validated patterns, formalized into codes, reviewed by experts and enforced by liability. Distilling LLMs down to “statistical guesses” to contrast them with institutional knowledge treats institutional knowledge as something it isn’t. Because at the end of the day, it is statistics all the way down!

In reality, the difference isn’t the statistical nature of knowledge, but the verification layer around it. For us, this layer includes tests, type systems, linters, code review, CI pipelines, staging environments, error monitoring and post-incident review. These are our building codes. And if anything, the AI era has raised the bar on all of them. Test coverage is no longer optional. Behavioral validation has had to expand significantly. Code review hasn’t disappeared; it has shifted from “is this line correct” to “does this actually solve the problem”. This is the typical institutional response to a new tool: absorb it, strengthen the verification layer around it, and end up more robust than before. This is precisely the opposite of erosion.
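To ground “verification layer” in this ecosystem, here is a sketch of a single gate wired up as a Mix alias (the alias name is arbitrary, and Credo/Dialyxir are dev dependencies you would add yourself):

```elixir
# In mix.exs: one command that every change goes through,
# whether a human or an LLM wrote the diff.
defp aliases do
  [
    check: [
      "format --check-formatted", # auto-formatter as building code
      "credo --strict",           # linter
      "dialyzer",                 # static analysis via dialyxir
      "test"                      # behavioral validation
    ]
  ]
end
```

Run `mix check` locally and in CI, and the LLM’s output clears the same bar as everyone else’s.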

This actually matches your own scenario, if you look closely. You ask: what would happen if the water company replaced purification expertise with an LLM? But no water treatment engineer is doing that, are they? Professional LLM use in engineering involves drafting, exploring, checking against codes and calculations, and routing through the existing review and liability structures. The LLM is an input into the verification layer, not a replacement for it. Your scenario describes incompetence using LLMs, not LLMs causing incompetence. And that failure mode predates LLMs.

Ultimately, what you are saying is a new version of the pocket calculator argument. And the encyclopedia argument. And the Wikipedia argument. And the Stack Overflow argument. Each new tool, each new layer of abstraction, got the same treatment: people said it would erode institutional competence by letting practitioners bypass real understanding. In each case, the critics were partially correct about skill atrophy (it is inarguable that we lose skills that we don’t use frequently), but decisively wrong about the net institutional effect. Bridges today are more stable than ever, medicine more effective, drinking water much cleaner, and airplanes dramatically safer. The institutional knowledge didn’t erode at all. Rather, it consolidated into codes, review systems and testing/verification regimes on top of the new layer. It became more durable.

There is a real failure mode in the neighborhood of what you’re describing, which is practitioners using LLMs without a verification layer and treating output as authoritative. This is worth pointing out and worth guarding against. But the remedy is maintaining and strengthening the verification layer, which competent institutions have always done for every new tool and knowledge source.

5 Likes

To add to the above, LLMs are often shunned because they are non-deterministic. Guess what though: so are humans. Carpenters mismeasure, electricians skip steps, plumbers forget clearances or might not tighten rings fully. The building codes, the inspections, the permits, the trade licenses, the liability… that’s the deterministic scaffolding society has built on top of unreliable practitioners. Software is no different, and the framework for dealing with LLM non-determinism already exists. It’s just utilized insufficiently and unevenly - which is becoming apparent now from the wildly different results people report.

1 Like

What reconciles me to reality is that an LLM used by a person lacking this earned institutional competence produces crap.

I mean, this institutional competence is still a must, and it will be that way forever (with LLMs), because LLMs cannot invent anything by design. They can only replicate. And I am a huge fan of the DRY principle. That “R” I feel like delegating.

The implementation pillars (which are indeed the institutional competence) are still on us. The epoch of wow-indie-CEOs will end soon, once they have all faced the drawbacks of releasing non-verified code.

2 Likes

Agree 100%. LLMs are faulty, just like us and the code we write.