Any first-hand experience with the Claude model getting "nerfed" after using it for a while?

Been using Claude for over a week now (Opus 4.6 then 4.7, max subscription). Honestly, can’t hide my joy, at some points feeling even ashamed of my past stubborn skepticism. It’s amazing how much (more) I managed to do over this last week. The productivity boost is comparable to (if not bigger than) the one I got when I switched from OOP to Elixir 7+ years ago.

Just when I thought I had figured it all out (how to get the most out of it - btw, I’ve managed to achieve 0 slop for what I’ve been using it for currently), yesterday I got a cold shower when I called to brag to an old friend of mine who runs a small dev agency and who I knew had been using Claude for a while.

“Enjoy it while it lasts,” he laughed cynically. He told me the (consistent) excellent performance I’ve been experiencing would cost way more than a max subscription, and said he felt it’s been a bait and switch - that over the 8 months he’s been using it he’s experienced very serious and frequent drops in Claude’s performance, to the point of it being downright dumb and messing things up (which is the exact opposite of my experience so far).

A couple of weeks ago I saw a Zerohedge retweet of an engineer from AMD ranting about Anthropic having been nerfing Claude (deliberately downgrading its performance).

Since all this can be very subjective, I need to ask whether anyone here has had this kind of first-hand experience, or maybe even more in-depth knowledge of the matter?


It usually depends on how it is used. Plain direct prompting (even through Plan mode) results in crap output most of the time. Could be subjective of course, but it writes duplicated code, does not reuse already written functions and modules, and ignores codebase standards (even if they are written directly in the local CLAUDE.md).

Sometimes psychosis starts out of nowhere, and it starts outputting pure crap. Or it states it did something, but never actually wrote that code. It can also easily wipe hours of work by just resetting git state. Things happen, to which Claude usually says “I am so sorry!”.

The only way I was able to keep it in line is through a very strict orchestration framework with a lot of hooks. So, in short - your friend is correct.
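
As a rough illustration of the kind of hook mentioned above, here is a minimal Python sketch of a guard that refuses shell commands which would reset git state. The deny-list, the function names `is_destructive` and `decide`, the payload shape (`tool_input.command`) and the exit codes are all assumptions made for illustration, not something described in this thread; check the hook documentation of whatever harness you use for the real payload format and blocking convention.

```python
import re

# Hypothetical deny-list: git commands that can silently discard local work.
DESTRUCTIVE_PATTERNS = [
    r"\bgit\s+reset\s+--hard\b",
    r"\bgit\s+checkout\s+(--\s+)?\.(\s|$)",
    r"\bgit\s+clean\s+-[a-zA-Z]*f",
    r"\bgit\s+stash\s+(drop|clear)\b",
]


def is_destructive(command: str) -> bool:
    """Return True if the shell command matches any deny-list pattern."""
    return any(re.search(pattern, command) for pattern in DESTRUCTIVE_PATTERNS)


def decide(payload: dict) -> int:
    """Map a hook payload to an exit code: 0 = allow, 2 = block.

    Assumes a payload shaped like {"tool_input": {"command": "..."}};
    both the shape and the exit codes are assumptions of this sketch.
    """
    command = payload.get("tool_input", {}).get("command", "")
    return 2 if is_destructive(command) else 0
```

In a real setup a small wrapper would read the payload from wherever the harness supplies it and exit with the value `decide` returns. A deny-list like this only catches the obvious cases, so it complements frequent commits rather than replacing them.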

Besides those “features”, the service has terrible uptime - it is down very often. It’s not a feeling - their status page is all red.


Thanks for the feedback. But it’s weird that I haven’t experienced any of this except for the suboptimal code generation (which I’m accustomed to with LLMs in general, but I’ve learned how to deal with it and get from it what I need and how I need it).

TBH, my only worry here is this being some kind of actually deliberate policy. That’s the only thing I’d actually hate. If it’s a result of peak demand or whatever technical reason, then it’s subject for improvement and will most likely go away, but if it’s the result of a corporate policy, then it’s too bad.

Opus 4.5 was a bit of a tipping point for me, and I’ve been a heavy user now for months. I haven’t experienced overall degradation, and for the large majority of the time it is fine, but there are times when I get the fruitcake Claude. That will typically happen after getting a new one after compaction, but it has also happened a few times after a very long session. There was a period earlier this year where that happened more often, but these days not so much.

I have gotten better at spotting the confused crazy talk early on, and I just compact that Claude away and usually get a good Claude again afterwards.

I do git commits often, as Claude will not always have a way to undo code changes that don’t work out. I also do additional, more comprehensive backups for major milestones. That didn’t change because of Claude, but they have been used a few times when Claude has accidentally overwritten or deleted data source files. Frequent backups do so much better than “I did a horrible mistake. I’m so sorry”.

To be fair, sometimes I’ve been the idiot and implicitly assumed Claude has a level of common sense. There is none. I once had an ssh connection to a more powerful computer, and once the heavy processing was done, I asked Claude to clean up and remove all files no longer needed for the project. That did not work out well.

Anyway, I can’t say I have experienced any systematic degradation. Rather the opposite, as Opus 4.6 and 4.7 seem like improvements.


Thanks!

What exactly do you mean by this? (compacting that Claude away)

Frequent backups of what? Your repo is already versioned (and hopefully pushed to remote).

/compact or just /clear

I have two projects which have been going on for months. There are many huge data source files which are git ignored due to size.

Besides, confused Claude versions have at times suggested git actions that could mess up that backup, so a separate one makes me sleep much better.
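
For anyone wanting to do something similar, here is a minimal Python sketch of a separate backup for git-ignored files. It is an assumption about how such a backup could look, not the setup described above: the function names `list_ignored_files` and `snapshot` and the archive naming are made up for illustration, while `git ls-files --others --ignored --exclude-standard` is the standard git command for listing ignored files.

```python
import subprocess
import tarfile
from datetime import datetime
from pathlib import Path


def list_ignored_files(repo: Path) -> list[str]:
    """Ask git for the files it ignores (paths relative to the repo root)."""
    out = subprocess.run(
        ["git", "ls-files", "--others", "--ignored", "--exclude-standard"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def snapshot(repo: Path, rel_paths: list[str], dest_dir: Path) -> Path:
    """Write the given files into a timestamped .tar.gz under dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{repo.resolve().name}-ignored-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for rel in rel_paths:
            # Store paths relative to the repo so the archive restores in place.
            tar.add(repo / rel, arcname=rel)
    return archive
```

Typical use would be `snapshot(repo, list_ignored_files(repo), some_dir_outside_the_repo)`. The ignored list usually also contains build output and dependencies, so you would probably filter it down to the data directories first. Keeping `dest_dir` outside the repository means no git action can touch the archive.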


It does feel like deliberate policy. I’ve been experimenting heavily on the max plan, mostly doing R&D besides real work. As soon as a new model appears, it’s fast and smart, and it feels like a real change. After a week the feeling is gone. Everything is slow again. Opus 4.7 started to hang often in the middle of work (friends report similar behavior) - it just stops at some point, doing nothing. The worst situation is when it hangs within subagents - it doesn’t stop the subagents, so it looks like it’s just working for a really long time.

Without a harness or orchestration framework like superpowers, plain Claude feels really silly. It’s still way better than, let’s say, Mistral’s devstral-2, but with Kimi 2.5 I get a very similar level of quality at 0.25 of the price (if used through a Factory Droid subscription, for example).

To sum up, I think it takes at least a few months to start feeling the pain and understand the AI tax. You lose knowledge of the codebase, and often you just blindly trust it. Then you check some parts of the code that the AI covered with tests and that work “correctly” in production, just to discover a total mess that will put you in a cold sweat.


Yes.

On forums such as the OpenAI Community Forum, there are many threads describing what you are calling “nerfed” behavior.

From longer-term observation across multiple platforms and vendors (including OpenAI and Anthropic), this pattern is not isolated to a single model or provider. Users often report cycles where a model initially performs very well, then appears to degrade.

A few factors—both user-side and vendor-side—can explain much of this:

User-side factors

  • Long conversations and context compaction
    As sessions grow, systems may summarize or compress earlier context. This can drop details that were implicitly guiding good responses.
    Mitigation: periodically start a new session and carry forward only the essential state (a “continuation prompt”).
  • Prompt drift vs. model updates
    When new model versions are released, guidance in model cards or documentation often changes. Prompts that previously worked well may become less effective.
    Mitigation: periodically revise prompts to align with current recommendations.

Vendor-side factors

  • Model updates and tuning changes
    Providers do update models over time (e.g., safety tuning, instruction-following behavior, latency/cost optimizations). These changes can alter output style or reliability.
  • System prompt and policy adjustments
    Changes to system-level instructions or safety layers can have noticeable downstream effects on responses.

I could likely spend a week covering this in depth, but in practice it comes down to understanding how LLMs operate, reviewing model documentation, and gaining experience through use.


Note: the starting reply was written by me, then polished with the help of ChatGPT.


https://openai.com/index/gpt-5-system-card/


Best practices for using Claude Opus 4.7 with Claude Code

Prompting best practices

Using Claude Code: session management and 1M context

Note: I do not actively use the 1M context and it eats tokens faster.


I doubt there’s any conspiracy here. Probably just operational difficulties.

But FWIW, there are other ways to use these models. For example, we use Opus via Opencode talking to Amazon Bedrock, so it’s running on Amazon’s infrastructure, not Anthropic’s. I haven’t noticed the kinds of issues people talk about with Claude Code. And as a bonus, in theory we could switch to another vendor’s model (although so far Opus has been great, and I like Anthropic more than I like its competitors).


Really, I can not edit my own post after a few hours. :frowning_with_open_mouth:



An update on recent Claude Code quality reports


Btw, this morning it got a bit lazy/superficial. Asked it about the slip-ups in execution and it admitted being lazy. Just had it add “Don’t ever EVER be lazy!” to its project memory at the very top. It also added the 3 instances of the morning laziness as reminders/arguments on its own initiative :slight_smile:


Something I often faced was comments like “Hmmm… I see this #TODO in the file but not related to my changes. The next operator will fix it.”, which is quite infuriating behaviour, since I left the #TODO for my next pass over this file, be it manual or assisted.

When I added “Think like a boy scout, leave the place better than you encountered it” to CLAUDE.md, it started saying “Hmmm… I see this #TODO in the file but not related to my changes. – wait : boy scout rule, I must fix it”. I must admit I have not decided whether that is more or less irritating than the previous version, but it fixed the refusal to handle low-hanging fruit, like restructuring types or doing a small refactor…

Generally I see a lot of behaviours that I do not tolerate in myself, and being “augmented” this way is a bit tiring.


Be careful with negative logic, e.g. the “not” that is part of “don’t”.

The early models such as GPT-2 and GPT-3 had no concept of negative logic and would often ignore the word “not”, thus

“Don’t ever EVER be lazy!”
becomes
“Do ever ever be lazy!”

An affirmative way to note this is

“Always be diligent.”


Also check out the Claude Status page and subscribe to get updates, as these happen almost daily.

via email just now.

Claude Incident - Elevated errors on Claude Opus 4.7 - 24 April 2026


Thanks! Subscribed.


In my experience, wanting the model to fix everything it finds is not a good idea.

First, the model will lose focus by trying to do too much at the same time, leading to worse output quality. Second, if the model reports to you or has questions for you, it will start talking about multiple topics, leading to a conversation where you cover too many different points each round.

In general I find it is best to start a session, tackle some implementation, start a new session, and ask for TODO fixing (only TODOs in the same module, or otherwise across the codebase but related to a single concern). You can use an agent to scan the codebase for TODOs (or ask it to use credo for that), then group and categorize the TODOs, hand off a fixing plan on a single concern for the next session, and start a new session.

(now of course sometimes the code is small, the fixes are easy so fixing todos along the way can be ok)
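
The scan-and-group step can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `find_todos` and `group_by_directory`, the `# TODO` comment pattern and the choice to group by directory are mine, and in an Elixir project credo's own TODO reporting could replace the scanning part entirely.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches "# TODO: text", "#TODO text", "# todo text"; captures the text.
TODO_RE = re.compile(r"#\s*TODO\b[:\s]*(.*)", re.IGNORECASE)


def find_todos(root: Path, suffixes=(".ex", ".exs")) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, text) for each TODO comment."""
    found = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, 1):
            match = TODO_RE.search(line)
            if match:
                rel = path.relative_to(root).as_posix()
                found.append((rel, lineno, match.group(1).strip()))
    return found


def group_by_directory(todos: list[tuple[str, int, str]]) -> dict[str, list]:
    """Group TODOs by the directory of their file, one group per session."""
    groups = defaultdict(list)
    for rel_path, lineno, text in todos:
        groups[Path(rel_path).parent.as_posix()].append((rel_path, lineno, text))
    return dict(groups)
```

Each group can then be pasted into the handoff plan for one focused session. Grouping by directory is only a stand-in for "single concern"; categorizing by topic still needs a human or an agent pass.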


Yeah, I agree that keeping focus is important. But if the model surfaces it, and it IS related to the thing it was working on, only not to its specific changes, the default attitude of surfacing it to say “not my problem” is strange to me…


Fully agree.

I start new sessions at the drop of a hat.

For my current project going on a few weeks I would not be surprised if the session count is around 500.


One thing that some users do is to create a daemon to record changes to .claude directories (ref) and other directories used by Claude for later analysis.
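
The core of such a recorder can be sketched as a fingerprint-and-diff pair in Python. This is an assumption about one way to do it, not the referenced tool: `fingerprint` and `diff` are hypothetical names, and the polling loop or file-watcher that would call them is left out.

```python
import hashlib
from pathlib import Path


def fingerprint(root: Path) -> dict[str, str]:
    """Map each file under root (relative path) to the SHA-256 of its contents."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def diff(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, removed, or modified between two fingerprints."""
    common = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in common if before[p] != after[p]),
    }
```

A daemon would call `fingerprint` on a timer and append each non-empty `diff` to a log with a timestamp. Hashing contents rather than comparing modification times keeps the record accurate even when a file is rewritten with identical content.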


It is a hard pattern to break.

Alternatives that work for me

  • Keep a text document open with notes of things to do in the future.
  • Start in planning mode and refine the plan so that the AI stays on track.
  • Update CLAUDE.md or a memory file with these todos.
  • Skills are useful for this at times.

Side question: Is the Discourse AI enabled on this forum for reviewing a reply before posting? Is it limited based on group or trust level?



“not my problem” is also my own behaviour to be honest. When I open a codebase, all tests pass, the CI passes, production is stable, etc. So any todo found in here is fine and “if it’s not broken do not fix it”.

Even before AI I was tackling todos in batches. It’s just regular tech debt reducing sessions in the end.


@lud

One thing that was not asked by @DaAnalyst specifically, but is of value for the topic, and that used to be hard to do and is now much easier with AI help, is to bring test coverage to 100% on the core code, not 98%, not 99%. The reason is that if the AI has access to the unit tests, it can use them as the source of truth, and also learn more about how the code works and is expected to work, rather than spending hours reasoning on its own and arriving at false beliefs.

This reasoning is very much in line with the evidence reply from another topic.
