Has anybody else noticed a pretty significant shift in sentiment when discussing Claude/Codex with other engineers since even just a few months ago? Specifically because of the secret/hidden nature of these changes.
I keep getting the sense that people feel like they have no idea if they are getting the product that they originally paid for, or something much weaker, and this sentiment seems to be constantly spreading. Like when I hear Anthropic mentioned in the past few weeks, it's almost always in some negative context.
I think so, but more than that, the performance of those tools seems to be terribly degrading when they keep saying they have created some crap like AGI which we know is a lie.
And to me, this lie is mostly a fight to see who bites the biggest chunk of the war death machine.
- Banning OpenClaw users (within their rights, of course, but bad optics)
- Banning 3rd party harnesses in general (ditto)
(claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)
- Lowering reasoning effort (and then showing up here saying "we'll try to make sure the most valuable customers get the non-gimped experience" (paraphrasing slightly xD))
- Massively reduced usage (apparently a bug?) The other day I got 21x more usage spend on the same task for Claude vs Codex.
- Noticed a very sharp drop in response length in the Claude app. Asked Claude about it and it mentioned several things in the system prompt related to reduced reasoning effort, keeping responses as brief as possible, etc.
It's all circumstantial but everything points towards "desperately trying to cut costs".
I love Claude and I won't be switching any time soon (though with the usage limits I'm increasingly using Codex for coding), but it's getting hard to recommend it to friends lately. I told a friend "it was the best option, until about two weeks ago..." Now it's up in the air.
> It's all circumstantial but everything points towards "desperately trying to cut costs".
I have been wondering if it's more geared at reducing resource usage, given that at the moment there's a known constraint on AI datacenter expansion capability. Perhaps they are struggling to meet demand?
I wish they would just rip the bandaid to stop everybody's entitled whining.
"We're sorry, what we were able to give you for $100/mo before now needs to be $200/mo (or more). We miscalculated/we were too generous/gave too much away for too little. It's a new technology, we are seeing a ton of demand, we are trying to run a business, hope you understand. If you don't want it, don't pay for it."
> (claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)
How often? Realistically, if you invoke it occasionally, for what's clearly an amount that's "reasonable personal use", then no you don't get nuked.
It’s the same problem people have with Google. If they ban you for some AI hallucinated reason you have no recourse other than going viral on Hacker News.
They also screwed up the API token detection and also blocked a bunch of 1st party tool users for ~24h.
Support consisted of AI bots saying you did something stupid, you did something wrong, you were abusing the system, followed by (only when I asked for it explicitly) claiming to file a ticket with a human who will contact you later (and it either didn't happen or their ticket system is /dev/null).
(By the way this is the 2nd time I've been "please hold" gaslit by support LLMs this exact same way, the other being with Square)
claude -p not working would be an instant unsubscribe (downgrade from Max to Pro) and would further drive my use of codex. I use both but overall have noticed I reach for Claude less than codex lately because claude keeps getting slower and slower (I have not noticed a drop off in quality, but I use it less and less so maybe I'm not in a good position to notice).
Generally I find codex and claude make a good team. I'm not a heavy user, but I am currently Claude Max 5x and ChatGPT Plus. Now that OpenAI has a $100 offering and I am finding myself using Claude less, I am considering switching to Claude Pro and ChatGPT Pro x5. The work hours restriction on Claude Max x5 really pisses me off.
I am not a heavy user. Historically I only break over 50% weekly one week a month and average about 30-40% of Max x5 over the entire month. I went Max because of the weekly limits and to access the better models and because I felt I was getting value. I need an occasional burst of usage, not 24/7 slow compute. But even for pay-as-you-go burst usage Anthropic's API prices are insane vs Max.
I have yet to ever hit a limit on codex so it's not on my mind. And lately it seems like Claude is likely to be having a service interruption anyway. A big part of subscribing to Claude Max was to get away from how the usage limits on Pro were causing me to architect my life around 5hr windows. And now Anthropic has brought that all back with this don't use it before 2pm bullshit. I want things ready to go when the muses strike. I'm honestly questioning whether Anthropic wants anyone who isn't employed as a software engineer to use their kit.
Anyway for the last month or so codex "just works" and Claude has been an invitation for annoyances. There was a time when codex was quite a bit behind claude-code. They have been roughly equal (different strength and weaknesses) since at least February (for me).
> (claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)
100% this, I’ve posted the same sentiment here on HN. I hate the chilling effect of the bans and the lack of clarity on what is and is not allowed.
In this case, they handled things pretty well. You can still use openclaw etc with your regular Anthropic subscription, it will just count towards your extra credits / usage which you can buy for a 30% discount compared to API pricing. And they gave everyone one month’s value in credits.
I don’t think they could have done that much better I’d say.
There is very poor clarity about what is and isn't allowed with the Claude SDK/claude -p. Are we allowed to use it to automate stuff? What kind of tasks is it permitted to be used for? What if you call your script 'OrangeClaw' and release that on GitHub? What if your script gets super popular, does it suddenly become against TOS?
This is exactly my point. At what point does it become a ToS violation? Right now it's a huge grey area and the idea of getting my account banned because I crossed an invisible line with zero recourse other than to switch providers is... frustrating.
It's pretty easy to read between the lines tbh. Personal, non-automated use is fine. Using it as a means to automate depleting your 5-hour limit 24/7 ("leftover usage") is not fine. They don't want to put it in the ToS because it's almost impossible: writing what I just said will still have people going "well what's automated, where's the exact line!" when it's all pretty clear what the intended use case here is. The Anthropic peeps have said about as much.
I get that the traditional dev is allergic to the concept of reading between the lines and demands everything to be spelled out explicitly, but maybe you should just see it as something to learn because it's an incredibly useful life skill.
That "non-automated" part is where I feel like there is a lack of clarity. They even have some stuff in to allow for scheduling in Claude Code. Seems similar to a cron but "non-automated" would rule out using a cron (right?). I'd love to feel comfortable setting up daily/hourly tasks for Claude Code but that feels iffy. Like I said, I don't think the line is clear.
The lack of clarity doesn't matter because they obviously can't tell if you ran a claude -p a few times today with usual prompts or whether your cron job did. It's impossible for them to reliably tell.
It can tell if your cron is running them every 10 minutes 24/7, because basic biology rules out you doing that for more than a day or so.
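To make the "basic biology" argument concrete, here is a rough sketch of the kind of cadence check being described. It is purely illustrative: the function names and the 4-hour / 24-hour thresholds are my own assumptions, and nothing here reflects how any provider actually detects automation.

```python
from datetime import datetime, timedelta


def longest_gap(timestamps: list[datetime]) -> timedelta:
    """Longest pause between consecutive invocations."""
    ordered = sorted(timestamps)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=timedelta(0))


def looks_automated(
    timestamps: list[datetime],
    min_rest: timedelta = timedelta(hours=4),   # hypothetical: a human sleeps at least this long
    min_span: timedelta = timedelta(hours=24),  # hypothetical: only judge a full day or more
) -> bool:
    """True if activity covers at least `min_span` with no pause of `min_rest` or longer."""
    if len(timestamps) < 2:
        return False
    ordered = sorted(timestamps)
    span = ordered[-1] - ordered[0]
    return span >= min_span and longest_gap(ordered) < min_rest


start = datetime(2025, 1, 1)
# A cron job firing every 10 minutes for two days straight.
cron_like = [start + timedelta(minutes=10 * i) for i in range(6 * 48)]
# A person invoking it hourly during working hours on two days.
human_like = [datetime(2025, 1, day, hour) for day in (1, 2) for hour in range(9, 18)]

print(looks_automated(cron_like))   # True
print(looks_automated(human_like))  # False
```

Note that a cron job running a few times a day would pass this check, which is the point made upthread: occasional scheduled use is indistinguishable from a person.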
When you're using the SDK, yes it can. Example: I used the Python SDK to translate a bunch of source code recently. I spawned a subagent for each module that needed translating and left it to run for a few hours with a parallelism limit of 5. It blasted through the 5 hour usage and dug into extra usage credits.
I have zero assurances that the above can't result in a ban. The usage pattern is not distinct from OpenClaw.
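For anyone wondering what "a parallelism limit of 5" looks like in practice, here is a minimal sketch using plain asyncio. `translate_module` is a hypothetical stand-in, not the real SDK call, which I'm deliberately leaving out; only the concurrency pattern is the point.

```python
import asyncio

MAX_PARALLEL = 5  # the parallelism limit mentioned above


async def translate_module(name: str) -> str:
    """Hypothetical stand-in for one subagent run; a real version would call the SDK here."""
    await asyncio.sleep(0.01)  # simulate work
    return f"translated:{name}"


async def run_all(modules: list[str], limit: int = MAX_PARALLEL) -> tuple[list[str], int]:
    """Run one task per module with at most `limit` in flight; return results and observed peak."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(name: str) -> str:
        nonlocal in_flight, peak
        async with sem:  # blocks while `limit` workers are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await translate_module(name)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(m) for m in modules))
    return list(results), peak


if __name__ == "__main__":
    out, peak = asyncio.run(run_all([f"mod{i}" for i in range(12)]))
    print(len(out), "modules done, peak concurrency", peak)
```

From the server's side, a script like this left running for hours is just a steady stream of requests, which is why the usage pattern is hard to tell apart from a third-party harness.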
Wait, this is news to me. I thought 3rd party use of the sub was unequivocally prohibited?
If I'm understanding you correctly: they changed that policy, you can now use 3rd party software unofficially with the undocumented Claude Code endpoint, and their servers auto-detect this and charge you extra for it?
EDIT: Yeah, something like that?
> Starting April 4 at 12pm PT / 8pm BST, you’ll no longer be able to use your Claude subscription limits for third-party harnesses including OpenClaw. Instead, they’ll require extra usage.
This seems to mean that unauthorized usage of the sub endpoint is tolerated now (and billed as though it were the regular API). And possibly affects claude -p, though I don't know yet.
There’s the argument that Anthropic has built Claude Code to use the models efficiently, which the subscription pricing is based on.
Maybe there’s some truth to that, but then why haven’t OpenAI made the same move? I believe the main reason is platform control. Anthropic can’t survive as a pipeline for tokens, they need to build and control a platform, which means aggressively locking out everybody else building a platform.
Alternatively products like openclaw have an outsized impact on Anthropic's infrastructure for essentially no benefit to them. Especially when you're taking advantage of the $200 plan.
OpenAI has never shied away from burning mountains of cash to try and capture a little more market share. They paid a billion dollars for a vibe coded mess just for the opportunity to associate themselves with the hype.
No, I'm paying $200 a month for a premium product that I expect premium service for. It's the single most expensive IT expense I have. Taking advantage my foot.
Claude Code was the best harness from roughly around release to January this year. Ever since then, it's become more and more bloated with more and more stuff and seemingly no coherent plan or vision to it all other than "let's see what else that sounds cool we can cram in there."
To be clear they weren’t banned from Claude usage, they were required to use the API and API rates rather than Claude Max tokens.
Claude Code uses a bunch of best practices to maximize cache hit rate. Third party harnesses are hit or miss, so they often use a lot more tokens for the same task.
I'm watching a conference talk right now from 2 weeks ago: "I Hated Every Coding Agent So I Built My Own - Mario Zechner (Pi)", and in the middle he directly references this.
He demonstrates in the code that OpenCode aggressively trims context, by compacting on every turn, and pruning all tool calls from the context that occurred more than 40,000 tokens ago. Seems like it could be a good strategy to squeeze more out of the context window - but by editing the oldest context, it breaks the prompt cache for the entire conversation. There is effectively no caching happening at all.
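A toy model of why this happens, assuming the cache can only reuse the unchanged leading part of the conversation (a simplification; the messages and helper name below are made up for illustration):

```python
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    """Count leading messages identical in both requests (what a prefix cache could reuse)."""
    n = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        n += 1
    return n


# Conversation as sent on the previous turn.
history = [
    "system: you are a coding agent",
    "user: fix the bug",
    "tool: read file A (large output)",
    "assistant: found it",
    "tool: run tests (large output)",
    "assistant: done",
]

# Strategy 1: append only. The whole previous request is still a prefix.
append_only = history + ["user: next task"]

# Strategy 2: prune the oldest tool output before sending the next turn.
pruned = [m for m in history if m != "tool: read file A (large output)"] + ["user: next task"]

print(cached_prefix_len(history, append_only))  # 6
print(cached_prefix_len(history, pruned))       # 2
```

In the pruned case only the first two messages still match, so everything after the edit has to be processed again even though the request is shorter. The earlier the edit, the less there is to reuse.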
Yep, that's the reason for the new Extra Credit feature in Claude Code. Some people were wiring up "Claude -p" with OpenClaw, so now Anthropic detects if the system prompt contains the phrase OpenClaw, and bills from Extra Credit if that happens:
This is a funny cat and mouse game. They offer a built in loop command.
Just tmux and use that.
Soon if they drop -p people will just vibe code in 5 minutes a way to type inside it remotely, similar to their own built-in remote access tool. Seems like a losing game from Anthropic's side.
A month ago the company I work at with over 400 engineers decided to cancel all IDE subscriptions (Visual Studio, JetBrains, Windsurf, etc.) and move everyone over to Claude Code as a "cost-saving measure" (along with firing a bunch of test engineers). There was no migration plan - the EVP of Technology just gave a demo showing 2 greenfield projects he'd built with Claude Opus over a weekend and told everyone to copy how he worked. A week later the EVP had to send out an email telling people to stop using Opus because they were burning through too many tokens.
Claude seems to be getting nerfed every week since we've switched. I wonder how our EVP is feeling now.
Pretty bad decision on his part. I've been telling other engineers within my company who felt threatened by AI that this would happen. That prices would rise and the marginal cost for changes to big codebases would start to exceed the cost of an engineer's salary. API credits are expensive, especially for huge contexts, and sometimes the model will use $200 in credits trying to solve a problem that could be fixed in an hour by a good engineer with enough context.
It kind of reminds me of the joke where a plumber charges $500 for a 5 minute visit. When the client complains the plumber says it's $50 for labor and $450 for knowing how to fix the problem.
Equal sounds like a terrible argument given all the other problems with replacing engineering thought with ai. I don't know where the line is but I expect it's far beyond equal AND there needs to be a level of "this can debug effectively in production" before that makes any sense for a real business case.
A good lesson for all - I always really liked the Picasso version:
In a bustling restaurant, an excited patron recognized the famous artist Picasso dining alone. Seizing the moment, the patron approached Picasso with a simple request. With a plain napkin and a big smile, he asked the artist for a drawing. He promised payment for his troubles. Picasso, ever the creator, didn’t hesitate. From his pocket, he produced a charcoal pencil and he brought to life a stunning sketch of a goat on the napkin—a clear mark of his unique style. Proudly, he presented it to the patron.
The artwork mesmerized the patron, who reached out to take it, only to be stopped by Picasso’s firm hand. “That will be $100,000,” Picasso declared.
Astonished, the patron balked at the sum. “But it took you just a few seconds to draw this!”
With a calm demeanor, Picasso took back the napkin, crumpled it, and tucked it away into his pocket, replying, “No, it has taken me a lifetime.”
I can’t believe how many small to mid size companies are being destroyed by bad decisions like this.
A friend’s company fired all EMs and have engineers reporting to product managers. They aren’t allowed to do refactors because the CTO believes the AI doesn’t need organized code.
Hopefully that EVP feels embarrassed that a big bet was made that not only didn't pay off but left the company in a worse position. Some schadenfreude may be all you can expect, since this is an executive.
lol. dude is so incompetent. changing tools for cost cutting is so stupid, we all know real cost cutting is firing people. if he is really good at what he's doing, just fire 10% of people and replace them with his Claude. If that doesn't backfire in 3 months, he will be CTO.
I saw a big hit to Claude’s intelligence w/ the 1M context window model and the change to adaptive reasoning (github issue linked elsewhere in this thread).
I’m pretty much using 90% Codex now, although since Claude is consistently faster at answering quick questions, I still keep it open for that and for code-reviewing codex/human work before commit.
I'm pretty sure this is an attempt by both companies to shape a reasonable finance story for their eventual IPO. They need to make this look a lot better than a pump and dump (raising on wild valuations then offloading onto public investors).
I certainly noticed a significant drop in reasoning power at some point after I subscribed to Claude. Since then I've applied all sorts of fixes that range from disabling adaptive thinking to maxing out thinking tokens to patching system prompts with an ad-hoc shell script from a gist. Even after all this, Opus will still sometimes go round and round in illogical circles, self-correcting constantly with the telltale "no wait" and undoing everything until it ends up right where it started with nothing to show for it after 100k tokens spent.
Whether it's due to bugs or actual malice, it's not a good look. I genuinely can't tell if it's buggy, if it's been intentionally degraded, if it's placebo or if it's all just an elaborate OpenAI psyop.
Yeah I’ve seen this too. It’s difficult for me to tell if the complaints are due to a legitimate undisclosed nerf of Claude, or whether it’s just the initial awe of Opus 4.6 fading and people increasingly noticing its mistakes.
I'm on the enterprise team plan so a decent amount of usage.
In March I could use Opus all day and it was getting great results.
Since the last week of March and into April, I've had sessions where I maxed out session usage under 2 hours and it got stuck in overthinking loops, multiple turns of realising the same thing, dozens of paragraphs of "But wait, actually I need to do x" with slight variations of the same realisation.
This is not the 'thinking effort' setting in Claude Code: I noticed this happening across multiple sessions with the same thinking effort settings. There was clearly some underlying, unpublished change that made the model get stuck in thinking loops longer and more often, without any escape hatch to stop and prompt the user for additional steering when it gets stuck.
this timing matches my experience, enterprise plan, but using opus from vscode - finished a heavy refactor of a large C# codebase mid march, tried to do basically the same thing early april and couldn't
Whenever I see Opus say “but wait, …”—which is all the time—I get a little bit closer toward throwing my computer out the window. Sometimes I just collapse the thinking section, cross my fingers, and wait for the answer. It’s too frustrating watching the thinking process.
Have you considered just… writing code? Like we used to in the good old days? If the tool drives you to that point of frustration, maybe it’s time to give the tool a break.
I stop the thinking and manually correct with explicit instructions or direction. I treat my agents like well meaning ivy-league graduate interns. They lack the experience to know what to do sometimes and need a “common sense” direction every now and then.
I’ve seen the point raised elsewhere that this could be the double usage promo that was available from the 13th of March to the 28th. ie. people getting used to the promo then feeling impacted when it finished.
Although it seems that enterprise wasn’t included, so maybe not in your case.
it sounds like, tinfoil hat, they reduced the quant size of their model and tried to mask the change with the promo. your theory only addresses the spend, not the reduced reliability
I think there's a much more nefarious reason that you're missing.
It's pretty clear that OpenAI has consistently used bots on social networks to peddle their products. This could just be the next iteration, mass spreading lies about Anthropic to get people to flock back to their own products.
That would explain why a lot of users in the comments of those posts are claiming that they don't see any changes to limits.
The trouble with that argument, though, is that it works the other way as well: how do I, a random internet citizen, know that you're not doing the same thing for Anthropic with this comment?
(FWIW I have definitely noticed a cognitive decline with Claude / Opus 4.6 over the past month and a half or so, and unless I'm secretly working for them in my sleep, I'm definitely not an Anthropic employee.)
in short, it looks like nothing has been nerfed, but sentiment has definitely been negative. I suspect some of the openclaw users have been taking out their frustrations.
Any idea what their test harness looks like? My experience comes primarily from Claude Code; this makes me wonder if recent CC updates could be more to blame than Opus 4.6 itself.
Oh it's pretty clear to me that Anthropic employs the same tactics and uses bots on socials to push its products too. On Reddit a couple of months ago it was simply unbearable with all the "Claude Opus is going to take all the jobs".
You definitely shouldn't trust me, as we're way beyond the point where you can trust ANYTHING on the internet that has a timestamp later than 2021 or so (and even then, of course people were already lying).
Personally I use Claude models through Bedrock because I work for Amazon, and I haven't noticed any decline. Instead it's always been pretty shit, and what people describe now as the model getting lost in infinite loops of talking to itself has happened since the very start for me.
Judging from the number of GitHub issues on Anthropic, shamelessly being dismissed as "fixed", I doubt openai needs the bots to tarnish that competitor.
The enshittification meme has been taken too seriously to the point where it is shoehorned into every single place possible.
It is not in Anthropic's interest to screw its customer base. Running a frontier lab comes with tradeoffs between training, inference and other areas.
This shows a lack of understanding of how markets work. Investors make money when the valuation of the company increases. The valuation of the company is the best prediction of risk-adjusted future profit.
How would anthropic increase future profits without satisfying customers?
Well sure, all market signals should be considered. As a casual observer, my received signals have been indicating that AI is getting sold at a loss to get market share, and more recent signals have indicated that users are really really sensitive to both costs and performance.
The weakest signal to me is investor money, because when you think of it, investors are betting on a future that may or may not be there. Heck even trends aren't guaranteed, "past performance is no guarantee etc etc"
Have you seen the business models for these companies? Literal underpants gnome memes. OpenAI's goes like this:
1. Build AGI
2. Use said AGI to tell us how to become profitable
3. Profit!
Anthropic seems to be going all in on enterprise sales. Which means they don't actually have to please customers, or it's what ThePrimeagen humorously calls a "yacht problem", a problem that only needs a solution after the IPO. For now all they have to do is convince corporate leadership that this is the future of work and sow enough FOMO to close those sales contracts, and their projected sales and stock valuation go through the roof.
Of course that value will collapse if they go without delivering on their promises long enough. That's why they call it a bubble. But by then, hopefully, Dario and the early investors will be long gone and even richer than they were to start. Their only competitor, OpenAI, is confronted with the same issues: the scalability problems won't go away, and addressing them doesn't drive stock valuation the way promising high rollers that AGI and total workforce automation are just around the corner does.
It doesn't matter if it is in Anthropic's interest to screw its customer base, if their reported monthly revenue growth is accurate then it makes perfect sense why Claude would be getting dumber...
Demand is way up and compute supply is extremely limited because data center buildouts can't keep up with demand.
In the face of rising demand and insufficient compute their only practical options (other than refusing new business until demand can be met) are significantly raising the price of tokens (and more tightly limiting subscription options) or doing behind the scenes inference optimizations that are likely to make the model dumber.
It is very easy to believe that they took the route of inference optimizations that have reduced quality of the service and that that is where the perceived enshittification is coming from.
Anthropic seems to be playing the giant-tech-rent-capture game that all of the old guards have done for the past few years. We thought that the new age of AI might bring some fresh air into the mix, but I guess that optimism quickly faded.
it has been my go-to provider for things but i noticed extraordinarily high usage rate last month on a little side project i started so that i could learn about things that are interesting to me while helping my day to day responsibilities (creating an iceberg data lake from my existing parquet files). i used my month’s worth of corporate subscription allocated tokens in 3 days. never seen that before so now i’m a lot more apprehensive about getting into the weeds with claude but i’m also so much less impressed with the other available models for work in this domain.
My working theory is that all models are approximately the same, and the variance in quality mostly depends on how long they think for.
So the trick is to always set to max, and then begin every task with “this is an extremely complex task, do not complete it without extensive deep thinking and research” or whatever.
You’re basically fighting a battle to make the model think more, against the defaults getting more and more nerfed to save costs.
At some point these AI companies need to pay the piper as it were and actually provide a return for their investors. Expect cost cutting attempts to continue unless backlash is great enough to pose an existential threat to these companies.
On OpenRouter token consumption is up 5x since November 2025. If this is indicative of the industry's growth then I can't fathom how we will not hit resource constraints.
It's not just engineers, and it's not just about the 3rd party/rate limiting stuff. I feel like the reasoning capabilities have deteriorated too for non-coding tasks.
Phase 3: $20,000/mo limited release model "too dangerous" to use
Phase 4: Accelerated layoffs / two person teams. Rehiring of certain personnel at lower costs.
Phase 5: "Our new model can decompile and rewrite any commercial software. We just wrote a new kernel after looking at Linux (bye, bye GPL!) We also decompiled the latest Zelda game, ported the engine to Rust, and made a new game with it. Source code has no value. Even compiled and obfuscated code is a breeze to clone."
Phase 6: $100k/mo model that replicates entire engineering teams, only large companies can afford it. Ordinary users can't buy. More layoffs.
Phase N: People can't afford computing anymore. Everything is thin clients and rented. It's become like the private railroad industry. End of the PC era. Like kids growing up on smartphones, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.
Anthropic used to be cool before they started gating access. Limiting Claw/OpenCode was strike one. Mythos is strike two.
Y'all should have started hating on their ethics when they started complaining about being distilled. For training they conducted on materials they did not own.
We need open weights companies now more than ever. Too bad China seems to be giving up on the idea.
Stop thinking billion dollar publicly traded companies are "cool" just because they make a widget you like.
You will be backstabbed
You will be squeezed for all they can.
And you will be betrayed.
> Phase N: People can't afford computing anymore. Everything is thin clients and rented. It's become like the private railroad industry. End of the PC era. Like kids growing up on smartphones, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.
Thankfully none of them actually makes money and just runs on investment so there is a good chance bubble will drop and the price of PC equipment will... continue to rise as US gives up Taiwan to China
What I want to know is how did they make the only LLM that doesn't sound cringe?
I think it has something to do with mode collapse (although Claude certainly has its own "tells"), but I'm not sure.
It sounds trivial but even for agentic use, I found the writing style to be really important. When you give Claude a persona, it sounds like the thing. When you give GPT a persona, it sounds like GPT half-assedly pretending to be the thing.
---
Some other interesting points about Anthropic's models. I don't know if any of these relate to my LLM style question, but seems worth mentioning:
Claude models also use way less tokens for the same task (on ArtificialAnalysis, they are a clear outlier on this metric).
And there's a much stronger common sense, subjectively. (Not sure if we have a good way to actually measure that, though.) It takes context and common sense into account, to a much greater degree.
(Which ties in with their constitution. Understanding why things are wrong at a deeper level, rather than just surface level pattern matching.)
Opus is great but it should be bigger. You notice the difference between Sonnet and Opus, but with heavy use you notice Opus's limitations, too.
People keep repeating this without any real thought behind it because of the high profile resignations on the Qwen team. Meanwhile the Minimax team just released a new open weights version of their 229B model yesterday. So much for that narrative.
The AI landscape in China is larger than just Qwen and Alibaba.
Of course, but for how long? Do you think that companies will keep giving away valuable assets for free forever, or do you think that in the near future there's going to be an open weights model that's so good that people keep using it indefinitely instead of going back to frontier model providers?
The first one is just incredibly naive, the second might be true for some people, for some tasks, but it's not going to capture the majority who're chasing the latest and greatest to "keep up".
> Do you think that companies will keep giving away valuable assets for free forever
If China is forced to choose between giving the entire AI market to the US or releasing free models, they'll be releasing free models as long as it's necessary.
Why would any company release open weights once the investment money stops?
Releasing open weights has been basically a PR move; the moment those companies need to actually make money they will cut it out, as that reduces their client base.
They DO NOT want you to run AI. They want you to pay them to do it
Minimax just released a new model yesterday. You're conflating one company with a country's entire industry. There's more than just Qwen coming out of China.
z.ai did go public on the HK exchange. They are under pressures similar to other public companies.
I know that China models are increasingly being trained and run using Huawei chips instead of Nvidia. I know China has a surplus of electricity from renewables (wind, solar, hydro).
Two years ago a lot of people thought GPT-4o was usable for software development. I didn’t really find that to be the case in general but certainly it could do a lot of useful things. And now Qwen3.5-8B is just as capable and runs fine on an M2 MacBook Air.
QWEN3.5 coder next runs to ~84k context before it poops out on AMD395+ w/128GB. Most of what it's good at is boilerplate find/replace/copy/paste; but being able to scaffold things out and touch up 20-30% of the code is pretty sweet.
It all boils down to a brilliant but extremely expensive technology. Both to build and to run.
We've been sold a product with heavy subsidy. The idea (from Sam): scale out and see what happens.
Those who care to read between the lines can see what's happening. A perfect storm of demand that attracts VCs who can't understand they are the real customers. Once they understand that, it will be too late.
Regarding open weight models: eventually we will, as humanity, benefit from the astronomical capital poured into developing a technology ahead of its time. In a few years this and even more will run on edge.
Written by open source developers, likely former openai and anthropic employees who got so much cash in the bank they don't need to worry about renting their knowledge.
> We need open weights companies now more than ever.
If your objective is to democratize AI, sure. But those fed up with it and the devastating effects it's having on students, for example, can opt to actively avoid paying for products with AI (I say this as someone who uses it every day, guilty). At some point large companies will see that they're bleeding money for something that most people don't seem to want, and cancel those $100k/mo deals. I've already experienced one AI-developer-turned company crash and burn.
Personally, I don't think this LLM-based AI generation will have any significant positive impacts. Time, energy (CO2) and money would have been far better spent elsewhere.
> End of the PC era, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.
This one seems too far fetched. Training models is widespread. There will always be open weight models in some form, and if we assume there will be some advancements in architecture, I bet you could also run them on much leaner devices. Even today you can run models on Raspberry Pis. I don't see a reason this will stop being a thing, there will be plenty of ways to tinker.
However, keep in mind the masses don't care about tinkering and never have. People want a ChatGPT experience, not a pytorch experience. In essence this is true for all tech products, not just AI.
The past two weeks I've had code that was delivered and declared as done (it did pass tests) but failed in a review by Codex. This has looped to a painful extent. The code in question deals with concurrency issues so there's an acknowledgement that it's trickier, but still, I expect more from Claude.
It feels like I'm getting less and less for my money every day. A few weeks ago I was programming all week and never getting close to the limit, yesterday half my weekly limit went away in a day. Changing the limits mid-subscription is just theft.
I can't believe how quickly they went from riding high on anti-OpenAI sentiment post-DOD fiasco, to shooting themselves and all their users new and old in the foot.
The ideal time to make your product worse is probably not at the same point that all of your competitor's customers are looking. Anthropic really, really fucked up here.
And beyond that, there's a ton of people who are just regular 9-5 Claude CLI users with an enterprise subscription who are getting punished with a worse model at the same price just as if we were Claw users. This kind of thing does not make one feel warm and fuzzy. I feel like I just got a boot to the teeth.
The hypothesis that makes the most sense is not that they are idiots, but that they have no choice. They cannot meet the new demand. So they’ve quantized the model.
> The TOS basically states you need to deal with whatever they want.
FWIW that's what most TOSes say for the majority of online services. Some even include arbitration clauses to prevent civil suits and class-action cases.
So a side effect of this is -- even at 1 hour caching -- ...
If you run out of session quota too quickly and need to wait more than an hour to resume your work ... you are paying even more penalty just to resume your work -- a penalty you wouldn't have needed if session quota was not so restrictive in the first place, and which in turn causes you to burn through the next session quota even faster.
Seems like a vicious cycle that made the UX very poor. I remember Claude Code with Pro became virtually unusable in the middle of March, with session quota expiring within the first hour or less for me -- which was a wildly different experience from early March.
It's also routinely failing the car wash question across all models now, which wasn't the case a month ago. :-/
Seeing some things about how the effort selector isn't working as intended necessarily and the model is regressing in other ways: over-emphasizing how "difficult" a problem is to solve and choosing to avoid it because of the "time" it would take, but quoted in human effort, or suggesting the "easier" path forward even if it's a hack or kludge-filled solution.
> over-emphasizing how "difficult" a problem is to solve and choosing to avoid it because of the "time" it would take
I heard a while back Claude refused to attempt a task for days, saying it would take weeks of work. Eventually the user convinced it to try, and it one-shotted it in 30 seconds.
Awesome, I didn't know about the car wash question.
Totally true, also tokens seem to burn through much faster. More parallelism could explain some of it but where I could work on 3-5 projects at once on the max plan a month ago, I can't even get one to completion now on the same Opus model before the 5h session locks me up..
There is a chef, he opens a restaurant. Delicious food.
It costs him more in ingredients alone than he charges. He even offers some pseudo unlimited buffet, combo sets, and happy hours.
He announces a new restaurant; apparently it will be even better, so good he's a bit worried. He makes sure to share his worries while he picks a few select enterprises for business parties and the like.
In the meantime he cracks down on free buffet goers who happen to eat too much, and downgrades all ingredients without notice to finally hope to make a profit.
While I’ve had tremendous success with Golang projects and TypeScript web apps, when I tried to use Metal Mesh Shaders in January, Codex and Claude both had issues getting it right.
That sort of GPU code has a lot of concepts and machinery, it’s not just a syntax to express, and everything has to be just right or you will get a blank screen. I also use them differently than most examples; I use it for data viz (turning data into meshes) and most samples are about level of detail. So a double whammy.
But once I pointed either LLM at my own previous work — the code from months of my prior personal exploration and battles for understanding, then they both worked much better. Not great, but we could make progress.
I also needed to make more mini-harnesses / scaffolds for it to work through; in other words isolating its focus, kind of like test-driven development.
When it comes to agents like codex and CC it seems to come down to how well you can describe what you want to do, and how well you can steer it to create its own harness to troubleshoot/design properly. Once you have that down, I haven't found a lot of things you cannot do.
Breaking down and describing things in sufficient detail can be one way to ensure that the LLM can match it to its implicit knowledge. How much detail you have to spell out still depends on what you’re trying to do. It’s almost a tautology that there’s always some level of description that the LLM will be able to take up.
Well, not just breaking down the task at hand, but also how you instruct it to do any work. Just saying "Do X" will give you very different results from "Do X, ensure Y, then verify with Z", regardless of what tasks you're asking it to do.
That's also how you can get the LLM to do stuff outside of the training data in a reasonably good way, by not just including the _what_ in the prompt, but also the _how_.
Obviously it cannot. But if you give the AI enough hints, clear spec, clear documentation and remove all distracting information, it can solve most problems.
I have also switched from claude to codex a few weeks ago. After deciding to let agents only do focused work I needed less context, and the work was easier to review. Then I realized codex can deliver the same quality, and it's paid through my subscription instead of per token.
I made this switch months ago, ChatGPT 5.4 being a smarter model, but I’ve had subjective feelings of degradation even on 5.4 lately. There’s a lot of growth in usage right now, so I'm not sure what kind of optimizations they're doing at both companies.
I would switch to Codex, but Altman is such a naked sociopath and OpenAI so devoid of ethical business practices that I can't in good conscience. I'm not under any illusion that Anthropic is ethical, but it is so far a step up from OpenAI.
I'm with you on the ethical part, but everything is a spectrum. All the AI leadership are some shade of evil. There's no way the product would be effective if they weren't. I don't like that Sam Altman is a lunatic, but frankly they all are. I also recognize that these are massive companies filled with non shitty engineers who are actually responsible for a lot of the magic. Conflating one charlatan with the rest of it is a tragedy of nuance.
Yeah, but there's distinct difference between "risks their company because they refuse to help with killing little kids" and "happily helping with genocide".
This coincides with Anthropic's peak-hour announcement (March 26th). Could the throttling be partly a response to infrastructure load that was itself inflated by the TTL regression?
It would be too fucking funny if this were the case. They're vibe coding their infrastructure and they vibe coded their response to the increased load.
You'd think they would have dashboards for all of this stuff, to easily notice any change in metrics and be able to track down which release was responsible for it.
Just give us the option to get the quality back, Anthropic. I get that even a $200 subscription is not possible eventually, but give us the option to sub the $1000 tier or tell us to use the API tier, but give us some consistency.
This. I get much more value than 90€ from my Claude Code subscription. I am willing to pay more for consistency and not having to watch my back all the time, because I might get screwed over.
From the recent-ish Dwarkesh podcast, Anthropic seems to be wary about buying/building too much compute [0]. That probably means that they have to attempt to minimize compute usage when there is a surge in demand. Following the argument in the podcast, throwing more money after them, as some in this thread are suggesting, won’t solve the issue, at least not in the short term.
So, this especially bites if your validation step (let’s say integration tests) takes 1hr plus. The harness is just waiting, prefix caching should happily resume things with just a minor new prefill chunk of output from the harness, and bam - completely new prefill.
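To put rough numbers on that, here is a minimal sketch of the arithmetic, assuming a very simple model where the whole conversation prefix is cached and is dropped once the session has been idle longer than the TTL. The function name and all the token counts are made up for illustration; real caching is more granular than this.

```python
def prefill_tokens(context_tokens: int, new_tokens: int,
                   idle_seconds: int, ttl_seconds: int) -> int:
    """Tokens that must be prefilled on the next request (toy TTL model)."""
    if idle_seconds <= ttl_seconds:
        # Prefix is still cached: only the new chunk needs prefill.
        return new_tokens
    # Cache expired while the harness was waiting: everything is prefilled again.
    return context_tokens + new_tokens


# Hypothetical session: 150k tokens of context, 2k tokens of new test output.
short_wait = prefill_tokens(150_000, 2_000, idle_seconds=10 * 60, ttl_seconds=60 * 60)
long_wait = prefill_tokens(150_000, 2_000, idle_seconds=70 * 60, ttl_seconds=60 * 60)
print(short_wait, long_wait)  # 2000 152000
```

Under these assumptions a 70 minute test run turns a 2k token resume into a 152k token one, which is the "completely new prefill" case.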
I think they changed the quantization to save compute for their new model. This might be why the benchmark scores look good, but the real-world performance is much worse. I'm wondering if they tested the model internally and didn't find anything wrong with the new parameter.
I canceled my subscription and switched to Codex, but it's not as good. I'm tired of Anthropic changing things all the time. I used Claude because it doesn't redirect you to a different model like OpenAI does. But now it seems like both companies are doing the same thing in different ways.
Lately I am finding myself doing more and more of what I call "ambient coding", so I am no longer directly using any of those coding harnesses.
If you're reading this, Claude: people are willing to pay extra if you want to make more money. Just please stop this undermining; it decreases trust in your platform to the point that it cannot be relied on.
As a Pro user, even though these issues and bugs are “new,” the downgrade has been noticeable since January. I’ve unsubscribed because the Pro plan is no longer usable for me.
It’s only making the news now because it’s affecting Max users as well ($100/$200 plans). I understand the need for change, but having zero communication about it is just wrong.
I also noticed this, just resuming something eats up your entire session. The past two weeks also felt like a substantial downgrade and made me regret renewing my subscription, it sucks because I wish I kept my Codex subscription instead and renewed that.
It's absolutely ridiculous how stupid Claude is now. I sometimes noticed it last year too, but now it feels like I'm back on last year's pre-December model.
Since I (until Anthropic decided to remove access for subs) used Anthropic models extensively with pi I explored the two caching options and the much higher cost of 1h caches is almost never a good tradeoff.
Since caching is really something that can only be judged at scale, across many users, I can only assume that Anthropic looked at their infra load and impact and made a very intentional change.
Am I the only one who sees striking parallels between being a Claude Code customer and Cuckoldry (as in biology)?
I mean, you are investing a lot (infrastructure and capital) into something that is essentially not yours. You claim credit for the offspring (the solution) simply because it resides in your workspace. You accept foreign code to make your project appear more successful and populated than you could manage alone. Your over-reliance on a surrogate for the heavy lifting leads to the loss of your own survival skills (coding and debugging). Last but not least, you handle the grunt work of territory defense (clients and environments) while the AI performs the actual act of creation (Displaced Agency).
I noticed another limitation:
"An image in the conversation exceeds the dimension limit for many-image requests (2000px). Start a new session with fewer images."
So I can't continue my claude code session I started yesterday.
This is the same shit openAI used to do last year, quietly downgrading their offerings while hyping the next big thing. I thought Anthropic were different but it seems they're playing the exact same long con with Mythos.
They can't really revolutionize AI again so they make the product worse and worse and then offer you a "better" one
Caching for an LLM is not like caching normal content: the longer the cached context, the more beneficial it is, and it only stops being worth it when the user ends the current session.
So you'd need some adaptive algorithm to decide when to keep caching and when to purge the whole thing, possibly on the client side. But if you give the client control, people will make it use as much cache as possible just to chase diminishing returns. So fine-grained control here isn't all that easy; the other possible option is to have a cache size per account and then purge it intelligently instead of relying only on TTL.
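For what it's worth, the "cache size per account" idea can be sketched in a few lines. This is only a toy, assuming a per-account token budget and least-recently-used eviction across sessions; the class name, the budget, and the session ids are all invented, and nothing here reflects how any provider actually manages its cache.

```python
from collections import OrderedDict


class AccountPrefixCache:
    """Toy per-account cache: purge least recently used sessions when over budget."""

    def __init__(self, budget_tokens):
        self.budget_tokens = budget_tokens
        self.entries = OrderedDict()  # session id -> cached token count

    def used(self):
        return sum(self.entries.values())

    def touch(self, session_id, cached_tokens):
        """Record a request for a session; return the session ids that were purged."""
        self.entries[session_id] = cached_tokens
        self.entries.move_to_end(session_id)  # mark as most recently used
        evicted = []
        # Purge from the least recently used end, but never the session just touched.
        while self.used() > self.budget_tokens and len(self.entries) > 1:
            old_id, _ = self.entries.popitem(last=False)
            evicted.append(old_id)
        return evicted


cache = AccountPrefixCache(budget_tokens=100)
cache.touch("a", 60)
cache.touch("b", 30)
cache.touch("a", 60)          # "a" is active again, so "b" is now the oldest
print(cache.touch("c", 40))   # ['b']
```

The point of the sketch is that an idle session is only purged when the account actually needs the room, rather than after a fixed wall-clock timeout.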
Has anybody else noticed a pretty significant shift in sentiment when discussing Claude/Codex with other engineers since even just a few months ago? Specifically because of the secret/hidden nature of these changes.
I keep getting the sense that people feel like they have no idea if they are getting the product that they originally paid for, or something much weaker, and this sentiment seems to be constantly spreading. Like when I hear Anthropic mentioned in the past few weeks, it's almost always in some negative context.
I think so, but more than that, the performance of those tools seems to be terribly degrading when they keep saying they have created some crap like AGI which we know is a lie.
And to me, this lie is mostly a fight to see who bites the biggest chunk of the war death machine.
Well, off the top of my head:
- Banning OpenClaw users (within their rights, of course, but bad optics)
- Banning 3rd party harnesses in general (ditto)
(claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)
- Lowering reasoning effort (and then showing up here saying "we'll try to make sure the most valuable customers get the non-gimped experience" (paraphrasing slightly xD))
- Massively reduced usage (apparently a bug?) The other day I got 21x more usage spend on the same task for Claude vs Codex.
- Noticed a very sharp drop in response length in the Claude app. Asked Claude about it and it mentioned several things in the system prompt related to reduced reasoning effort, keeping responses as brief as possible, etc.
It's all circumstantial but everything points towards "desperately trying to cut costs".
I love Claude and I won't be switching any time soon (though with the usage limits I'm increasingly using Codex for coding), but it's getting hard to recommend it to friends lately. I told a friend "it was the best option, until about two weeks ago..." Now it's up in the air.
> It's all circumstantial but everything points towards "desperately trying to cut costs".
I have been wondering if it's more geared at reducing resource usage, given that at the moment there's a known constraint on AI datacenter expansion capability. Perhaps they are struggling to meet demand?
It’s more that Anthropic knows that the models themselves are non-sticky, and the real moat is in the ecosystem around it.
It only makes sense for them to get users to use their ecosystem, rather than other tools.
> Perhaps Anthropic is struggling to meet demand?
Yes, definitely, they’re gracefully failing to meet demand. They could also deny new customers, but it would probably be bad for business.
I wish they would just rip the bandaid to stop everybody's entitled whining.
"We're sorry, what we were able to give you for $100/mo before now needs to be $200/mo (or more). We miscalculated/we were too generous/gave too much away for too little. It's a new technology, we are seeing a ton of demand, we are trying to run a business, hope you understand. If you don't want it, don't pay for it."
> (claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)
How often? Realistically, if you invoke it occasionally, for what's clearly an amount that's "reasonable personal use", then no you don't get nuked.
It’s the same problem people have with Google. If they ban you for some AI hallucinated reason you have no recourse other than going viral on Hacker News.
They also screwed up the API token detection and also blocked a bunch of 1st party tool users for ~24h.
Support consisted of AI bots saying you did something stupid, you did something wrong, you were abusing the system, followed by (only when I asked for it explicitly) claiming to file a ticket with a human who will contact you later (and it either didn't happen or their ticket system is /dev/null).
(By the way this is the 2nd time I've been "please hold" gaslit by support LLMs this exact same way, the other being with Square)
claude -p not working would mean an instant downgrade from Max to Pro and would further drive my use of codex. I use both but overall have noticed I reach for Claude less than codex lately because claude keeps getting slower and slower (I have not noticed a drop off in quality, but I use it less and less so maybe I'm not in a good position to notice).
Generally I find codex and claude make a good team. I'm not a heavy user, but I am currently Claude Max 5x and ChatGPT Plus. Now that OpenAI has a $100 offering and I am finding myself using Claude less, I am considering switching to Claude Pro and ChatGPT Pro x5. The work hours restriction on Claude Max x5 really pisses me off.
I am not a heavy user. Historically I only break over 50% weekly one week a month and average about 30-40% of Max x5 over the entire month. I went Max because of the weekly limits and to access the better models and because I felt I was getting value. I need an occasional burst of usage, not 24/7 slow compute. But even for pay-as-you-go burst usage Anthropic's API prices are insane vs Max.
I have yet to ever hit a limit on codex so it's not on my mind. And lately it seems like Claude is likely to be having a service interruption anyway. A big part of subscribing to Claude Max was to get away from how the usage limits on Pro were causing me to architect my life around 5hr windows. And now Anthropic has brought that all back with this don't use it before 2pm bullshit. I want things ready to go when the muses strike. I'm honestly questioning whether Anthropic wants anyone who isn't employed as a software engineer to use their kit.
Anyway for the last month or so codex "just works" and Claude has been an invitation for annoyances. There was a time when codex was quite a bit behind claude-code. They have been roughly equal (different strength and weaknesses) since at least February (for me).
> (claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked. Would be great to get some clarity on this. If I invoke it from my Telegram bot, is that an unauthorized 3rd party harness?)
100% this, I’ve posted the same sentiment here on HN. I hate the chilling effect of the bans and the lack of clarity on what is and is not allowed.
In this case, they handled things pretty well. You can still use openclaw etc with your regular Anthropic subscription, it will just count towards your extra credits / usage which you can buy for a 30% discount compared to API pricing. And they gave everyone one month’s value in credits.
I don’t think they could have done that much better I’d say.
That does not address joshstrange's concerns.
There is very poor clarity about what is and isn't allowed with the Claude SDK/claude -p. Are we allowed to use it to automate stuff? What kind of tasks is it permitted to be used for? What if you call your script 'OrangeClaw' and release that on GitHub? What if your script gets super popular, does it suddenly become against TOS?
This is exactly my point. At what point does it become a ToS violation? Right now it's a huge grey area and the idea of getting my account banned because I crossed an invisible line with zero recourse other than to switch providers is... frustrating.
It's pretty easy to read between the lines tbh. Personal, non-automated use is fine. Using it as a means to automate depleting your 5-hour limit 24/7 ("leftover usage") is not fine. They don't want to put it in the ToS because it's almost impossible: writing what I just said will still have people going "well what's automated, where's the exact line!" when it's all pretty clear what the intended use case here is. The Anthropic peeps have said about as much.
I get that the traditional dev is allergic to the concept of reading between the lines and demands everything to be spelled out explicitly, but maybe you should just see it as something to learn because it's an incredibly useful life skill.
That "non-automated" part is where I feel like there is a lack of clarity. They even have some stuff in to allow for scheduling in Claude Code. Seems similar to a cron but "non-automated" would rule out using a cron (right?). I'd love to feel comfortable setting up daily/hourly tasks for Claude Code but that feels iffy. Like I said, I don't think the line is clear.
The lack of clarity doesn't matter because they obviously can't tell if you ran a claude -p a few times today with usual prompts or whether your cron job did. It's impossible for them to reliably tell.
It can tell if your cron is running them every 10 minutes 24/7, because basic biology rules out you doing that for more than a day or so.
Ok, let's say I'm not using it to deplete leftover usage, the task just happens to run down the 5 hour window usage.
Are you willing to bet your account over whether you've read between the lines correctly? Anthropic aren't going to listen to appeals.
> the task just happens to run down the 5 hour window usage.
In a single prompt? From zero usage? That doesn't "just happen".
When you're using the SDK, yes it can. Example: I used the Python SDK to translate a bunch of source code recently. I spawned a subagent for each module that needed translating and left it to run for a few hours with a parallelism limit of 5. It blasted through the 5 hour usage and dug into extra usage credits.
I have zero assurances that the above can't result in a ban. The usage pattern is not distinct from OpenClaw.
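For anyone unfamiliar with the pattern being described, a bounded fan-out like that looks roughly like the sketch below. It deliberately does not use any real SDK: `translate_module` is a hypothetical stand-in for one subagent call, and the module names are invented. Only the "parallelism limit of 5" part is taken from the comment above.

```python
import asyncio


async def translate_module(name: str) -> str:
    """Hypothetical stand-in for one subagent call; no real SDK is involved."""
    await asyncio.sleep(0.01)  # pretend the agent is working
    return f"{name}: translated"


async def translate_all(modules, limit=5):
    """Run one task per module, with at most `limit` running at the same time."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(name):
        nonlocal in_flight, peak
        async with semaphore:  # blocks while `limit` workers are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await translate_module(name)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(m) for m in modules))
    return list(results), peak


results, peak = asyncio.run(translate_all([f"mod{i}" for i in range(12)]))
print(len(results), peak)
```

From the server's side this is just a steady stream of up to five concurrent requests for hours, which is why it is hard to tell apart from an always-on agent.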
Wait, this is news to me. I thought 3rd party use of the sub was unequivocally prohibited?
If I'm understanding you correctly: they changed that policy, you can now use 3rd party software unofficially with the undocumented Claude Code endpoint, and their servers auto-detect this and charge you extra for it?
EDIT: Yeah, something like that?
> Starting April 4 at 12pm PT / 8pm BST, you’ll no longer be able to use your Claude subscription limits for third-party harnesses including OpenClaw. Instead, they’ll require extra usage.
https://news.ycombinator.com/item?id=47633568
This seems to mean that unauthorized usage of the sub endpoint is tolerated now (and billed as though it were the regular API). And possibly affects claude -p, though I don't know yet.
Why were third party harnesses banned? Surely they'd want sticking power over the ecosystem.
There’s the argument that Anthropic has built Claude Code to use the models efficiently, which the subscription pricing is based on.
Maybe there’s some truth to that, but then why haven’t OpenAI made the same move? I believe the main reason is platform control. Anthropic can’t survive as a pipeline for tokens, they need to build and control a platform, which means aggressively locking out everybody else building a platform.
Alternatively products like openclaw have an outsized impact on Anthropic's infrastructure for essentially no benefit to them. Especially when you're taking advantage of the $200 plan.
OpenAI has never shied away from burning mountains of cash to try and capture a little more market share. They paid a billion dollars for a vibe coded mess just for the opportunity to associate themselves with the hype.
> Taking advantage of the $200 plan.
No, I'm paying $200 a month for a premium product that I expect premium service for. It's the single most expensive IT expense I have. Taking advantage my foot.
One thing is lack of control of token efficiency on what’s already a subsidised product.
Another thing is branding: Their CLI might be the best right now, but tech debt says it won’t continue to be for very long.
By enforcing the CLI you enforce the brand value — you’re not just buying the engine.
Claude Code was the best harness from roughly around release to January this year. Ever since then, it's become more and more bloated with more and more stuff and seemingly no coherent plan or vision to it all other than "let's see what else that sounds cool we can cram in there."
What's taken over since then? Codex or something else?
Pi.dev
Maybe they should fix bugs like this then https://github.com/anthropics/claude-code/issues/17979#issue... ...
Note that the thing that's banned is using third party harnesses with their subscription based pricing.
If you're paying normal API prices they'll happily let you use whatever harness you want.
To be clear they weren’t banned from Claude usage, they were required to use the API and API rates rather than Claude Max tokens.
Claude Code uses a bunch of best practices to maximize cache hit rate. Third party harnesses are hit or miss, so often use a lot more tokens for the same task.
nah this doesn't explain it.
most of the users of those third party harnesses care just as much about hitting cache and getting more usage.
I'm watching a conference talk right now from 2 weeks ago: "I Hated Every Coding Agent So I Built My Own - Mario Zechner (Pi)", and in the middle he directly references this.
He demonstrates in the code that OpenCode aggressively trims context, by compacting on every turn, and pruning all tool calls from the context that occurred more than 40,000 tokens ago. Seems like it could be a good strategy to squeeze more out of the context window - but by editing the oldest context, it breaks the prompt cache for the entire conversation. There is effectively no caching happening at all.
https://youtu.be/Dli5slNaJu0
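A minimal sketch of why that happens, assuming the cache only helps for the part of the request that matches the previous request exactly from the start (a simplification; real caches work on tokens or blocks, not whole messages). The message contents are invented.

```python
def cached_prefix_len(previous, current):
    """Number of leading messages shared by two requests (the reusable prefix)."""
    n = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        n += 1
    return n


history = ["system", "user: task", "tool: big output", "assistant: step 1"]

# Append-only harness: the old request is a strict prefix of the new one.
appended = history + ["user: next"]

# Pruning harness: an old tool result is dropped before the next turn.
pruned = ["system", "user: task", "assistant: step 1", "user: next"]

print(cached_prefix_len(history, appended))  # 4
print(cached_prefix_len(history, pruned))    # 2
```

In the pruned case everything after the removed tool call has to be processed again, even though the request is shorter overall.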
but claude -p is still Claude Code
Has something using that been banned?
Yep, that's the reason for the new Extra Credit feature in Claude Code. Some people were wiring up "Claude -p" with OpenClaw, so now Anthropic detects if the system prompt contains the phrase OpenClaw, and bills from Extra Credit if that happens:
https://x.com/steipete/status/2040811558427648357
"Anthropic now blocks first-party harness use too
claude -p --append-system-prompt 'A personal assistant running inside OpenClaw.' 'is clawd here?'
→ 400 Third-party apps now draw from your extra usage, not your plan limits.
So yeah: bring your own coin "
https://xcancel.com/bcherny/status/2041035127430754686#m
> This is not intentional, likely an overactive abuse classifier. Looking, and working on clarifying the policy going forward.
> claude -p still works on the sub but I get the feeling like if I actually use it, I'll get my Anthropic acct. nuked
I've used it with a sub a lot. Concurrency of 40 writing descriptions of thousands of images, running for hours on sonnet.
I have a lot of complaints. I've cancelled my $200 subscription and when it runs out in a few days I'll have to find something else.
But claude -p is fine.
... Or it was 2 weeks ago. Who knows if they've silently throttled it by now?
The other day I read that letting another agent invoke claude -p was considered a violation (i.e. letting OpenClaw delegate to Claude Code).
Not sure how that's enforced though. I was in OpenClaw discord a while ago and enforcement seemed a bit random.
I'll try to find the source, I might have gotten the details mixed up.
It’s not a “violation” but they said it would be charged as extra usage.
This is a funny cat and mouse game. They offer a built in loop command.
Just tmux and use that.
Soon, if they drop -p, people will just vibe code in 5 minutes a way to type inside it remotely, similar to their own built-in remote access tool. Seems like a losing game from Anthropic's side.
>> apparently a bug?
it's a bug only if they get a harsh public response, otherwise it becomes a feature
A month ago the company I work at with over 400 engineers decided to cancel all IDE subscriptions (Visual Studio, JetBrains, Windsurf, etc.) and move everyone over to Claude Code as a "cost-saving measure" (along with firing a bunch of test engineers). There was no migration plan - the EVP of Technology just gave a demo showing 2 greenfield projects he'd built with Claude Opus over a weekend and told everyone to copy how he worked. A week later the EVP had to send out an email telling people to stop using Opus because they were burning through too many tokens.
Claude seems to be getting nerfed every week since we've switched. I wonder how our EVP is feeling now.
Wow, that sounds like you have an astoundingly terrible EVP.
Pretty bad decision on his part. I've been telling other engineers within my company who felt threatened by AI that this would happen. That prices would rise and the marginal cost for changes to big codebases would start to exceed the cost of an engineer's salary. API credits are expensive, especially for huge contexts, and sometimes the model will use $200 in credits trying to solve a problem that could be fixed in an hour by a good engineer with enough context.
It kind of reminds me of the joke where a plumber charges $500 for a 5 minute visit. When the client complains the plumber says it's $50 for labor and $450 for knowing how to fix the problem.
> the model will use $200 in credits trying to solve a problem that could be fixed in an hour by a good engineer with enough context
So the price for fixing the problem is equal. Sounds like a great argument for AI.
99% of software developers earn less than 200 USD an hour
Most good engineers are way cheaper than that. The world is bigger than the United States.
Equal sounds like a terrible argument given all the other problems with replacing engineering thought with ai. I don't know where the line is but I expect it's far beyond equal AND there needs to be a level of "this can debug effectively in production" before that makes any sense for a real business case.
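To put rough numbers on the "$200 in credits vs. an hour of a good engineer" comparison, here's a minimal sketch. Only the $200 and the one-hour figure come from the comment being discussed; the hourly rates are made-up examples, and it ignores everything else raised here (review time, debugging in production, etc.).

```python
# Break-even sketch: model credits vs. an engineer fixing the same problem.
# The $200 and 1-hour figures come from the thread; the hourly rates
# below are hypothetical examples, not data.

def engineer_cost(hourly_rate_usd: float, hours: float) -> float:
    """Cost of an engineer spending `hours` on the fix."""
    return hourly_rate_usd * hours


def cheaper_option(model_credits_usd: float, hourly_rate_usd: float, hours: float) -> str:
    """Return which option costs less for a single fix."""
    human = engineer_cost(hourly_rate_usd, hours)
    if human < model_credits_usd:
        return "engineer"
    if human > model_credits_usd:
        return "model"
    return "equal"


MODEL_CREDITS_USD = 200.0  # from the thread
FIX_HOURS = 1.0            # from the thread

for rate in (50.0, 100.0, 200.0):  # hypothetical hourly rates
    print(rate, cheaper_option(MODEL_CREDITS_USD, rate, FIX_HOURS))
```

The two sides only come out "equal" if the engineer actually costs $200 an hour, which is the point the replies are making.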
A good lesson for all - I always really liked the Picasso version:
In a bustling restaurant, an excited patron recognized the famous artist Picasso dining alone. Seizing the moment, the patron approached Picasso with a simple request. With a plain napkin and a big smile, he asked the artist for a drawing. He promised payment for his troubles. Picasso, ever the creator, didn’t hesitate. From his pocket, he produced a charcoal pencil and he brought to life a stunning sketch of a goat on the napkin—a clear mark of his unique style. Proudly, he presented it to the patron.
The artwork mesmerized the patron, who reached out to take it, only to be stopped by Picasso’s firm hand. “That will be $100,000,” Picasso declared.
Astonished, the patron balked at the sum. “But it took you just a few seconds to draw this!”
With a calm demeanor, Picasso took back the napkin, crumpled it, and tucked it away into his pocket, replying, “No, it has taken me a lifetime.”
Good story but not applicable at all
I can’t believe how many small to mid size companies are being destroyed by bad decisions like this.
A friend’s company fired all EMs and have engineers reporting to product managers. They aren’t allowed to do refactors because the CTO believes the AI doesn’t need organized code.
He must be feeling pretty good, after all he still believes that it was the right call, and he definitely won't be admitting a mistake.
There's 0 chance of him facing the consequences for it either.
Hopefully that EVP feels embarrassed that a big bet was made that not only didn't pay off but left the company in a worse position. Some schadenfreude may be all you can expect, since this is an executive.
But cancelling IDE subscriptions? You need a proper IDE alongside AI-augmented development unless you want to simply be along for the ride.
Free VS Code is probably fine
I'm using the JetBrains IDEs and they're definitely worth paying for, even in the age of AI.
lol. dude is so incompetent. changing tools for cost cutting is so stupid, we all know real cost cutting is firing people. if he is really good at what he's doing, just fire 10% of the people and replace them with his Claude. If that doesn't backfire in 3 months, he will be CTO.
I saw a big hit to Claude’s intelligence w/ the 1M context window model and the change to adaptive reasoning (github issue linked elsewhere in this thread).
I’m pretty much using 90% Codex now, although since Claude is consistently faster at answering quick questions, I still keep it open for that and for code-reviewing codex/human work before commit.
I'm pretty sure this is an attempt by both companies to shape a reasonable finance story for their eventual IPO. They need to make this look a lot better than a pump and dump (raising on wild valuations then offloading onto public investors).
I certainly noticed a significant drop in reasoning power at some point after I subscribed to Claude. Since then I've applied all sorts of fixes that range from disabling adaptive thinking to maxing out thinking tokens to patching system prompts with an ad-hoc shell script from a gist. Even after all this, Opus will still sometimes go round and round in illogical circles, self-correcting constantly with the telltale "no wait" and undoing everything until it ends up right where it started with nothing to show for it after 100k tokens spent.
Whether it's due to bugs or actual malice, it's not a good look. I genuinely can't tell if it's buggy, if it's been intentionally degraded, if it's placebo or if it's all just an elaborate OpenAI psyop.
There's a github issue for this: https://github.com/anthropics/claude-code/issues/42796
Yes, I commented on it and applied all remedies suggested.
https://news.ycombinator.com/item?id=47664442
Configuration and environment variables seem to have improved things somewhat but it still seems to be hit or miss.
Yeah I’ve seen this too. It’s difficult for me to tell if the complaints are due to a legitimate undisclosed nerf of Claude, or whether it’s just the initial awe of Opus 4.6 fading and people increasingly noticing its mistakes.
It's not just you, there is a github issue for it: https://github.com/anthropics/claude-code/issues/42796
Both can be a thing at same time
Just one more anecdote:
I'm on the enterprise team plan so a decent amount of usage.
In March I could use Opus all day and it was getting great results.
Since the last week of March and into April, I've had sessions where I maxed out session usage under 2 hours and it got stuck in overthinking loops, multiple turns of realising the same thing, dozens of paragraphs of "But wait, actually I need to do x" with slight variations of the same realisation.
This is not the 'thinking effort' setting in Claude Code. I noticed this happening across multiple sessions with the same thinking effort settings; there was clearly some underlying, unpublished change that made the model get stuck in thinking loops longer and more often, without any escape hatch to stop and prompt the user for additional steering when it gets stuck.
this timing matches my experience, enterprise plan, but using opus from vscode - finished a heavy refactor of a large C# codebase mid march, tried to do basically the same thing early april and couldn't
Whenever I see Opus say “but wait, …”—which is all the time—I get a little bit closer toward throwing my computer out the window. Sometimes I just collapse the thinking section, cross my fingers, and wait for the answer. It’s too frustrating watching the thinking process.
Have you considered just… writing code? Like we used to in the good old days? If the tool drives you to that point of frustration, maybe it’s time to give the tool a break.
I stop the thinking and manually correct with explicit instructions or direction. I treat my agents like well meaning ivy-league graduate interns. They lack the experience to know what to do sometimes and need a “common sense” direction every now and then.
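For what it's worth, the "stop it and steer manually" step could be roughed out as a heuristic. This is only a sketch under my own assumptions: the marker phrases are the ones quoted in this thread, the threshold is arbitrary, and nothing here is an actual Claude Code feature or API. It just counts self-corrections in whatever thinking text you can see.

```python
import re

# Hypothetical marker phrases, taken from the ones quoted in this thread.
LOOP_MARKERS = re.compile(r"\b(but wait|no wait)\b", re.IGNORECASE)


def count_self_corrections(thinking_text: str) -> int:
    """Count how many times the text contains a self-correction marker."""
    return len(LOOP_MARKERS.findall(thinking_text))


def should_interrupt(thinking_text: str, max_corrections: int = 5) -> bool:
    """Suggest interrupting once the marker count exceeds the (arbitrary) threshold."""
    return count_self_corrections(thinking_text) > max_corrections


sample = "But wait, actually I need to do x. " * 6
print(count_self_corrections(sample), should_interrupt(sample))
```

A phrase count is a crude proxy; it says nothing about whether the model is actually repeating the same realisation.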
I’ve seen the point raised elsewhere that this could be the double usage promo that was available from the 13th of March to the 28th. ie. people getting used to the promo then feeling impacted when it finished.
Although it seems that enterprise wasn’t included, so maybe not in your case.
https://support.claude.com/en/articles/14063676-claude-march...
it sounds like, tinfoil hat, they reduced the quant size of their model and tried to mask the change with the promo. your theory only addresses the spend, not the reduced reliability
It's probably because you didn't specify "make no mistakes" /s
In all seriousness though, I've observed the same thing with my own usage.
I think there's a much more nefarious reason that you're missing.
It's pretty clear that OpenAI has consistently used bots on social networks to peddle their products. This could just be the next iteration, mass spreading lies about Anthropic to get people to flock back to their own products.
That would explain why a lot of users in the comments of those posts are claiming that they don't see any changes to limits.
The trouble with that argument, though, is that it works the other way as well: how do I, a random internet citizen, know that you're not doing the same thing for Anthropic with this comment?
(FWIW I have definitely noticed a cognitive decline with Claude / Opus 4.6 over the past month and a half or so, and unless I'm secretly working for them in my sleep, I'm definitely not an Anthropic employee.)
https://isitnerfed.org/
in short, it looks like nothing has been nerfed, but sentiment has definitely been negative. I suspect some of the openclaw users have been taking out their frustrations.
That's fascinating.
Any idea what their test harness looks like? My experience comes primarily from Claude Code; this makes me wonder if recent CC updates could be more to blame than Opus 4.6 itself.
Oh it's pretty clear to me that Anthropic employs the same tactics and uses bots on socials to push its products too. On Reddit a couple of months ago it was simply unbearable with all the "Claude Opus is going to take all the jobs".
You definitely shouldn't trust me, as we're way beyond the point where you can trust ANYTHING on the internet that has a timestamp later than 2021 or so (and even then, of course people were already lying).
Personally I use Claude models through Bedrock because I work for Amazon, and I haven't noticed any decline. Instead it's always been pretty shit, and what people describe now as the model getting lost in infinite loops of talking to itself has happened since the very start for me.
Judging from the number of GitHub issues on Anthropic, shamelessly being dismissed as "fixed", I doubt openai needs the bots to tarnish that competitor.
There's still plenty of "leave my fellow multibillion corp alone" type ones; it means that corp can and should screw its loving customer base harder.
The enshittification meme has been taken too seriously to the point where it is shoehorned into every single place possible.
It is not in Anthropic's interest to screw its customer base. Running a frontier lab comes with tradeoffs between training, inference and other areas.
The investors are their customers - not the users of the end-product.
This shows a lack of understanding of how markets work. Investors make money when the valuation of the company increases. The valuation of the company is the best prediction of future profit risk adjusted.
How would anthropic increase future profits without satisfying customers?
Early investors make money when later investors buy them out at inflated valuations.
Well sure, all market signals should be considered. As a casual observer, my received signals have been indicating that AI is getting sold at a loss to get market share, and more recent signals have indicated that users are really really sensitive to both costs and performance.
The weakest signal to me is investor money, because when you think of it, investors are betting on a future that may or may not be there. Heck even trends aren't guaranteed, "past performance is no guarantee etc etc"
Have you seen the business models for these companies? Literal underpants gnome memes. OpenAI's goes like this:
1. Build AGI
2. Use said AGI to tell us how to become profitable
3. Profit!
Anthropic seems to be going all in on enterprise sales. Which means they don't actually have to please customers, or it's what ThePrimeagen humorously calls a "yacht problem"—a problem that only needs a solution after the IPO. For now all they have to do is convince corporate leadership that this is the future of work and sow enough FOMO to close those sales contracts and their projected sales, and stock valuation, goes through the roof.
Of course that value will collapse if they go without delivering on their promises long enough. That's why they call it a bubble. But by then, hopefully, Dario and the early investors will be long gone and even richer than they were to start. Their only competitor, OpenAI, is confronted with the same issues: the scalability problems won't go away, and addressing them doesn't drive stock valuation the way promising high rollers that AGI and total workforce automation are just around the corner does.
It doesn't matter if it is in Anthropic's interest to screw its customer base, if their reported monthly revenue growth is accurate then it makes perfect sense why Claude would be getting dumber...
Demand is way up and compute supply is extremely limited because data center buildouts can't keep up with demand.
In the face of rising demand and insufficient compute, their only practical options (other than refusing new business until demand can be met) are significantly raising the price of tokens (and more tightly limiting subscription options) or doing behind-the-scenes inference optimizations that are likely to make the model dumber.
It is very easy to believe that they took the route of inference optimizations that have reduced quality of the service and that that is where the perceived enshittification is coming from.
Anthropic seems to be playing the giant-tech-rent-capture game that all of the old guards have done for the past few years. We thought that the new age of AI might bring some fresh air into the mix, but I guess that optimism quickly faded.
it has been my go-to provider for things but i noticed extraordinarily high usage rate last month on a little side project i started so that i could learn about things that are interesting to me while helping my day to day responsibilities (creating an iceberg data lake from my existing parquet files). i used my month’s worth of corporate subscription allocated tokens in 3 days. never seen that before so now i’m a lot more apprehensive about getting into the weeds with claude but i’m also so much less impressed with the other available models for work in this domain.
My working theory is that all models are approximately the same, and the variance in quality mostly depends on how long they think for.
So the trick is to always set to max, and then begin every task with “this is an extremely complex task, do not complete it without extensive deep thinking and research” or whatever.
You’re basically fighting a battle to make the model think more, against the defaults getting more and more nerfed to save costs.
At some point these AI companies need to pay the piper as it were and actually provide a return for their investors. Expect cost cutting attempts to continue unless backlash is great enough to pose an existential threat to these companies.
On OpenRouter, token consumption is up 5x since November 2025. If this is indicative of the industry's growth then I can't fathom how we will not hit resource constraints.
It's not just engineers, and it's not just about the 3rd party/rate limiting stuff. I feel like the reasoning capabilities have deteriorated too for non-coding tasks.
Anthropic isn't your friend.
Phase 1: $200/mo prosumer engineer tool
Phase 2: AI layoffs / "it's just AI washing"
Phase 3: $20,000/mo limited release model "too dangerous" to use
Phase 4: Accelerated layoffs / two person teams. Rehiring of certain personnel at lower costs.
Phase 5: "Our new model can decompile and rewrite any commercial software. We just wrote a new kernel after looking at Linux (bye, bye GPL!) We also decompiled the latest Zelda game, ported the engine to Rust, and made a new game with it. Source code has no value. Even compiled and obfuscated code is a breeze to clone."
Phase 6: $100k/mo model that replicates entire engineering teams, only large companies can afford it. Ordinary users can't buy. More layoffs.
Phase N: People can't afford computing anymore. Everything is thin clients and rented. It's become like the private railroad industry. End of the PC era. Like kids growing up on smartphones, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.
Anthropic used to be cool before they started gating access. Limiting Claw/OpenCode was strike one. Mythos is strike two.
Y'all should have started hating on their ethics when they started complaining about being distilled. For training they conducted on materials they did not own.
We need open weights companies now more than ever. Too bad China seems to be giving up on the idea.
"You wouldn't distill an Opus."
Stop thinking billion dollar publicly traded companies are "cool" just because they make widget you like.
You will be backstabbed
You will be squeezed for all they can.
And you will be betrayed.
> Phase N: People can't afford computing anymore. Everything is thin clients and rented. It's become like the private railroad industry. End of the PC era. Like kids growing up on smartphones, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.
Thankfully none of them actually makes money and just runs on investment so there is a good chance bubble will drop and the price of PC equipment will... continue to rise as US gives up Taiwan to China
> Stop thinking billion dollar publicly traded companies are "cool" just because they make widget you like.
Anthropic is a private company but nevertheless, the sentiment is accurate and applies to all kinds of corporations.
What I want to know is how did they make the only LLM that doesn't sound cringe?
I think it has something to do with mode collapse (although Claude certainly has its own "tells"), but I'm not sure.
It sounds trivial but even for Agentic, I found the writing style to be really important. When you give Claude a persona, it sounds like the thing. When you give GPT a persona, it sounds like GPT half-assedly pretending to be the thing.
---
Some other interesting points about Anthropic's models. I don't know if any of these relate to my LLM style question, but seems worth mentioning:
Claude models also use way less tokens for the same task (on ArtificialAnalysis, they are a clear outlier on this metric).
And there's a much stronger common sense, subjectively. (Not sure if we have a good way to actually measure that, though.) It takes context and common sense into account, to a much greater degree.
(Which ties in with their constitution. Understanding why things are wrong at a deeper level, rather than just surface level pattern matching.)
Opus is great but it should be bigger. You notice the difference between Sonnet and Opus, but with heavy use you notice Opus's limitations, too.
What leads you to say China AI is giving up on open weights?
I've been using GLM for over 6 months and pretty happy.
People keep repeating this without any real thought behind it because of the high profile resignations on the Qwen team. Meanwhile the Minimax team just released a new open weights version of their 229B model yesterday. So much for that narrative.
The AI landscape in China is larger than just Qwen and Alibaba.
> Meanwhile the Minimax team just released a new open weights version of their 229B model yesterday.
it's under a new license prohibiting any commercial use.
Of course, but for how long? Do you think that companies will keep giving away valuable assets for free forever, or do you think that in the near future there's going to be an open weights model that's so good that people keep using it indefinitely instead of going back to frontier model providers?
The first one is just incredibly naive, the second might be true for some people, for some tasks, but it's not going to capture the majority who're chasing the latest and greatest to "keep up".
> Do you think that companies will keep giving away valuable assets for free forever
If China is forced to choose between giving the entire AI market to the US or releasing free models, they'll be releasing free models as long as it's necessary.
the asset's value is in being released, so yes
What does that mean?
Why would any company release open weights once the investment money stops?
Releasing open weights has been basically a PR move; the moment those companies need to actually make money they will cut it out, as that reduces their client base.
They DO NOT want you to run AI. They want you to pay them to do it
Minimax just released a new model yesterday. You're conflating one company with a country's entire industry. There's more than just Qwen coming out of China.
ok. maybe. I don't know. I'm asking how you know.
z.ai did go public on the HK exchange. They are under pressures similar to other public companies.
I know that China models are increasingly being trained and run using Huawei chips instead of Nvidia. I know China has a surplus of electricity from renewables (wind, solar, hydro).
open weights is a way to nerf your opponent and is meaningless to your business if you need to retrain a model because you're trailing
So, it makes a lot of sense to get people a "demo" and claim the paid product is better.
i think a lot of people have no idea how capable local models are atm.
Two years ago a lot of people thought GPT-4o was usable for software development. I didn’t really find that to be the case in general but certainly it could do a lot of useful things. And now Qwen3.5-8B is just as capable and runs fine on an M2 MacBook Air.
QWEN3.5 coder next runs to ~84k context before it poops out on AMD395+ w/128GB. Most of what it's good at is boilerplate find/replace/copy/paste; but being able to scaffold things out and touch up 20-30% of the code is pretty sweet.
Good read on the situation.
It all boils down to a brilliant but extremely expensive technology. Both to build and to run.
We've been sold a product with heavy subsidy. The idea (from Sam): scale out and see what happens.
Those who care to read between the lines can see what's happening. A perfect storm of demand that attracts VCs who can't understand they are the real customers. Once they understand that it will be too late.
Regarding open weight models: eventually we will, as humanity, benefit from the astronomical capital poured into developing a technology ahead of its time. In a few years this and even more will run on edge.
Written by open source developers, likely former openai and anthropic employees who got so much cash in the bank they don't need to worry about renting their knowledge.
> We need open weights companies now more than ever.
If your objective is to democratize AI, sure. But those fed up with it and the devastating effects it's having on students, for example, can opt to actively avoid paying for products with AI (I say this as someone who uses it every day, guilty). At some point large companies will see that they're bleeding money for something that most people don't seem to want, and cancel those $100k/mo deals. I've already experienced one AI-developer-turned company crash and burn.
Personally, I don't think this LLM-based AI generation will have any significant positive impacts. Time, energy (CO2) and money would have been far better spent elsewhere.
> End of the PC era, there's nothing to tinker with anymore. And certainly no gradient for entrepreneurship for once-skilled labor capital.
This one seems too far fetched. Training models is widespread. There will always be open weight models in some form, and if we assume there will be some advancements in architecture, I bet you could also run them on much leaner devices. Even today you can run models on Raspberry Pis. I don't see a reason this will stop being a thing, there will be plenty of ways to tinker.
However, keep in mind the masses don't care about tinkering and never have. People want a ChatGPT experience, not a pytorch experience. In essence this is true for all tech products, not just AI.
Developers are a tough crowd, stubborn, know-it-alls.
This is actually a great feature, you can do bait and switch with AI.
The past two weeks I've had code that was delivered and declared as done (it did pass tests) but failed in a review by Codex. This has looped to a painful extent. The code in question deals with concurrency issues so there's an acknowledgement that it's trickier, but still, I expect more from Claude.
It feels like I'm getting less and less for my money every day. A few weeks ago I was programming all week and never getting close to the limit, yesterday half my weekly limit went away in a day. Changing the limits mid-subscription is just theft.
I can't believe how quickly they went from riding high on anti-OpenAI sentiment post-DOD fiasco, to shooting themselves and all their users new and old in the foot.
The ideal time to make your product worse is probably not at the same point that all of your competitor's customers are looking. Anthropic really, really fucked up here.
And beyond that, there's a ton of people who are just regular 9-5 Claude CLI users with an enterprise subscription who are getting punished with a worse model at the same price just as if we were Claw users. This kind of thing does not make one feel warm and fuzzy. I feel like I just got a boot to the teeth.
The hypothesis that makes the most sense is not that they are idiots, but that they have no choice. They cannot meet the new demand. So they’ve quantized the model.
https://isitnerfed.org/
The TOS basically states you need to deal with whatever they want.
Meanwhile their 'best' competitor just announced they want to provide unreliable mass destruction guidance tools but they don't wanna feel said.
Honestly speaking, we are wrong whenever we do business with this sort of people
> The TOS basically states you need to deal with whatever they want.
FWIW that's what most TOSes say for the majority of online services. Some even include arbitration clauses to prevent civil suits and class-action cases.
The title should be changed. It makes it look like they upped the TTL from 1 h to 5 months.
The SI symbol for minutes is "min", not "M".
A compromise would be to use the OP notation "m".
I agree. My first reaction was "what the fuck's an 'M'?"
Five million. No matter the unit, just, 5.000.000
So a side effect of this is -- even at 1 hour caching -- ...
If you run out of session quota too quickly and need to wait more than an hour to resume your work ... you are paying even more of a penalty just to resume your work -- a penalty you wouldn't have needed if session quota was not so restrictive in the first place, and which in turn causes you to burn through the next session quota even faster.
Seems like a vicious cycle that made the UX very poor. I remember Claude Code with Pro became virtually unusable in the middle of March, with session quota expiring within the first hour or less for me -- which was a wildly different experience from early March.
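A back-of-the-envelope sketch of that resume penalty, under loudly stated assumptions: the base price is a made-up number, the read/write multipliers are my assumption about how cached input is billed, and I don't know how any of this maps onto subscription session quota. It only shows why coming back after the cache TTL has lapsed costs much more than coming back while the cache is still warm.

```python
# All prices and multipliers are assumptions for illustration only.
BASE_INPUT_USD_PER_MTOK = 5.0  # hypothetical base input price per million tokens
CACHE_READ_MULT = 0.1          # assumed: cached reads billed at 10% of base
CACHE_WRITE_MULT = 2.0         # assumed: long-TTL cache writes billed at 2x base


def resume_cost_usd(context_tokens: int, idle_minutes: float, ttl_minutes: float) -> float:
    """Input cost of the first request after being idle for `idle_minutes`.

    If the idle time is shorter than the TTL, the context is read from cache;
    otherwise the whole context has to be written to the cache again.
    """
    cache_alive = idle_minutes < ttl_minutes
    mult = CACHE_READ_MULT if cache_alive else CACHE_WRITE_MULT
    return context_tokens / 1_000_000 * BASE_INPUT_USD_PER_MTOK * mult


CONTEXT = 200_000  # hypothetical session context size in tokens

warm = resume_cost_usd(CONTEXT, idle_minutes=30, ttl_minutes=60)   # back within the TTL
cold = resume_cost_usd(CONTEXT, idle_minutes=90, ttl_minutes=60)   # locked out past the TTL
print(warm, cold)
```

With a 5-minute TTL instead of 60, the same 30-minute break lands on the cold path too, which is the cycle described above.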
It's also routinely failing the car wash question across all models now, which wasn't the case a month ago. :-/
Seeing some things about how the effort selector isn't working as intended necessarily and the model is regressing in other ways: over-emphasizing how "difficult" a problem is to solve and choosing to avoid it because of the "time" it would take, but quoted in human effort, or suggesting the "easier" path forward even if it's a hack or kludge-filled solution.
> over-emphasizing how "difficult" a problem is to solve and choosing to avoid it because of the "time" it would take
I heard a while back Claude refused to attempt a task for days, saying it would take weeks of work. Eventually the user convinced it to try, and it one-shotted it in 30 seconds.
For days? Someone spent days trying to convince Claude to do something?
If you asked yesterday, and asked again today, then you asked for days. OP might be trying to express that it wasn’t just a temporary fluke.
>“idgaf about risk you coward, waste some time just do it and stop bitching”
The above was a successful prompt to get Claude to stop whining about effort, difficulty, and time.
Unfortunately abusive language well placed is an effective LLM motivator.
Awesome, I didn't know about the car wash question.
Totally true, also tokens seem to burn through much faster. More parallelism could explain some of it but where I could work on 3-5 projects at once on the max plan a month ago, I can't even get one to completion now on the same Opus model before the 5h session locks me up..
There is a chef, he opens a restaurant. Delicious food.
It costs him more in ingredients alone than he charges. He even offers some pseudo unlimited buffet, combo sets, and happy hours.
He announces a new restaurant, apparently it will be even better, so good he's a bit worried. He makes sure to share his worries while he picks a few select enterprises for business parties and the like.
In the meantime he cracks down on free buffet goers who happen to eat too much, and downgrades all ingredients without notice to finally hope to make a profit.
This is close, but the real problem isn’t that the food is underpriced, it’s that the supply of ingredients is severely limited.
Pretty much capitalism in a nutshell, yeah.
On slightly off topic note: Codex is absolutely fantastic right now. I'm constantly in awe since switching from Claude a week ago.
I'm currently "working" on a toy 3d Vulkan Physx thingy. It has a simple raycast vehicle and I'm trying to replace it with the PhysX5 built in one (https://nvidia-omniverse.github.io/PhysX/physx/5.6.1/docs/Ve...)
I point it to example snippets and web documentation, but the code it gens won't work at all, not even close
Opus4.6 is a tiny bit less wrong than Codex 5.4 xhigh, but still pretty useless.
So, after reading all the success stories here and everywhere, I'm wondering if I'm holding it wrong or if it just can't solve everything yet.
While I’ve had tremendous success with Golang projects and Typescript Web Apps, when I tried to use Metal Mesh Shaders in January, both Codex and Claude had issues getting it right.
That sort of GPU code has a lot of concepts and machinery, it’s not just a syntax to express, and everything has to be just right or you will get a blank screen. I also use them differently than most examples; I use it for data viz (turning data into meshes) and most samples are about level of detail. So a double whammy.
But once I pointed either LLM at my own previous work — the code from months of my prior personal exploration and battles for understanding, then they both worked much better. Not great, but we could make progress.
I also needed to make more mini-harnesses / scaffolds for it to work through; in other words isolating its focus, kind of like test-driven development.
My impression is that it always comes down to how well what you’re trying to do pattern-matches the training set.
When it comes to agents like codex and CC it seems to come down to how well you can describe what you want to do, and how well you can steer it to create its own harness to troubleshoot/design properly. Once you have that down, I haven't found a lot of things you cannot do.
Breaking down and describing things in sufficient detail can be one way to ensure that the LLM can match it to its implicit knowledge. It still depends on what you’re trying to do in how much detail you have to spell out things to the LLM. It’s almost a tautology that there’s always some level of description that the LLM will be able to take up.
Well, not just breaking down the task at hand, but also how you instruct it to do any work. Just saying "Do X" will give you very different results from "Do X, ensure Y, then verify with Z", regardless of what tasks you're asking it to do.
That's also how you can get the LLM to do stuff outside of the training data in a reasonably good way, by not just including the _what_ in the prompt, but also the _how_.
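The "Do X, ensure Y, then verify with Z" shape can be sketched as a tiny prompt builder. The function name, the labels, and the example task are all hypothetical; this only illustrates putting the _how_ next to the _what_, not any particular tool's prompt format.

```python
def build_prompt(task: str, constraints: list[str], verification: str) -> str:
    """Assemble a prompt that states the task, its constraints, and a verification step."""
    lines = [f"Do: {task}"]
    for constraint in constraints:
        lines.append(f"Ensure: {constraint}")
    lines.append(f"Then verify with: {verification}")
    return "\n".join(lines)


# Hypothetical example task, constraints, and check.
print(
    build_prompt(
        task="replace the raycast vehicle with the built-in vehicle API",
        constraints=["the existing scene setup is unchanged", "the project still builds"],
        verification="a small standalone test scene that spawns one vehicle",
    )
)
```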
" or if it just can't solve everything yet."
Obviously it cannot. But if you give the AI enough hints, clear spec, clear documentation and remove all distracting information, it can solve most problems.
It works somewhat well with trivial things. That's where most of these success stories are coming from.
Most of the folks are building CRUD apps with AI and that works fine.
What you're doing is more specialized and these models are useless there. It's not intelligence.
Another NFT/Crypto era is upon us so no you're not holding it wrong.
This is pretty wrong. Anyone who thinks this stuff is similar to NFTs and crypto hasn’t been paying attention.
I have also switched from claude to codex a few weeks ago. After deciding to let agents only do focused work I needed less context, and the work was easier to review. Then I realized codex can deliver the same quality, and it's paid through my subscription instead of per token.
Codex has been good quality-wise, but I hit limits on the Codex team subscription so quickly it's almost more hassle than it is worth.
I made this switch months ago, ChatGPT 5.4 being a smarter model, but I’ve had subjective feelings of degradation even on 5.4 lately. There’s a lot of growth in usage right now, so I'm not sure what kind of optimizations they're doing at both companies.
I use Codex at home and Opus at work. They're both brilliant.
I would switch to Codex, but Altman is such a naked sociopath and OpenAI so devoid of ethical business practices that I can't in good conscience. I'm not under any illusion that Anthropic is ethical, but it is so far a step up from OpenAI.
Enemy centered decision making
Cannot you use Codex (which is open source, unlike Claude Code) with Claude, even via Amazon Bedrock?
Codex with Anthropic's models is not as good as using the models with the harness they were trained for. The same goes vice versa.
I'm with you on the ethical part, but everything is a spectrum. All the AI leadership are some shade of evil. There's no way the product would be effective if they weren't. I don't like that Sam Altman is a lunatic, but frankly they all are. I also recognize that these are massive companies filled with non shitty engineers who are actually responsible for a lot of the magic. Conflating one charlatan with the rest of it is a tragedy of nuance.
Yeah, but there's distinct difference between "risks their company because they refuse to help with killing little kids" and "happily helping with genocide".
One of these is better.
This coincides with Anthropic's peak-hour announcement (March 26th). Could the throttling be partly a response to infrastructure load that was itself inflated by the TTL regression?
It would be too fucking funny if this were the case. They're vibe coding their infrastructure and they vibe coded their response to the increased load.
You'd think they would have dashboards for all of this stuff, to easily notice any change in metrics and be able to track down which release was responsible for it.
They probably do, then they pipe it into a bunch of Claude subagents and then you get the current mess.
Just give us the option to get the quality back, Anthropic. I get that even a $200 subscription is not possible eventually, but give us the option to sub the $1000 tier or tell us to use the API tier, but give us some consistency.
This. I get much more value than 90€ from my Claude Code subscription. I am willing to pay more for consistency and not having to watch my back all the time, because I might get screwed over.
From the recent-ish Dwarkesh podcast, Anthropic seems to be wary about buying/building too much compute [0]. That probably means that they have to attempt to minimize compute usage when there is a surge in demand. Following the argument in the podcast, throwing more money at them, as some in this thread are suggesting, won’t solve the issue, at least not in the short term.
[0] https://www.dwarkesh.com/i/187852154/004620-if-agi-is-immine...
So, this especially bites if your validation step (let’s say integration tests) takes 1hr plus. The harness is just waiting, prefix caching should happily resume things with just a minor new prefill chunk of output from the harness, and bam - completely new prefill.
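Back-of-the-envelope version of that, as a sketch. It assumes the cache TTL is refreshed on every request, so only the idle time since the last request matters; the token counts are made-up numbers, not measurements.

```python
def prefill_tokens(cached_prefix, new_tokens, idle_seconds, ttl_seconds):
    """Tokens that have to be prefilled on the next request.

    Assumption: a cache hit means only the new chunk is prefilled,
    a miss means the whole conversation prefix is prefilled again.
    """
    if idle_seconds <= ttl_seconds:
        return new_tokens  # cache still warm
    return cached_prefix + new_tokens  # cache expired, full prefill


# Hypothetical session: 150k tokens of context, 2k tokens of test output.
context, test_output = 150_000, 2_000

quick_check = prefill_tokens(context, test_output, idle_seconds=4 * 60, ttl_seconds=5 * 60)
long_tests_5m = prefill_tokens(context, test_output, idle_seconds=65 * 60, ttl_seconds=5 * 60)
long_tests_1h = prefill_tokens(context, test_output, idle_seconds=65 * 60, ttl_seconds=60 * 60)

print(quick_check, long_tests_5m, long_tests_1h)
```

With a 65-minute test run, both the 5-minute and the 1-hour TTL miss under these assumptions, so the harness pays for the full context again either way.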
I think they changed the quantization to save compute for their new model. This might be why the benchmark scores look good but the real-world performance is much worse. I wonder whether they tested the model internally and didn't find anything wrong with the new parameter.
I canceled my subscription and switched to Codex, but it's not as good. I'm tired of Anthropic changing things all the time. I used Claude because it doesn't redirect you to a different model like OpenAI does. But now it seems like both companies are doing the same thing in different ways.
Claude is worse: they don't tell you when your experience has degraded, and they don't even let you use worse models if you run out.
Lately I am finding myself doing more and more of what I call "ambient coding", so that I am no longer directly using any of those coding harnesses.
https://redbeardlab.gitbook.io/acem/essays/ambient-developme...
I basically wrote a small GitHub app. I simply create a GitHub issue, the bot reads it, runs an LLM loop and comes up with a PR (or a design).
Then I simply approve the PR (or the design).
I find it much calmer and much more productive.
If you're reading this, Claude: people are willing to pay extra if you want to make more money. Just please stop this undermining. It decreases trust in your platform to the point where it cannot be relied on.
It looks like selling reputation to save money.
But more likely they are constrained on GPUs and can't get them fast enough.
(My guess having no understanding of how this industry actually works.)
As a Pro user, even though these issues and bugs are “new,” the downgrade has been noticeable since January. I’ve unsubscribed because the Pro plan is no longer usable for me.
It’s only making the news now because it’s affecting Max users as well ($100/$200 plans). I understand the need for change, but having zero communication about it is just wrong.
Could it be that Anthropic is experiencing a massive shortage of compute capacity, and is desperately trying to find ways to overcome it?
All the news I've heard about this company over the past weeks makes it sound like they're really desperate.
I also noticed this: just resuming something eats up your entire session. The past two weeks also felt like a substantial downgrade and made me regret renewing my subscription. It sucks, because I wish I had kept my Codex subscription and renewed that instead.
It's absolutely ridiculous how stupid Claude is now. I noticed it sometimes last year too, but now it feels like last year's model from before December.
Feels similar to Claude last August/September. Knowing Claude some Agent probably reverted the fix from back then ^^
https://www.anthropic.com/engineering/a-postmortem-of-three-...
Since I (until Anthropic decided to remove access for subs) used Anthropic models extensively with pi I explored the two caching options and the much higher cost of 1h caches is almost never a good tradeoff.
Since caching is primarily something that can only be judged at scale, across many users, I can only assume that Anthropic looked at their infra load and impact and made a very intentional change.
Well, how entirely expected. The money man comes to collect and they are squeezing for money.
Anthropic is leaving so much evidence around… proving damages and a pattern is becoming trivial
Am I the only one who sees striking parallels between being a Claude Code customer and Cuckoldry (as in biology)?
I mean, you are investing a lot (infrastructure and capital) into something that is essentially not yours. You claim credit for the offspring (the solution) simply because it resides in your workspace. You accept foreign code to make your project appear more successful and populated than you could manage alone. Your over-reliance on a surrogate for the heavy lifting leads to the loss of your own survival skills (coding and debugging). Last but not least, you handle the grunt work of territory defense (clients and environments) while the AI performs the actual act of creation (Displaced Agency).
What you're looking for is "vendor lock-in".
No, but it's very funny, I'm gonna call people that offshore their thinking to LLM "AI cucks" now
I noticed another limitation: "An image in the conversation exceeds the dimension limit for many-image requests (2000px). Start a new session with fewer images."
So I can't continue my claude code session I started yesterday.
This is the same shit OpenAI used to do last year, quietly downgrading their offerings while hyping the next big thing. I thought Anthropic were different but it seems they're playing the exact same long con with Mythos.
They can't really revolutionize AI again so they make the product worse and worse and then offer you a "better" one
There’s a case for intelligent caching: coarse-grained 1h and 5min type TTLs are not optimal.
Caching for an LLM is not like caching normal content: the longer the cached context, the more beneficial the cache is, and it only stops being worthwhile when the user ends the current session.
So you'd need some adaptive algorithm to decide when to keep caching and when to purge the whole thing, possibly on the client side. But if you give the client control, people will make it use as much cache as possible just to chase diminishing returns, so fine-grained control here isn't all that easy. Another option is to have a cache size limit per account and then intelligently purge within it instead of relying on TTL alone.
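A minimal sketch of the "cache size per account" option, assuming least-recently-used eviction as the purge policy. `AccountCache`, the budget, and the session names are all hypothetical, and this says nothing about how any provider actually does it.

```python
from collections import OrderedDict


class AccountCache:
    """Per-account cache budget with least-recently-used eviction, no TTL."""

    def __init__(self, budget_tokens):
        self.budget_tokens = budget_tokens
        self.entries = OrderedDict()  # session_id -> cached tokens, oldest first

    def touch(self, session_id, tokens):
        """Record that a session was used and now holds `tokens` cached tokens.

        Returns the session ids evicted to get back under the budget.
        The session just touched is never evicted, even if it alone
        exceeds the budget.
        """
        self.entries.pop(session_id, None)
        self.entries[session_id] = tokens  # move to most-recent position
        evicted = []
        while sum(self.entries.values()) > self.budget_tokens and len(self.entries) > 1:
            old_id, _ = self.entries.popitem(last=False)  # drop least recent
            evicted.append(old_id)
        return evicted


cache = AccountCache(budget_tokens=200_000)
first = cache.touch("a", 120_000)
second = cache.touch("b", 60_000)
third = cache.touch("a", 130_000)   # "a" becomes the most recent session
fourth = cache.touch("c", 50_000)   # over budget, least recent ("b") goes
print(first, second, third, fourth)
```

An idle session stays cached for as long as the account is under budget, so a one-hour test run costs nothing unless the user has filled the budget with other sessions in the meantime.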
Keep in mind, efficient KV caching needs to be next to the GPU, so you also need your HA to keep routing the user to the same hardware.
The hardware VM model is almost identical. Each session can go anywhere to start, but a live session can't just be routed anywhere without penalty.
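One common way to keep a live session on the same hardware is rendezvous (highest-random-weight) hashing. A sketch, with made-up node names; this is a generic technique, not a claim about what any provider's load balancer does.

```python
import hashlib


def pick_node(session_id, nodes):
    """Rendezvous hashing: score every node for this session, take the highest.

    The same session maps to the same node as long as that node is in the
    pool, and removing a node only moves the sessions that were on it.
    """
    def weight(node):
        digest = hashlib.sha256(f"{node}:{session_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=weight)


# Hypothetical GPU hosts.
nodes = ["gpu-a", "gpu-b", "gpu-c"]
home = pick_node("session-42", nodes)
print(home)
```

A session whose home node disappears still has to pay the full prefill once on its new node, which is the "penalty" part.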
One of the largest AI companies on Earth cannot figure out an algorithm for when not to drop caches in long-running sessions?
AGI finding bugs again. Actual Guys/Gals Instead.
Changing "regression" to "Anthropic silently downgraded" sensationalizes the story
Why the FUD?
I notice some interesting public opinion weather change since Anthropic passed OpenAI wrt revenue
From the response in the linked issue:
>> Was there a change? Yes — March 6, intentional, part of ongoing cache optimization. You pinpointed the date correctly.
The entire issue lays out how and why it's a silent downgrade. Also silent because it just happened, without an announcement.
I don't understand how this is FUD.