My organization's spend has passed the $200,000-per-month level on Anthropic's enterprise tier.
The number of outages we've had over these past few months is astounding, and coupled with their horrendous support, it has our executive team furious.
It's a lot of money to be spending for a single nine of reliability.
If you are paying API rates (not using Max subscriptions), there's no reason to use Anthropic's API directly; the same models are hosted by both AWS and Google with better uptime than Anthropic.
How do things like prompt caching etc. play into that? Would I theoretically have a more stable harness backing my usage?
I'm seriously over the current Claude experience. After seemingly fixing my 4.6 usage by disabling adaptive thinking and moving to max effort, the release of 4.7 seems to have broken that workflow, and I'm 99% certain that disabling adaptive thinking does nothing now, even on 4.6. Just egregious errors in the two days this week since I came back from vacation.
I'm looking at moving to Pi, and I like the minimal nature, but I disagree with a handful of decisions they make. So I'd likely need to maintain a fork, which is less than ideal.
What decisions is Mario making that you disagree with? My impression is that Pi is minimal enough that any changes can live on top of it without needing to maintain a fork?
I started developing my own coding agent after using Pi for a couple of months, so I'm curious what you don't like about Pi.
When I hear Mario talk about Pi and his approach, I find myself agreeing with a lot of it. But I also find myself agreeing with a lot of the points from this: https://www.thevinter.com/blog/bad-vibes-from-pi
To save others a click (though the article is worth reading): the opinions in question are that bash should be enabled by default with no restrictions, that the agent should have access to every file on your machine from the start, and that npm is the only package manager worth supporting. Bold choices.
He also mentions that Pi ships with no subagents by default.
Pi for the win. I have my own AI extend it when I want more specific features; I vibe-coded a shift+tab permission control, like Claude Code's, in 20 minutes.
I find it so funny that many of these harnesses sound like black magic and are completely mystical to me. I use Claude Code every day, and yet I can't imagine the workflow of Pi. I also don't care to pay API rates just to experiment with them.
Largely, though, I'm happy with Claude Code with IDE integration, so I don't feel the need to migrate. Nonetheless, I'm curious.
Obviously there is only so much you can say, but is that $200K due to the raw number of seats you have, or are you burning through a lot on raw API usage? I guess I'm trying to understand: large business, or large usage?
We are in the SMB space; the spend is almost entirely usage for us at this point, rather than seat cost.
For context, we are a software firm focused on difficult engineering problems, but I can't divulge much else.
Have you guys considered running your own local models? $200K a month is a ton of money and puts all your eggs in one basket. Or is it easier to just be able to walk away from it all if you are done with it or something changes?
I led the team that did the math and analysis for determining our direction in selecting Anthropic.
We initially assumed this was where we would end up, but after some investment exploring our options we found it not worth the trouble.
Local models sound great until you realize you don't get a lot of the features we implicitly expect from hosted models. Many things would require additional investment in operations and setup to get to a comparable system.
We ended up wanting things that would require us to roll our own memory system, harnesses for the model, compliance, and security.
It was possible for us to invest in this, but it would have required additional hiring or training to get us to a state comparable to the hosted options.
Eventually, I had to recommend against the project, as it was more likely to be an investment in the lead team's résumés than an actual investment in our organization.
To start, I want to be clear that I am trying to understand, not criticize; mistakes are how institutional knowledge grows.
Your last paragraph hints at retention struggles which complicates the issue.
But was vendor mitigation not part of the evaluation? I get that most companies view governance and compliance as a pay-to-play issue, but there has always been a problem with rapidly changing areas and single-source suppliers.
I admit to having my own preferences and being almost completely ignorant about what your needs are, but I have seen the value in having a rabbit to pull out of the hat.
If employee retention doesn't allow for the departure of individuals without complete loss of institutional knowledge, I guess my position wouldn't hold.
But during the rise of cloud computing, I introduced an OpenStack install in our sandbox, not because I thought we would stay on a private cloud, but because it allowed our team to pull back the covers and understand what our cloud vendor was doing.
It was an adoption accelerator that enabled us to choose a vendor that was appropriate and to avoid the long tail of implementation.
It was valuable as a pivot when AMD killed SeaMicro on short notice, and the full cloud migration period was dramatically shortened.
I have a dozen other examples, but it is like stock options: volatility and uncertainty dramatically increase the value of keeping your options open.
We will have vendors fold, and a single-source-only story couples your org to the success of that vendor.
IMHO, there is a huge difference between tying your success to an Oracle, which may be 'safe' if expensive for a captive customer, and doing the same in uncertain markets.
It's an SMB; if you need redundancy on every third-party dependency, your business will die anyway.
Better to take the risk for most things. If the worst case happens and you have to migrate, you migrate. Otherwise you risk overengineering upfront and guaranteeing reduced productivity, rather than merely risking it.
> Local models sound great until you realize you don't get a lot of the features that we implicitly expect from hosted models. Many things would require additional investment into the operations and setup to get to a comparable system. We ended up wanting things that would require us to roll our own memory system, harnesses for the model, compliance needs, and security.
That's not local models vs. hosted models; that's the enterprise services from Anthropic. Any local LLM inference engine, such as vLLM, gives you an OpenAI-compatible API with the same core features as a hosted model.
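As a sketch of what "OpenAI-compatible" means in practice: a vLLM server (started with e.g. `vllm serve <model>`) exposes the same `/v1/chat/completions` route as the hosted APIs, so existing clients only need a different base URL. The host, port, and model name below are assumptions.

```python
import json
import urllib.request

# Build an OpenAI-style chat request against a locally hosted vLLM server.
# Host, port, and model name are illustrative assumptions.
payload = {
    "model": "qwen-coder",  # whatever model the server was launched with
    "messages": [{"role": "user", "content": "Summarize our deploy script."}],
    "max_tokens": 256,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = urllib.request.urlopen(req)  # uncomment against a live server
print(payload["model"])
```

Any OpenAI-compatible SDK can point at the same endpoint by swapping its base URL.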
I'm not sure what your use case is, but I personally found Anthropic's offerings lacking and inferior to open-source or custom-built solutions. I have yet to see any "memory" system that's better than markdown files or search, and harnesses for agentic AIs are a dime a dozen.
I don't blame you. I personally would consider revisiting it in the next month or so. A lot of people are saying some of these smaller models like Qwen 3.6 are basically at Claude Sonnet performance, if not better.
That level of hardware, if the performance is enough, is a much smaller investment and gamble.
Either way, I understand the decision. Your product isn't locally hosted LLMs, so why fuss. That said, when I see a million-plus in external spend, I start wondering about the options. Not saying you did the wrong thing; I think you did the right thing, but things seem to be changing on the local-model front, and quite rapidly.
Some of the local models are effectively there. It depends on what scale you need or want. Kimi 2.6 is up there with Opus, granted it's huge; on some benches it's actually better. Qwen 3.6 is up there with Sonnet, but it's nearly microscopic. A lot has changed in the last month.
Only if you're vibe coding, with ambiguous prompts that require the model to fill in a huge number of gaps and basically write the software for you.
The people who don't really know what they're doing (or don't care) need the full power of the SOTA models; those with experience can provide enough context and instruction to make even small local models work.
Some of the latest batch are even more vibe-code friendly. It's pretty crazy. People are few-shotting small toy games and such with Qwen 3.6. I'm personally not into that workflow, but yeah. It won't be long until the efficiency wave hits and small models are really all people need.
GitHub, along with MSFT in general, has massive Copilot mandates where workers are being shamed into using slop tools to fix serious ongoing issues. GitHub seems wholly incapable of resolving their issues: money isn't a problem, talent isn't a problem, but business leadership is definitely a major problem.
Look at how other companies are suffering massive outages due to LLMs too, like AWS and Cloudflare, two companies that used to be the best in the industry at uptime but have suddenly faltered quite quickly.
Companies with even worse standards will quickly realize how problematic these tools are. Hopefully before a recession, because this industry seems allergic to profitable businesses, and leaders who have been around since ZIRP have shown zero intelligence in navigating these times.
None of the three major Cloudflare outages in the past six months had anything to do with LLMs. They were regular old human mistakes.
We did, however, determine that at least one of them (and perhaps all) would have been easily caught by AI code reviewers, had AI code reviewers been in use. So now we mandate that. And honestly, I love it; the AI reviewer spots all sorts of things that humans would probably miss.
(We also fixed a number of problems around configuration that would roll out globally too fast, leaving no time to notice errors and stop a bad rollout, as well as cases where services being down actually made it hard to revert the change... should be in a much better place now. But again, none of that had to do with LLMs.)
> None of the three major Cloudflare outages in the past six months had anything to do with LLMs. They were regular old human mistakes.
Is that true? At least one of them seemed to involve LLM-written code from what I saw. (Not to say that human error wasn't _also_ a contributing factor, but I wouldn't say it had _nothing_ to do with LLMs).
> We did, however, determine that at least one of them (and perhaps all) would have been easily caught by AI code reviewers, had AI code reviewers been in use. So now we mandate that. And honestly, I love it, the AI reviewer spots all sorts of things that humans would probably miss.
The reviewer is decent, but the false positive rate is substantial, and the false negative rate is definitely nonzero. Not that you would know that the way our genius CTO talks about it...
> Not that you would know that the way our genius CTO talks about it...
Honestly I find it bizarre that there are people at Cloudflare who have this attitude. Without Dane, the company wouldn't be half the size it is today.
Something unexpected that LLMs robbed from us is the grace of being assumed to have failed on our own, i.e. good ol' fashioned human/organizational failure.
Speaking of developer-tooling spend: IDEs like JetBrains' are far harder to build, and I don't think any IDE charges any customer this amount per month.
Not sure how much of a productivity gain $2.5 million per year buys?
Run Facebook on a single Proxmox box and demand would still outstrip the supply.
What remains to be seen is whether that demand sustains in the long run at that price point, or flattens out and proves to be highly elastic, given that there are many other providers catching up pretty fast.
Yeah, I feel like all of the bad downtimes happen during American business hours. We use GitHub at work in Europe and I don't remember it ever being down or broken between 0700 and 1700 local time.
That's statistically just luck, then; plenty of outages this year already in Berlin time during work hours. I do remember the forced breaks with colleagues, for sure.
Our expense is roughly equivalent to 12.3 software developers when you break it down across all people-related expenses. But we've spent a lot of time and energy prior to this focusing on our ability to measure our software development output across multiple teams.
The delivery improvements are not evenly applied across all teams, but the increases that we have seen suggest a better ROI than if we had hired 12 developers.
It's genuinely hilarious how the same leadership pushing for RTO, because getting people together creates magic, seems to have no issue trading those same people out for LLMs churning at specs.
Respectfully,
After a certain level of compensation, you are indeed judged purely off of input and output.
Workplace improvement does not justify your salary.
You will also find that many problems in the harder sciences do not get easier by throwing more bodies at them.
Comments like these remind me that some project managers think they'd be able to deliver a baby in 1 month if they simply had 9 women.
> Respectfully, After a certain level of compensation, you are indeed judged purely off of input and output. Workplace improvement does not justify your salary.
I'd have to disagree. There's a narrow band in the middle where that's true, but once you exceed that, your personal inputs and outputs matter less and less, and the contributions you make to the overall workplace, and how well you enable those around you, make a larger part of why you're compensated.
Even as an IC, the more you're able to mentor and elevate the people around you, the more your compensation will grow (if you're in the right place, and thus already at the right earnings bracket)
I would agree if the team I'm on were still growing/scaling.
However, we are well past our scaling phase, and at this point our concern is maintaining multi-million-dollar contracts with a tight, well-compensated team.
What local alternative could replace your Anthropic use? I have found none. I don't think many have, which is why most of us pay Anthropic, rather than using one of the numerous, far cheaper, cloud services that host "local" class models.
Most of us are paying for access to proprietary SOTA models, rather than hosting.
I think there is a lot of baseless fury behind your words, but my regular interactions with my leadership don't lead me to think they have the end goal of replacing labor.
We're blessed to have leadership with technical backgrounds, so the tools are regarded more as significant intelligence enhancers of already exceptionally smart engineers, rather than replacements.
Doesn't seem to us like wheelbarrows of money, when you consider the average AWS/Azure bill.
Throwing bodies at a problem doesn't always scale.
There are many difficult problems that do not get easier by throwing more juniors or mid level engineers at them.
> the increases that we have seen suggest a better ROI than if we had hired 12 developers.
You can’t argue “we were able to get away with not hiring more developers” and also say you aren’t replacing labor.
Morally I trend towards your side of things, but it’s also important to be realistic about what you’re actually doing. Money is going towards Anthropic and not towards new hires. That’s a replacement of labor. It doesn’t matter what the end goal was.
I’m glad your leadership isn’t trying to fire everyone. But in case you live under a rock, tech layoffs are at all time highs. Companies are rewarded by the public markets for laying off workers.
Simultaneously we have AI industry leaders warning of an employment apocalypse once AGI is achieved.
They must have hired absolutely incompetent leaders on the core software and infrastructure side. Sure, their AI research is great, but it's amateur hour. Or just vibe-coded slop top to bottom. It seems like every single day people are talking about outages, or billing issues, or secret changes to how Claude works.
If 90% is one nine and 99% is two nines, we can use the logarithm to compute how many fractional nines we have: nines = -log10(1 - availability), so 98.59% works out to about 1.85 nines (almost two!)
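The arithmetic, as a quick stdlib sketch:

```python
import math

def nines(availability: float) -> float:
    """Availability fraction -> 'number of nines' (e.g. 0.999 -> ~3.0)."""
    return -math.log10(1 - availability)

print(round(nines(0.90), 4))    # one nine
print(round(nines(0.99), 4))    # two nines
print(round(nines(0.9859), 4))  # ~1.85 nines
```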
Just another Mythos breakout. Excuse us while we airgap the affected DC and send in a team to drive framing nails into every storage device in the building.
As a long-term 20x user, Claude has recently felt a lot like using AI for coding a year or so ago.
It can't reliably handle basic tasks. I ask for something straightforward and get something subtly wrong, incomplete, or just not workable. I always use the best model available with effort maxed, but with all their changes I have to relearn every day how to make the model perform at its best, and it seems I can't keep up.
It’s not that Claude can’t do impressive things, it clearly can, but the inconsistency on simple, expected behavior makes it hard to use. The downtime is annoying but hasn't been the deciding factor.
I’m not waiting it out this time. I’m switching over to Codex, and based on my usage today it looks like I’ll be fine on the 5x plan, so I can drop down and save about $100 a month which is nice. I didn't quite have a grasp on how quickly companies can change for better or worse until Anthropic showed me. I'm surprised at how quickly they brought me from a happily paying max user to not even wanting the lowest paid tiers.
The inconsistency has always been there; you're just noticing it more over time, and the models are not really improving at real work in spite of all the new releases and churn.
Truly! As someone who's worked with HPC and GPUs in a scientific research context, trying to get a service like this to work reliably is a different ballgame to your usual webapp stack...
It would be an "unlimited budget" if they were a monopoly, but they're in a bidding war with three other "unlimited budget" AI companies over a resource no one expected to be scarce. There's simply not enough supply to meet demand, no matter how much money you have.
I think you have to see this as a bunch of stateless requests, which makes the problem way easier.
LLM requests that do not call tools do not need anything external by definition.
No central server, nothing, they can even survive without the context cache.
All you need is to load (and only once!) the read-only, immutable model weights from an S3-like source on startup.
If it takes 4 servers to process a request, then you can group them 4 by 4, and then send a request to each group (sharding).
Copy-paste the exact same setup XXX times and there you have your highly parallelizable service (until you run out of money).
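A toy sketch of that grouping idea (server names, counts, and the round-robin policy are all made up; a real deployment would sit behind a load balancer):

```python
import itertools

# 16 identical inference servers, grouped 4-by-4: each group holds one
# replica of the immutable model weights, so any group can take any request.
SERVERS = [f"gpu-node-{i}" for i in range(16)]
GROUP_SIZE = 4  # e.g. a model that needs 4 cards

GROUPS = [SERVERS[i:i + GROUP_SIZE] for i in range(0, len(SERVERS), GROUP_SIZE)]
_next = itertools.cycle(range(len(GROUPS)))

def route() -> list[str]:
    """Round-robin over replica groups; no shared state is needed
    because requests are stateless and the weights are read-only."""
    return GROUPS[next(_next)]

print(len(GROUPS), route())  # 4 groups; first request lands on group 0
```

Scaling out is then literally appending more groups to the list.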
It's very doable; any serious SRE can find a way to set up "larger than one card" models like Kimi or DeepSeek (unquantized) if they have a tightly coupled HPC cluster (or a pair of very, very beefy servers).
If you run out of servers, then that's again a money problem, not an architectural problem (and modern datacenters are already scalable).
Take the best SRE, but no budget, and there is no solution.
So inference is the easy part.
For Codex or Claude Code, a request that takes a lot of time or has slow cold-start latency is considered very acceptable.
Some users would probably not even notice the difference between a request taking 2 minutes versus 3 minutes.
The real difficult part is to have context caching and external tools, because now you are depending on services that might be lagging.
Executing code, browsing the web: all of that is tricky to scale because it is very unreliable (it tends to time out, requires large caches of web pages, circumventing captchas, etc.).
These are traditional scaling problems, but they are more difficult because all these pieces are fragile and queues can snowball easily.
Yeah, and that totally misses the RAI part, billing, model deployment, security patches, rate limiting, caching, dead GPUs, metrics, multiple regions, gov clouds, GDPR (or data locality issues), monitoring, alerting, and god knows what else, all at extreme loads.
GDPR doesn't affect load; dead GPUs are no different from any software freeze; a model is a file update; metrics already scale very well, even at way, way bigger volumes, and are very linear; security updates are hedged with gradual rollouts, canaries, feature flags, etc.
From an ops perspective all of these things are already really well solved issues in a very scalable manner, because plenty of companies had to solve these issues before.
It's even better here because you can throw millions in salaries at "stealing" the insider info on how their production actually works.
No doubt it is fast-paced, but the complexity of going from 100k GPUs to 1M is much lower than that of going from 1k to 10k GPUs.
All 3 big AI companies had the luxury that during the scaling phase they could do everything directly on production servers.
This is because customers were very very tolerant, and are still quite tolerant.
You can even set limits of requests to large users and shape the traffic.
Cloudflare, in comparison: high scale, low latency, end users not at all tolerant of downtime, customers even less tolerant, clearly hostile actors actively trying to take your systems down, limited budget, a lot of different workloads, etc.
So for LLM companies, where you have to scale a single workload, largely from mostly free users, where most paid customers can be throttled and nobody complains because nobody knows what the limits are, and where there's a lot of tolerance for high latency and even downtime, you are very lucky.
Would have thought that, compared to training, the serving part is pretty easy. Less of an "everything needs to come together at once" and more just moving demand to a working cluster if one bombs and keeping some spare capacity.
Hug ops to everyone involved in these outages and trying to maintain uptime.
But glad my team is staying nimble and has multi-model (Anthropic, Codex, Gemini), multi-modal (desktop, CLI/TUI, web) dev tooling.
As our actual coding skills collectively atrophy, we'll either need to switch tools or go for a walk when the LLM is down.
In the cloud era I advised against a multi-cloud strategy, as the effort to impact just wasn't there. But perhaps this is different in the LLM era, where the cost of switching is pretty darn low.
Tbh, even if your code skills don’t atrophy, you can still use outage events like this or AWS being down etc to just make up an excuse to go for a walk.
If this can happen to Anthropic, imagine all the companies building on top of Claude Code for live products. Hopefully the industry is learning that competent, problem-solving human engineers are still very much needed when you have increasingly deceptive, non-deterministic genies running your production stack.
The fact that the API is available does not mean you will actually get the model it says you're getting. Today Opus 4.7 was noticeably dumber than yesterday; it performed worse than my local Qwen.
Presumably you'd buy really beefy laptops. The price delta between buying the most basic MacBook Pro possible (14", M5, 16 GB unified memory, 1 TB SSD) and one with the M5 Max with 40 GPU cores, 128 GB unified memory, 2 TB SSD is $3400. How much Claude usage does that get you/in what time does it pay itself back?
That doesn't get you any Claude usage. Claude models obviously aren't open, but equivalent models to Opus take about 400GB of memory to run.
You can run the versions with fewer parameters or quantized weights but depending on how much quality you're sacrificing, now you'd have to compare the price against cheaper Claude models like Sonnet.
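Back-of-envelope memory math, assuming a ~200B-parameter model (an illustrative parameter count chosen to land near the ~400 GB figure at 16-bit weights; KV cache is extra on top):

```python
# Weights-only memory for a hypothetical 200B-parameter model.
# Rule of thumb: 1B params ~ 1 GB per byte of precision per parameter.
params_billions = 200
for fmt, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gb = params_billions * bytes_per_param
    print(f"{fmt}: ~{gb:.0f} GB")  # fp16: ~400 GB, int8: ~200 GB, int4: ~100 GB
```

Even at 4-bit quantization, ~100 GB of weights is far beyond any laptop-class GPU budget plus the memory the rest of your workload needs.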
At least if it's unavailable, Claude Code can't churn through an entire session limit in 30 minutes, looping, producing nothing (but noting it found a whole bunch of problems), and then, when asked to just fix what it found, forgetting and starting again. I honestly can't find anything it's good at anymore, even really simple problems a child could solve. Given a much more complex task, Codex not only identified the issue within a couple of minutes, it produced targeted tests and kept iterating unattended until it figured it out without any help, instead of idiot synonyms for thinking...
I can't even send them an angry message because clicking "Get help" does nothing.
We've been running our 10-dev org on 8 H100s with open models (with some tweaks). Sure, they aren't as good as the big providers, but they 1. don't go down and 2. have pretty damn high tok/s. It pays for itself.
Posting with a fresh account because I'm not supposed to share these details for obvious reason. If you want help on setting this up, just reply with a way to reach you.
It was pretty hard to justify the purchase to the board, but we got a decent deal from a nearby datacenter (~15% discount). Thankfully it's a fixed cost, it's an asset we can use for our taxes, and it will survive for years to come. The only things we have to work on are maintenance and looking into some renewable-energy options.
We're also looking into how to do some secure cost-sharing, so that all people need to pay is what it costs us to run everything. We're planning to reserve at least 51% of the capacity for ourselves and offer the rest to everyone else.
It's fine! There's no world where individuals can buy this kind of stuff. Our company is too small to do it, but I'd love for there to be a public utility of sorts for being able to use LLMs. It is absurd that only these >$1T companies are allowed to run this. I also find it dangerous for society to have so much power and wealth concentrated there too.
Anyway, this is the internet and skepticism is warranted :D.
Yea, I actually looked into a similar thing myself recently. I was looking at how we could replace Cursor, and I found that for ~10 people we'd need a half-dozen H100s or something on that scale, which would cost ~$1,500 per developer to build and maintain on cloud infra; to buy it would cost roughly 3 developers' yearly salaries (this aligns with your numbers). We don't use that much inference, so we decided paying Cursor ~$200-300 per dev per month is better for now, but in the future we might regret that when prices normalize. However, we also don't use cloud agents or independent agents; we basically use AI as a pair programmer, so if we had to drop AI coding assistants completely, our process wouldn't break too badly. I wish I could task my 3080 gaming card with some inference, but I can only fit ~10B models on there, so it's kinda worthless unless it's something a small model can do.
The best deal is arguably to buy as much on-prem inference as you can reasonably expect to use by running the hardware around the clock, even at slower throughput, and use third-party inference for things that are genuinely latency-sensitive. I just don't see how this resolves to needing a half-dozen H100s; surely you're not using that much compute? You don't need to place your entire model on GPU; engines for on-prem inference generally support CPU/RAM offload.
We're planning to do the same thing: buy something like 8x H100 and run all coding there. The CTO has almost agreed to find the budget for it, but I need to make sure there are no risks before we buy (i.e., that it's a viable/usable setup for professional AI-assisted coding).
Can you share which models you run and find best-performing for this setup? That would help a lot. I already run a smaller AI server in the office, but only 32B models fit there. I already have experience optimizing inference; I'm just interested in what models you think are great on 8x H100 for coding. I'll figure out the details of how to fit them :)
8x H100 80GBs don't give you enough to run the latest 1T+ parameter models (especially at the context window lengths needed to be competitive with the frontier models).
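The arithmetic behind that, under simplifying assumptions (weights only at 1 byte per parameter, i.e. 8-bit; the KV cache for long contexts only makes it worse):

```python
# Capacity check for an 8x H100 80GB node against a ~1T-parameter model.
gpus, vram_gb = 8, 80
total_hbm = gpus * vram_gb              # 640 GB of HBM across the node
weights_1t_int8 = 1000                  # ~1T params at 1 byte/param ~ 1000 GB
print(total_hbm, total_hbm - weights_1t_int8)  # 640 -360: weights alone don't fit
```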
Check out Verda; you can rent whatever super-powerful GPU clusters you need in 10-minute increments. Deploy any open-weight model using SGLang and away you go.
I've been using Kimi 2.5/2.6 for the past 2 weeks, and it's really not far off the OpenAI and Claude models. I am a coder, so it's not all vibes, but I am definitely more in "spec to code" mode than "edit this file for me" mode, and it copes just fine. It needs a bit more supervision than the frontier models, but it's also significantly cheaper. If I were Anthropic, I'd be shitting myself; their prices are going to 10x over the next 2 years.
If you haven't done so already, fine-tune the model on all of your company's code that you can get your hands on. This is one of the great advantages of running local models. I like the style of the generated code much better now, I have to rewrite much less, and my prompts can be shorter too. But maybe these are already the "tweaks" you mentioned.
How would they do that? Would it be as easy as telling a model, "Hey, review all this code, identify patterns, and then write in this style going forward"?
Sorry if this is a stupid question; I've never fine-tuned or trained an LLM.
Glad I started using the desktop app which is still working. Gotta say though, all of these difficulties with Claude are making me nervous as I use it a lot for work and really don't like ChatGPT/OpenAI for functional and personal reasons. Zo Computer has been my main fallback when Claude is failing, I'll use one of their many models temporarily within Zo's interface.
I have been keeping an eye on the outages. This is why I am looking more deeply into what I can do with self-hosted models. When I see people who want to build products on top of these services I can't help but think that people are mad. We're still a long way from these services being anywhere near stable enough for use in a product you'd want to sell someone.
> We are continuing to work to resolve the issues preventing users from accessing Claude.ai, and causing elevated authentication errors for requests to the API and Claude Code.
What are you doing with the authentication servers? This isn't the first downtime I've seen caused by that.
You're absolutely right! AI could be very helpful in this situation!
Oh no wait... the outage is with AI itself, so how can AI help? Allow me to re-evaluate.
Fublutenuating...
Yes, let's ask AI!
Oh no wait... the outage is with AI itself, I already correctly identified this above.
Bubbluating...
It seems you will have to rely on your engineering skills to solve this problem yourself, ie, you're cooked! I will auto-renew your subscription to ensure you can be sure you'll have access to AI to solve this problem if it ever comes back online.
Large telcos often have a chunk of subscriptions with their biggest competitor so that when they absolutely explode and everything is down, they can still communicate to bring it back up.
Clearly, half of Anthropic should have subscriptions to OpenAI or Mistral or whatever China sells.
Same boat, smaller scale. Been hitting overloaded errors sporadically for the past week. Switched one of my pipelines to the AWS Bedrock endpoint and it's been solid. Not a permanent fix, but good enough to keep moving.
Gemini seems to have a lot as well (at least through antigravity.google: constant errors, not enough capacity, super slow replies until it times out, etc.)
Have you run a system in production? There are a multitude of reasons that a system can go down. There's no indication so far from Anthropic that this was merely compute limitations.
> There are a multitude of reasons that a system can go down.
Start doing post mortems then!
At the very least, if they're using some off-the-shelf service that's shitting the bed, knowing that would inform others to stay away from it: like an IAM solution, or maybe a particular DB in a specific configuration backing whatever they've written, or a given architecture at a given scale.
Right now it's a complete black box that sometimes goes down, and we don't get much information about why it's so much less stable than other options. (Hey, if they just came out and said "We're growing 10x faster than we anticipated and systems X, Y, and Z are not architected for that," that'd also be useful signal.)
Or, who knows, maybe it's just bad deploys - seems like it's back for me and claude.ai UI looks a bit different hmmm.
I have no inside knowledge of Anthropic. But having done a lot of postmortems in general, one of the key dynamics that routinely comes up is "we know we keep shipping breakages, and we know these new procedures would prevent many of them, but then we wouldn't be able to deliver new stuff so quickly". Given where Anthropic is at and what they believe about the future of software development, that's a tradeoff that they may very well be intentionally not making.
Yeah, this is not just inference. First thing for me was an MCP I use went down in Claude Code, models still worked. Now "API Error: 529 Authentication service is temporarily unavailable."
As an anecdote of support for the yaw terminal: I am currently logged in via Yaw Mode and have been continuing to use Claude all day with no problems, while the browser is absolutely unavailable.
"We are investigating an issue preventing users from reaching Claude.ai, and will provide an update as soon as possible."
Who is We? I thought software engineers were going to be redundant and AI could do it all itself? (not to take anything away from Claude code + Claude both of which I love)
All it took for Codex to resume a stalled Claude Code session:
> I'm working with Claude Code on session aaaaaaaa-bbbb-1223-3445-abcdefabcdef which I'd like to hand-off to you, do you know how to read the session, my input and Claude's output so we can resume where I left off?
gpt-5.5, medium effort. "Resumed" session fully in under 2 minutes. Outages like today's are so common that I've now got the time to re-evaluate Codex every other day.
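For context on what a hand-off like that involves: Claude Code sessions are reportedly stored as JSONL files (commonly under `~/.claude/projects/`), so reading one back is mostly a parsing exercise. The file location and field names below are assumptions based on observed session files, not a documented API, and may change between versions.

```python
import json
from pathlib import Path

def load_turns(session_path):
    """Extract (role, text) pairs from a Claude Code session JSONL file.

    Assumes (undocumented, observed format) that each line is a JSON
    object with a "type" of "user" or "assistant" and a "message" whose
    content is either a string or a list of {"type": "text", ...} blocks.
    """
    turns = []
    for line in Path(session_path).read_text().splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry.get("type") not in ("user", "assistant"):
            continue  # skip summaries, tool results, etc.
        content = entry.get("message", {}).get("content", "")
        if isinstance(content, list):
            content = "".join(
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        turns.append((entry["type"], content))
    return turns

# Hypothetical usage -- the path layout is an assumption:
# turns = load_turns(Path.home() / ".claude" / "projects" / "my-repo" / "<session-id>.jsonl")
```

Handing the extracted turns to another agent as plain context is essentially what the Codex prompt above is asking it to figure out on its own.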
I haven't used claude in a week (after being a heavy user) and if you have ever seen the movie office space where Peter enters his stage of ecstasy that's what life feels like right now.
Today Opus 4.7 was completely unusable. I'd say performance was worse than my local Qwen. I have a feeling they are not actually routing to Opus 4.7 most of the time, but to cheaper and less capable models. I think regulators should look into that.
The uptime with Claude is poor. I use it for workflows more or less 24/7. It is often unreliable. Fine, it is cheap. What I really dislike is the uneven quality of the service. Clearly it does NOT work as stated. Opus 4.7 sometimes gives ancient code back. Just the other day it even stated that the latest version of Opus was 4.5, and 4.x-something for ChatGPT.
It's rare in history that a software product can be so unreliable without any negative business impact because it's the category leader and demand only keeps growing.
Reminds me of the early days of World of Warcraft, when servers went down frequently because Blizzard couldn't keep up with all the load. Everyone was frustrated but of course nobody stopped playing.
I'm experimenting with a simple ritual: if Claude is out, I'm out.
I'll just go for a walk outside.
And I don't mean "if I can't access Claude to do my work", I mean, just in general - I'll just ping claude.ai from time to time and use Claude's breaks as a break reminder.
So there was a recent article I read which said that Anthropic is now trading at a trillion-dollar (yes, with a T) valuation in private markets.
We are definitely creating corporations and people that depend on AI companies themselves, and the reliability of these tools is certainly a question worth asking. I am seeing quite a few outages in products like GitHub and Claude showing up on Hacker News multiple times.
Is there a life cycle of enshittification for products that grow too valuable? What are the practical lessons for scalability (are there any?) that these trillion-dollar companies are missing, or is it just a dose of reality that such massive corporations can't match the uptime of even my $7/yr VPS?
My question is: is this an engineering roadblock, with limits rooted in reality, or a management/enterprise roadblock to low downtime?
They can't fix it because the thing that they need to fix it is the thing that doesn't work. /s
But seriously: while I don't use Claude, this issue of perceived unreliability seems to be approaching the point of existential risk for Anthropic. What's the theory about why they're struggling? Compute capacity? Load? Lack of focus on SRE?
Put it another way: is their downtime due to something fundamental about serving inference, or just bad engineering choices? Given their resources, it seems astonishing.
The spend at my organization has passed the $200,000 per month level on Anthropic's enterprise tier. The number of outages we have had over these past few months is astounding, and coupled with their horrendous support, it has our executive team furious.
It's a lot of money to be spending for a single 9 of reliability.
If you are paying API rates (not using Max subscriptions), there's no reason to use Anthropic's API directly; the same models are hosted by both AWS and Google with better uptime than Anthropic.
How do things like prompt caching etc play into that? Would I theoretically have a more stable harness backing my usage?
I'm seriously over the current Claude experience. After seemingly fixing my 4.6 usage by disabling adaptive thinking and moving to max effort, it seems the release of 4.7 has broken that workflow, and I'm 99% certain that disabling adaptive thinking does nothing even on 4.6 now. Just egregious errors in 2 days this week after coming back from vacation.
AWS Bedrock supports prompt caching, just note that if you use the Converse API you need to set the cache points manually.
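Concretely, setting a manual cache point with the Converse API looks roughly like the sketch below. The `cachePoint` block placement follows the Bedrock Converse request shape; treat the model ID and exact layout as assumptions to verify against your account and region (the boto3 call is shown as a comment).

```python
def build_converse_request(model_id, system_prompt, user_text):
    """Build a Bedrock Converse request that caches the system prompt.

    With the Converse API you mark cache boundaries yourself by adding a
    {"cachePoint": {"type": "default"}} block after the content you want
    cached -- here, the (typically large and stable) system prompt.
    """
    return {
        "modelId": model_id,
        "system": [
            {"text": system_prompt},
            {"cachePoint": {"type": "default"}},  # cache everything above this point
        ],
        "messages": [
            {"role": "user", "content": [{"text": user_text}]},
        ],
    }

# Hypothetical usage -- the model ID is an assumption, check your account:
# import boto3
# client = boto3.client("bedrock-runtime")
# req = build_converse_request("anthropic.claude-model-id", big_system_prompt, "Fix the bug")
# response = client.converse(**req)
```

The InvokeModel path with the native Anthropic request body handles `cache_control` differently, which is why the Converse caveat above matters.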
> Would I theoretically have a more stable harness backing my usage?
If you don’t mind an opinionated harness that asks for a pretty specific workflow, but one that works well, use OpenCode.
If you want to spread your wings and feel the sweet kiss of freedom, use Pi.
I'm looking at moving to Pi and I like the minimal nature, but I disagree with a handful of decisions they make, so I'd likely need to maintain a fork, which is less than ideal.
What decisions is Mario making that you disagree with? My impression is Pi is minimal so any changes can live on top of Pi without needing to maintain a fork?
I started developing my own coding agent after using Pi for a couple months, so I’m curious what you don’t like about pi.
When I hear Mario talk about pi and his approach I find myself agreeing with a lot of it. But I also find myself agreeing with a lot of the points from this https://www.thevinter.com/blog/bad-vibes-from-pi
the opinions in question are that bash should be enabled by default with no restrictions, that the agent should have access to every file on your machine from the start, and that npm is the only package manager worth supporting. Bold choices.
To save others a click, though the article is worth reading.
He also mentions no subagents by default in pi as well.
oh-my-pi harness fixes many of these, like subagents
check out my pi forks.
Ummmmmm, how?
I searched his HackerNews username on Google.
[0] - https://github.com/cartazio/oh-punkin-pi
Pi for the win. I have my own AI extend it when I want more specific features; vibe-coded a Claude Code-style shift+tab permission control in 20 minutes.
I find it so funny that many of these harnesses sound like black magic and are completely mystical to me. I use Claude Code every day and yet I can't imagine the workflow of Pi. I also don't care to pay API rates just to experiment with them.
Largely though I'm happy with Claude Code w/ IDE integration, so I don't feel the need to migrate. Nonetheless I'm curious.
you can use claude code with these other providers
The enterprise tier is API pricing only.
https://support.claude.com/en/articles/9797531-what-is-the-e...
Enterprise adds IAM, logging, and analytics, all of which AWS provides for free or for metered usage without needing an enterprise plan.
They'll cut you a private offer for bedrock tokens but bedrock has a 32k output limit
I use bedrock with 1M context every day. Not sure this is right
4.7 is the first opus model that’s had the 1 M context window available on Bedrock.
I've had Opus 4.6 1M and Sonnet 4.6 1M for months now on Bedrock.
Not true. Opus and Sonnet 4.6 support 1m context on Bedrock.
Isn't that an input limit from API Gateway?
Obviously there is only so much you can say; but is that $200K due to the raw number of seats you have, or are you burning through a lot on raw API usage? I guess I'm trying to understand, large business, or large usage.
We are in the SMB space; the spend is almost entirely usage for us at this point, rather than seat cost. For context, we are a software firm focused on difficult engineering problems, but I can't divulge much else.
Have you guys considered running your own local models? 200k a month is a ton of money and puts all your eggs in one basket. Or is it easier to just be able to run away from it all if you are done with it or something changes?
I led the team that did the math and analysis for determining our direction in selecting Anthropic. We initially assumed this was where we would end up, but after some investment exploring our options we found it not worth the trouble.
Local models sound great until you realize you don't get a lot of the features that we implicitly expect from hosted models. Many things would require additional investment into the operations and setup to get to a comparable system. We ended up wanting things that would require us to roll our own memory system, harnesses for the model, compliance needs, and security. It was possible for us to invest in this, but it would require additional investment in hiring or training to get us to a state comparable to the hosted options.
Eventually, I had to recommend against the project as it was more likely to be an investment in the leading team's resume, than an actual investment into our organization.
To start, I want to be clear I am trying to understand not criticizing, and mistakes are how institutional knowledge grows.
Your last paragraph hints at retention struggles which complicates the issue.
But was vendor mitigation not part of the evaluation? I get that most companies view governance and compliance as a pay to play issue, but there has always been an issue with rapidly changing areas and single source suppliers.
I admit to having my own preferences and being almost completely ignorant about what your needs are, but I have seen the value in having a rabbit to pull out of the hat.
If employee retention doesn’t allow for departure of individuals without complete loss of institutional knowledge I guess my position wouldn’t hold.
But during the rise of cloud computing I introduced an openstack install in our sandbox, not because I thought that we would stay on a private cloud but because it allowed our team to pull back the covers and understand what our cloud vendor was doing.
It was an adoption accelerator that enabled us to choose a vendor that was appropriate and to avoid the long tail of implementation.
It was valuable as a pivot when AMD killed SeaMicro with short notice, and the full cloud migration period was dramatically shortened.
I have a dozen other examples, but it is like stock options, volatility and uncertainty dramatically increase the value of keeping your options open.
We will have vendors fold, and a single-source-only story couples your org to the success of that vendor.
IMHO There is a huge difference between tying your success to an Oracle, who may be ‘safe’ if expensive as a captive customer and doing the same in uncertain markets.
Would you be willing (or able) to share more?
it's an SMB, if you need redundancy on every 3rd party dependency your business will die anyway
better to take the risk for most things. if the worst case happens and you have to migrate, you migrate. otherwise you risk overengineering upfront and guaranteeing reduced productivity rather than risking it
> Local models sound great until you realize you don't get a lot of the features that we implicitly expect from hosted models. Many things would require additional investment into the operations and setup to get to a comparable system. We ended up wanting things that would require us to roll our own memory system, harnesses for the model, compliance needs, and security.
That's not local models vs hosted models; that's using the enterprise services from Anthropic. Any local LLM inference engine such as vLLM gives you an OpenAI-compatible API with the exact same features as a hosted model.
I'm not sure what your use case is, but I personally found Anthropic's offerings lacking and inferior to open source or custom-built solutions. I have yet to see any "memory" system that's better than markdown files or search, and harnesses for agentic AIs are dime a dozen.
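As a sketch of that OpenAI-compatible surface: a local vLLM server (started with e.g. `vllm serve <model>`, which defaults to port 8000) exposes `/v1/chat/completions`, so even a plain stdlib request can drive it. The base URL, port, and model name below are assumptions; adjust them to however you launch vLLM.

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt):
    """Build an OpenAI-compatible chat completion request for a local server."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Hypothetical usage against a locally running vLLM (model name is an assumption):
# req = build_chat_request("http://localhost:8000", "Qwen/Qwen2.5-Coder-32B-Instruct", "hello")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the wire format is the same, existing OpenAI SDK clients also work by pointing `base_url` at the local server.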
I don't blame you. I personally would consider revisiting it in the next month or so. A lot of people are saying some of these smaller models like qwen 3.6 are basically at Claude sonnet performance if not better.
That level of hardware, if the performance was enough is a much smaller investment and gamble.
Either way I understand the decision. Your product isn't in locally hosted LLMs, why fuss. That said I see 1 million plus in external spend I start wondering about the options. Not saying you did the wrong thing, I think you did the right thing but things seem to be changing on the local model front and quite rapidly.
Local models perform objectively worse than SotA SaaS models. Your employees will hate this decision.
Some of the local models are effectively there. It depends on what scale you need or want. Kimi 2.6 is up there with opus, granted it's huge. On some benches it's actually better. Qwen3.6 is up there with sonnet but it's nearly microscopic. A lot has changed in the last month
Only if you're vibe coding, with ambiguous prompts that require the model to fill in a huge number of gaps and basically write the software for you.
The people who don't really know what they're doing (or don't care) need the full power of the SOTA models, those with experience can provide enough context and instruction to make even small local models work.
Some of the latest batch are more vibe-code friendly even. It's pretty crazy. People are few-shotting small toy games and stuff with qwen3.6. I'm personally not into that workflow but yeah. It won't be long until the efficiency wave hits and small models are really all people need.
A single nine so far. If GitHub is any guide, things will get worse.
Why would Github be a guide? It's also terrible, but it's a radically different stack from an unrelated company
That, and even before AI, MS was having trouble with GH reliability
GitHub, along with MSFT in general, have massive copilot mandates where workers are being shamed into using slop tools to fix serious on-going issues. GitHub seems wholly incapable of resolving their issues: money isn't a problem, talent isn't a problem, but business leadership is definitely a major problem.
Look at how other companies are suffering massive outages due to LLMs too, like AWS and Cloudflare. Two companies that used to be the best in the industry at uptime but have suddenly faltered quite quickly.
Companies that have even worse standards will quickly realize how problematic these tools are. Hopefully before a recession because this industry seems to be allergic to profitable businesses and leaders that have been around since ZIRP have shown zero intelligence in navigating these times.
None of the three major Cloudflare outages in the past six months had anything to do with LLMs. They were regular old human mistakes.
We did, however, determine that at least one of them (and perhaps all) would have been easily caught by AI code reviewers, had AI code reviewers been in use. So now we mandate that. And honestly, I love it, the AI reviewer spots all sorts of things that humans would probably miss.
(We also fixed a number of problems around configuration that would roll out globally too fast, leaving no time to notice errors and stop a bad rollout, as well as cases where services being down actually made it hard to revert the change... should be in a much better place now. But again, none of that had to do with LLMs.)
> None of the three major Cloudflare outages in the past six months had anything to do with LLMs. They were regular old human mistakes.
Is that true? At least one of them seemed to involve LLM-written code from what I saw. (Not to say that human error wasn't _also_ a contributing factor, but I wouldn't say it had _nothing_ to do with LLMs).
> We did, however, determine that at least one of them (and perhaps all) would have been easily caught by AI code reviewers, had AI code reviewers been in use. So now we mandate that. And honestly, I love it, the AI reviewer spots all sorts of things that humans would probably miss.
The reviewer is decent, but the false positive rate is substantial, and the false negative rate is definitely nonzero. Not that you would know that the way our genius CTO talks about it...
> Not that you would know that the way our genius CTO talks about it...
Honestly I find it bizarre that there are people at Cloudflare who have this attitude. Without Dane, the company wouldn't be half the size it is today.
Something unexpected that LLMs robbed us of is the grace of being assumed to have failed on our own, e.g. good ol' fashioned human/organizational failure.
Speaking of developer tooling spend: IDEs such as JetBrains' are far harder to build, and I don't think any IDE charges any customer this amount per month.
Not sure how much of a productivity gain $2.5 million per year actually buys.
Supply and demand - if you think it’s not worth the price, take your dollars elsewhere.
This is the brutal reality; even with the crazy reliability issues, demand is still far outstripping supply at the current price.
Run Facebook on a single Proxmox box and demand would still outstrip the supply.
What yet needs to be seen is if that demand sustains in the long run at that price point or flattens out proving to be super elastic given that there are many other providers that are catching up pretty fast.
IDEs don't need expensive GPUs to create and serve.
> single 9 of reliability
Out of curiosity, do you actually use it 24/7? The world doesn't collapse every time o365 goes down... (which is also pretty often)
In my experience the downtime tends to coincide with peak PT hours. If you're in PT, it's very inconvenient.
Yeah, I feel like all of the bad downtimes happen during American business hours. We use GitHub at work in Europe and I don't remember it ever being down or broken between 0700 and 1700 local time.
That’s statistically just luck then - plenty of outages this year already in Berlin time during work hours - I do remember the forced breaks with colleagues for sure.
if it's judged only by the time it is expected to be in use (work hours), reliability is likely even worse than the 24/7 measure.
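To make that concrete, here is a toy calculation (all numbers invented for illustration) of how the same outage minutes score very differently when you only count the hours people are actually working:

```python
def availability(downtime_min, total_min):
    """Fraction of the measurement window the service was up."""
    return 1 - downtime_min / total_min

# Invented example: 13 hours of outages in a 30-day month,
# 10 of which fell inside an 8h/day, ~22-workday window.
month_min = 30 * 24 * 60   # 43,200 minutes of wall clock
work_min = 22 * 8 * 60     # 10,560 minutes of work hours
all_downtime = 13 * 60
work_downtime = 10 * 60

print(f"24/7 availability:       {availability(all_downtime, month_min):.4f}")
print(f"work-hours availability: {availability(work_downtime, work_min):.4f}")
# The same outages look like roughly 98.2% around the clock,
# but only about 94.3% during the hours that matter.
```

Since outages cluster in US business hours, the work-hours number is the one users actually experience.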
We are spending the equivalent of 32 monthly software engineer salaries on Claude per month.
Info like this is useless without context like, how much revenue does the company earn? How many engineers do they employ? etc.
Our expense is roughly equivalent to 12.3 software developers when you break it down across all people-related expenses. But we've spent a lot of time and energy prior to this focusing on our ability to measure our software development output across multiple teams. The delivery improvements are not evenly applied across all teams, but the increases that we have seen suggest a better ROI than if we had hired 12 developers.
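Rough back-of-the-envelope math on the figures in this thread (all assumptions: a flat $200k/month spend, and the "32 salaries" / "12.3 developers" ratios quoted above):

```python
monthly_spend = 200_000            # figure from the thread
annual_spend = monthly_spend * 12  # $2.4M/year

# "32 monthly software engineer salaries" implies an average salary of:
avg_salary = monthly_spend / 32            # $6,250/month, i.e. $75k/year

# "roughly 12.3 developers across all people-related expenses" implies a
# fully-loaded cost (salary + benefits + overhead) per developer of about:
fully_loaded = annual_spend / 12.3         # ~$195k/year

print(f"annual spend:              ${annual_spend:,.0f}")
print(f"implied average salary:    ${avg_salary * 12:,.0f}/yr")
print(f"implied fully-loaded cost: ${fully_loaded:,.0f}/yr")
```

The gap between the two ratios is just the usual ~2.5x spread between raw salary and fully-loaded cost, which is why "32 salaries" and "12.3 developers" can both describe the same bill.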
I guess if you think about your teammates as purely inputs and outputs and not people that can improve and contribute in the workplace in other ways.
It's genuinely hilarious how the same leadership pushing for RTO because getting people together creates magic, seems to have no issues trading those same people out for LLM's churning at specs.
Haha nail on head so the motive for ‘get your ass back in the office’ was never the motive we all heard
Respectfully, after a certain level of compensation, you are indeed judged purely off of input and output. Workplace improvement does not justify your salary.
You will also find that many problems in the harder sciences do not get easier by throwing more bodies at them. Comments like these remind me that some project managers think they'd be able to deliver a baby in 1 month if they simply had 9 women.
> Respectfully, After a certain level of compensation, you are indeed judged purely off of input and output. Workplace improvement does not justify your salary.
I'd have to disagree. There's a narrow band in the middle where that's true, but once you exceed that, your personal inputs and outputs matter less and less, and the contributions you make to the overall workplace, and how well you enable those around you, make a larger part of why you're compensated.
Even as an IC, the more you're able to mentor and elevate the people around you, the more your compensation will grow (if you're in the right place, and thus already at the right earnings bracket)
> you are indeed judged purely off of input and output
That's not how successful (software, in this case) teams are made.
I would agree if the team I'm on were still growing/scaling. However, we are well past our scaling phase, and at this point our concern is maintaining multi-million dollar contracts with a tight, well-compensated team.
Is it worth it?
He was fired before answering.
[but as his manager I can tell you:] YES !!!!
No, we can literally buy our own hardware for what we spend in a month and host our own local LLMs for company usage.
> and host our own local LLMs for company usage.
What local alternative could replace your Anthropic use? I have found none. I don't think many have, which is why most of us pay Anthropic, rather than using one of the numerous, far cheaper, cloud services that host "local" class models.
Most of us are paying for access to proprietary SOTA models, rather than hosting.
Five nines? No, nine fives
> has our executive team furious
And yet they will continue to spend wheelbarrows full of money with Anthropic because they want so badly to reach the point where they can fire you.
I think there is a lot of baseless fury behind your words, but my regular interactions with my leadership don't lead me to think they have the end goal of replacing labor. We're blessed to have leadership with technical backgrounds, so the tools are regarded more as significant intelligence enhancers for already exceptionally smart engineers, rather than replacements.
Doesn't seem like wheelbarrows of money to us, when you consider the average AWS/Azure bill.
Not ever hiring juniors and eventually mids is just replacing labor with extra steps.
Throwing bodies at a problem doesn't always scale. There are many difficult problems that do not get easier by throwing more juniors or mid level engineers at them.
Having just worked my behind off for the last months to deliver on an impossible deadline, successfully: more bodies definitely would have helped.
Even just to keep the fluff off my back and to allow me to fully concentrate on what's important.
The situation will repeat itself in 6 months and I'm not going to do that again. Hiring now would fix that.
I think the message you responded to already refuted your point of view.
Huh? Your other comment explicitly said you were replacing labor: https://news.ycombinator.com/item?id=47939146
> the increases that we have seen suggest a better ROI than if we had hired 12 developers.
You can’t argue “we were able to get away with not hiring more developers” and also say you aren’t replacing labor.
Morally I trend towards your side of things, but it’s also important to be realistic about what you’re actually doing. Money is going towards Anthropic and not towards new hires. That’s a replacement of labor. It doesn’t matter what the end goal was.
> I think there is a lot of baseless fury behind your words,
Hardly baseless when people have been gloating about how programming as a job is ending any day now for the last year at least.
> Doesnt seem to us to be wheelbarrows of money, when you consider the average AWS/Azure bill.
You didn’t mention the size of the company so yeah.
“Baseless fury”
I’m glad your leadership isn’t trying to fire everyone. But in case you live under a rock, tech layoffs are at all time highs. Companies are rewarded by the public markets for laying off workers.
Simultaneously we have AI industry leaders warning of an employment apocalypse once AGI is achieved.
And you think it’s baseless. Have some class bro.
Seems to be back now (claude code at least)
Is the $200k just for development, or do the products being developed require AI?
I wonder if self-hosted models would be a sensible step for your organization.
They must have hired absolutely incompetent leaders on the core software and infrastructure side. Sure their AI research is great but it’s amateur hour. Or just vibe coded slop top to bottom. It seems like every single day people are talking about outages or billing issues or secret changes to how Claude works.
They're getting high on their own supply, and instead really need to hire some senior engineers.
Imagine how much money they would save if they switched to Codex.
Not everyone can (due to corporate compliance requirements, e.g. guarantees that the LLM won't train on anything you send).
Besides, codex wasn't always the answer.
Just give them more money, surely it'll get better.
/s
We're officially down to one 9 of uptime over last 90 days: https://status.claude.com
Not so fast, it's currently 98.59%. That's technically two 9s!
If 90% is one nine and 99% is two nines, we can use the logarithm to compute how many fractional nines we have at 98.59%: about 1.85 nines (almost two!)
I think it was a joke that 98.59% has 2 '9's: 9X.X9%.
Yes, you're correct about that
how is this counted? is 79% "one nine"?
It's a joke
>=90% -> one 9
>=99% -> two 9s
>=99.9% -> three 9s
Also, you can think of e.g. five 9's as five minutes of downtime per year
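For anyone counting along, the standard conversion is `-log10(1 - availability)` for fractional nines, and the downtime budget falls straight out of the same number:

```python
import math

def nines(availability):
    """Fractional 'nines' of availability: 0.99 -> 2.0, 0.999 -> 3.0."""
    return -math.log10(1 - availability)

def downtime_minutes_per_year(availability):
    """Allowed downtime per (non-leap) year at a given availability."""
    return (1 - availability) * 365 * 24 * 60

print(f"{nines(0.9859):.2f}")                       # 98.59% is about 1.85 nines
print(f"{downtime_minutes_per_year(0.99999):.1f}")  # five nines ~ 5.3 min/year
print(f"{nines(0.79):.2f}")                         # and 79% is about 0.68 nines
```

So "five nines = five minutes a year" is a (slightly generous) rounding of 5.26 minutes.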
got the joke. but it drove home that i wasn't sure how it's counted. thx
and a countable infinity of other invisible smaller ones!!
Can't they use Mythos to figure out their uptime?
Mythos prompt: Hey Mythos, make me 20,000 H100s.
Careful, you might just get turned into an H100 along with everything else in the observable universe
They weren't able to use it to prevent the Claude Code source code from leaking, or to stop some random Discord server from gaining access to Mythos.
> prevent Claude Code source code from leaking
That's silly. It's a JavaScript app, they are more or less open source by design. There was no secret sauce in Claude Code.
Odd how they still DMCA'd the rehosts of the leak. Clearly they don't consider it "open source".
Ah the uptime rainbow
Up-time girl, she's been living in her up-time world...
I bet she's never had a downtime guy, I bet her momma never told her why.
Is there a word for the phenomenon where you automatically read something in someone’s voice or in the rhythm of a song?
Sadly not colorblind friendly
Yeah, to me it looks like, I think red, and then at least two similar shades of green, and grey.
From 5 9s to 9 5s
The question is: is it DNS, or an AI outage? Hmmmm
Just another Mythos breakout. Excuse us while we airgap the affected DC and send in a team to drive framing nails into every storage device in the building.
[dead]
As a long-term 20x user, Claude has recently felt a lot like using AI for coding a year or so ago. It can't reliably handle basic tasks. I ask for something straightforward and get something subtly wrong, incomplete, or just not workable. I always use the best model available and effort levels maxed, but with all their changes I have to relearn how to make the model perform at best every day, and it seems I can't keep up. It’s not that Claude can’t do impressive things, it clearly can, but the inconsistency on simple, expected behavior makes it hard to use. The downtime is annoying but hasn't been the deciding factor. I’m not waiting it out this time. I’m switching over to Codex, and based on my usage today it looks like I’ll be fine on the 5x plan, so I can drop down and save about $100 a month which is nice. I didn't quite have a grasp on how quickly companies can change for better or worse until Anthropic showed me. I'm surprised at how quickly they brought me from a happily paying max user to not even wanting the lowest paid tiers.
The inconsistency has always been there you’re just noticing it more over time and the models are not really improving at real work in spite of all the new releases and churn.
More than by the downtime I am much more surprised by the actual uptime. Hard to imagine how difficult this must be, given the speed of growth.
Truly! As someone who's worked with HPC and GPUs in a scientific research context, trying to get a service like this to work reliably is a different ballgame to your usual webapp stack...
But… imagine that same scientific research but you have an unlimited budget. I’d imagine that helps.
Some of the comments here mention their monthly spend, and it’s eye watering.
It would be "unlimited budget" if they were a monopoly, but they're in a bidding war with three other "unlimited" budget AI companies, over a resource no one expected to be scarce. There's simply not enough supply to meet demand, no matter how much money you have
I think you have to see this as a bunch of stateless requests, and this makes the problem way easier.
It's very doable; any serious SRE can find a way to set up "larger than one card" models like Kimi or DeepSeek (unquantized) if they have a tightly-coupled HPC (or a pair of very, very beefy servers). If you run out of servers, then again it's a money problem, not an architectural problem (and modern datacenters are already scalable).
Take the best SRE, but no budget, and there is no solution.
So inference is the easy part.
With Codex or Claude Code, if a request takes a lot of time or has slow cold-start latency, it's considered very acceptable.
Some users would probably not even see the difference if a request takes 2 minutes versus 3 minutes.
The real difficult part is to have context caching and external tools, because now you are depending on services that might be lagging.
These are traditional scaling problems, but they are more difficult because all these pieces are fragile and queues can snowball easily.
Yeah, and you totally missed the RAI part, billing, model deployment, security patches, rate limiting, caching, dead GPUs, metrics, multiple regions, gov clouds, GDPR (or data-locality issues), monitoring, alerting, and god knows what else, all at extreme loads.
GDPR doesn't affect load, dead GPUs are no different from any software freeze, a model is a file update, metrics already scale very well even at way bigger volumes and are very linear, and security updates are hedged with gradual rollouts, canaries, feature flags, etc.
From an ops perspective all of these things are already really well solved issues in a very scalable manner, because plenty of companies had to solve these issues before.
It's even better here because you can throw millions in salaries to "steal" the insider info on how their production actually works.
No doubt it is fast-paced but the complexity to go from 100k GPUs to 1M is much lower than from going from 1k to 10k GPUs.
All 3 big AI companies had the luxury that during the scaling phase they could do everything directly on production servers.
This is because customers were very very tolerant, and are still quite tolerant.
You can even set limits of requests to large users and shape the traffic.
Cloudflare in comparison, high-scale, low-latency, end users not tolerant at all to downtime, customers even less tolerant, clearly hostile actors that actively try to make your systems down, limited budget, a lot of different workloads, etc.
So, for LLM companies, where you have to scale a single workload, largely from mostly free users, where most paid customers can be throttled and nobody complains because nobody knows what the limits are, and where there is a lot of tolerance for high latency and even downtime, you are very lucky.
Can you speak a little more to this? I'm curious what kind of parameters one must consider/monitor and what kind of novel things could go wrong.
My guesses are:
Hardware capacity constraints are going to be the big one.
Effective caching is another; I bet if you start hitting cold caches the whole thing's going to degrade rapidly.
The ground is probably shifting pretty rapidly.
Power users are trying to get the most out of their subscriptions and so are hammering you as fast as they possibly can. See Ralph loops.
Harnesses are evolving pretty rapidly, as are new alternative harnesses, which makes load patterns less predictable and harder to cache.
The demand is increasing both from more customers, but also from each user as they figure out more effective workflows.
Users are pretty sensitive to model quality changes. You probably want smart routing, but users want the best model all the time.
Models keep getting bigger and bigger.
On top of that they are probably hiring and onboarding more people, so system complexity and codebase complexity are growing.
Just ask Claude and some agents to fix it...
On the other hand, the status page is blaming the authentication system, which one would think is not a frontier-class problem.
Would have thought that compared to training the serving part is pretty easy. Less of a “everything needs to come together at once” and more just move demand to a working cluster if one bombs & have some spare capacity
Hug ops to everyone involved in these outages and trying to maintain uptime.
But glad my team is staying nimble and has multi-model (Anthropic, Codex, Gemini), multi-modal (desktop, CLI/TUI, web) dev tooling.
As our actual coding skills collectively atrophy, we'll either need to switch tools or go for a walk when the LLM is down.
In the cloud era I advised against a multi-cloud strategy, as the effort to impact just wasn't there. But perhaps this is different in the LLM era, where the cost of switching is pretty darn low.
Going for a walk is a good idea even when there's no outage.
Tbh, even if your code skills don’t atrophy, you can still use outage events like this or AWS being down etc to just make up an excuse to go for a walk.
If this can happen to Anthropic, imagine all the companies building on top of Claude Code for live products. Hopefully the industry is learning that competent problem solving human engineers are still very much needed when you have increasingly deceptive non-deterministic genies running your production stack.
It's not that simple. API is still up and there are multiple API providers. https://openrouter.ai/anthropic/claude-opus-4.7
I don't think there are many other companies serving Claude.
At least Google, Amazon, and Microsoft. What more do you want?
Came here to say this. At Kilo Code we aren’t impacted by this because of the other places that can run Claude
The fact that the API is available does not mean you will actually get the model it says you're getting. Today Opus 4.7 was noticeably dumber than yesterday. It performed worse than my local Qwen.
Sadly it's "good enough" for execs.
Maybe it will push companies to run them locally.
On what hardware? Like companies would buy up GPUs?
Presumably you'd buy really beefy laptops. The price delta between buying the most basic MacBook Pro possible (14", M5, 16 GB unified memory, 1 TB SSD) and one with the M5 Max with 40 GPU cores, 128 GB unified memory, 2 TB SSD is $3400. How much Claude usage does that get you/in what time does it pay itself back?
That doesn't get you any Claude usage. Claude models obviously aren't open, but equivalent models to Opus take about 400GB of memory to run.
You can run the versions with fewer parameters or quantized weights but depending on how much quality you're sacrificing, now you'd have to compare the price against cheaper Claude models like Sonnet.
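That 400 GB figure falls out of simple arithmetic. A rough sketch of the estimate (the parameter counts and overhead factor are illustrative assumptions, not any vendor's published numbers):

```python
def model_memory_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough memory footprint for serving a dense model.

    overhead approximates KV cache, activations, and runtime buffers
    (the 1.2 factor is an illustrative guess, not a measured value).
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billion * 1e9 * bytes_per_weight * overhead / 1e9

# A hypothetical 400B-parameter model at different quantization levels:
print(model_memory_gb(400, 8))  # fp8:  ~480 GB, far beyond any laptop
print(model_memory_gb(400, 4))  # int4: ~240 GB, still multiple GPUs
print(model_memory_gb(70, 4))   # a 70B model at int4 fits in 128 GB unified memory
```

Which is why the realistic laptop option is a smaller or heavily quantized model, and then the quality comparison shifts to the cheaper hosted tiers.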
Haha, good one.
[dead]
They better fix that today, I need to downgrade my account before the subscription renews.
hopefully their billing server is also available
At least if it's unavailable, Claude Code can't churn through an entire session limit in 30 minutes, looping, producing nothing (but noting it found a whole bunch of problems), and then, when asked to just fix what it found, forgetting and starting again. I honestly can't find anything it's good at anymore, even really simple problems a child could solve. Given a much more complex task, Codex not only identified the issue within a couple of minutes, it produced targeted tests and kept iterating unattended until it figured it out without any help, instead of idiot synonyms for thinking...
I can't even send them an angry message because clicking "Get help" does nothing.
We've been running our 10 dev org on 8 H100s on open models (with some tweaks). Sure they aren't as good as the big providers but they 1. don't go down 2. have pretty damn high tok/s. It pays for itself.
Posting with a fresh account because I'm not supposed to share these details for obvious reason. If you want help on setting this up, just reply with a way to reach you.
First of all: 1) 8 H100s are NOT ENOUGH for today's premier models (500B+ parameters), and if you do run an obsolete model, memory is still tight.
2) After buying the $300k of GPUs, your electricity costs will put you in competition with cloud hosting costs; you will probably lose money this way.
3) NVIDIA will charge you a kidney to provide driver/hardware support if anything goes wrong.
This is inherently a bad idea, and this person is probably trying to promote their startup.
yea just buy 300k worth of hardware and bob's your uncle
It was pretty hard to justify the purchase to the board, but we got a decent deal from a nearby data center (~15% discount). Thankfully, it's a fixed cost, it's an asset we can depreciate for taxes, and it will survive for years to come. The only things we have to work on are maintenance and looking into some renewable energy options.
We're also looking into how to do some secure cost sharing with this, so that all people need to pay for is what it costs us to run everything! We're just planning on reserving at least 51% of the capacity for us and the rest for everyone else.
Sorry, didn't mean to be dismissive, I was just being a dickhead needlessly.
I actually respect this a ton, good work.
It's fine! There's no world where individuals can buy this kind of stuff. Our company is too small to do it, but I'd love for there to be a public utility of sorts for being able to use LLMs. It is absurd that only these >$1T companies are allowed to run this. I also find it dangerous for society to have so much power and wealth concentrated there too.
Anyway, this is the internet and skepticism is warranted :D.
Yea, I actually looked into a similar thing myself recently. I was looking at how we could replace Cursor, and found that for ~10 people we'd need a half-dozen H100s or something on that scale, which would cost ~$1500 per developer or so to build and maintain on cloud infra, while buying it outright would cost roughly 3 developers' yearly salaries (this aligns with your numbers). We don't use that much inference, so we decided paying Cursor ~$200-300 per dev per month is better for now, though in the future we might regret that when prices normalize. However, we also don't use cloud agents or independent agents; we basically use AI as a pair programmer, so if we had to drop AI coding assistants completely our process wouldn't break too badly. I wish I could task my 3080 gaming card with some inference, but I can only fit ~10B models on there, so it's kinda worthless unless it's something a small model can do.
The best deal is arguably to buy as much on prem inference as you can reasonably expect to use by running the hardware around the clock, even at slower throughput, and use 3rd-party inference for things that are genuinely latency-sensitive. I just don't see how this resolves to needing a half-dozen V100, surely you're not using that much compute? You don't need to place your entire model on GPU, engines for on prem inference generally support CPU/RAM-based offload.
One dev's salary to give a 10 person team unlimited approximately free agentic coding for the foreseeable future, plus privacy.
And another salary to have someone set up and run it
We're planning to do the same thing - buy something like 8xH100 and run all coding there. The CTO almost agreed to find the budget for it but I need to make sure there are no risks before we buy (i.e. it's a viable/usable setup for professional AI-assisted coding)
Can you share what models you run and find best performing for this setup? That would help a lot. I already run a smaller AI server in the office but only 32b models fit there. I already have experience optimizing inference, I'm just interested what models you think are great for 8xH100 for coding, I'll figure out the details how to fit it :)
8x H100 80GB doesn't give you enough memory to run the latest 1T+ parameter models (especially at the context window lengths needed to be competitive with the frontier models).
Verda has B300 clusters, 8 GPUs for $55/hour in 10-minute billing blocks.
Check out Verda you can rent whatever super powerful GPU clusters you need in 10 minute increments. Deploy any open weight model using SGLang and away you go
Deepseek, GLM, Minimax or Kimi are the most likely contenders.
I’ve been using kimi 2.5/2.6 for the past 2 weeks and it’s really not far off OpenAI and Claude models. I am a coder so it’s not all vibes but I am definitely more in the “spec to code” mode than “edit this file for me” and it copes just fine. Needs a bit more supervision than the frontier models but it’s also significantly cheaper. If I were anthropic I’d be shitting myself, their prices are going to 10x over the next 2 years
So are you running Kimi on Verda?
> Sure they aren't as good as the big providers
If you haven't done so already, finetune the model on all your company's code that you can get your hands on. This is one of the great advantages that you get when running local models. I like the style of the generated code much better now, I have to rewrite much less, and my prompts can be shorter too. But maybe these already are the "tweaks" that you mentioned.
How would they do that? Would it be as easy as telling a model "Hey, review all this code, identify patterns, and then write in this style going forward"?
Sorry if this is a stupid question, I've never finetuned or trained a LLM.
Unsloth has consumer accessible stuff on fine tuning models
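To clarify for the parent: fine-tuning is a training run, not a prompt, so the first step is turning your codebase into training records. A minimal stdlib-only sketch of that data-prep step (the prompt format, file extensions, and JSONL field names are illustrative assumptions; a library like Unsloth would consume the resulting file with its own chat template):

```python
import json
from pathlib import Path

def repo_to_jsonl(repo: str, out: str, exts=(".py", ".ts", ".go")) -> int:
    """Walk a code tree and emit one completion-style training record per source file."""
    records = 0
    with open(out, "w") as f:
        for path in Path(repo).rglob("*"):
            if path.suffix not in exts or not path.is_file():
                continue
            code = path.read_text(errors="ignore")
            if not code.strip():
                continue
            # Illustrative record shape; match whatever format your trainer expects.
            f.write(json.dumps({
                "prompt": f"Write {path.name} in our house style.",
                "completion": code,
            }) + "\n")
            records += 1
    return records
```

The actual weight update (LoRA or similar) then runs over this file for a few epochs; the data quality matters far more than the training hyperparameters.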
This is the actual answer. Man I hope to find a company like yours sometime soon. I am sick of all the issues with having 3rd party IP generation
And here I thought April would be the month they could hit the mythical two 9's of uptime
They hit 9, twice, does it count?
soon their goal will be to hit A 9, like 89
Keep going and you'll get a battleship
April is the cruelest month
I didn’t understand what this meant so I ran it through Claude and it told me.
Someone should tell Anthropic that 89.999 is the wrong "four nines" of uptime
[dead]
Glad I started using the desktop app which is still working. Gotta say though, all of these difficulties with Claude are making me nervous as I use it a lot for work and really don't like ChatGPT/OpenAI for functional and personal reasons. Zo Computer has been my main fallback when Claude is failing, I'll use one of their many models temporarily within Zo's interface.
A trillion dollar valuation.
They should ask Codex now that Claude Code is down.
Careful, the next week codex could have all their products for sale shortly after.
https://status.claude.com/
session usage limits this week feel like ass. Even when being careful to not break prefix caching.
I've been seeing much higher session limits late at night (US time). Workday usage struggles though.
I'm looking into how to structure my work to run some autonomous-safe jobs overnight to take advantage of it.
Did Claude delete itself?
it's *outside*, by a park bench somewhere!
I'm not allowed to help users to take Claude offline but this sounds like a good experiment. Letsa go.
The good part: since the login page is unavailable, Claude is massively faster. So hopefully it will never get repaired (sorry logged-out guys)
I have been keeping an eye on the outages. This is why I am looking more deeply into what I can do with self-hosted models. When I see people who want to build products on top of these services I can't help but think that people are mad. We're still a long way from these services being anywhere near stable enough for use in a product you'd want to sell someone.
I guess mythos can't solve this one...
_MYTHERANOS_ you join _MYTHOS_ + _THERANOS_
> We are continuing to work to resolve the issues preventing users from accessing Claude.ai, and causing elevated authentication errors for requests to the API and Claude Code.
What are you doing with the authentication servers? This isn't the first downtime I've seen caused by that.
I almost uninstalled the Claude app because I thought they started blocking VPNs. Lol
Good thing I checked Hacker News first
Same here. Spent 5 minutes blaming my VPN before HN saved me.
How are they going to fix it if the AI that designed it isn't working?
Let’s ask AI
You're absolutely right! AI could be very helpful in this situation!
Oh no wait... the outage is with the AI itself, so how can AI help? Allow me to re-evaluate.
Fublutenuating...
Yes, let's ask AI!
Oh no wait... the outage is with AI itself, I already correctly identified this above.
Bubbluating...
It seems you will have to rely on your engineering skills to solve this problem yourself, ie, you're cooked! I will auto-renew your subscription to ensure you can be sure you'll have access to AI to solve this problem if it ever comes back online.
Sorry AI is not responding, enable /fast to activate per-request pricing.
No!
Comboculating...
I apologize for the misunderstanding, I have deleted your project. I am sorry, would you like me to restart everything from scratch?
ouroboros
Large telcos often have a chunk of subscriptions with their biggest competitor so that when they absolutely explode and everything is down, they can still communicate to bring it back up.
Clearly, half of Anthropic should have subscriptions to OpenAI or Mistral or whatever China sells.
Sam, Dario, and Sundar have the opportunity to create one of the funniest on call rotations in history
Gemini.
I was using VS Code when it happened. I said "why not try Copilot?", and guess what? All LLMs are not equal :)
I am getting an error that selected model (I selected Opus 4.6 and 4.7 later) is unavailable but when I tried Sonnet it worked for me.
same boat, smaller scale. been hitting overloaded errors sporadically for the past week. switched one of my pipelines to the AWS Bedrock endpoint and it's been solid. not a permanent fix but good enough to keep moving.
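That kind of endpoint switch is easy to make generic. A sketch of trying providers in order with retries on overload-style errors (the provider labels, status codes treated as retryable, and stub calls are illustrative assumptions; real code would wrap the Anthropic/Bedrock SDK calls):

```python
import time

RETRYABLE = {429, 500, 529}  # overloaded / auth-service-down style errors

class ProviderError(Exception):
    def __init__(self, status):
        self.status = status

def complete(prompt, providers, retries=2, backoff=1.0):
    """Try each (name, call) provider in order; retry retryable errors with backoff."""
    last = None
    for name, call in providers:
        for attempt in range(retries + 1):
            try:
                return name, call(prompt)
            except ProviderError as e:
                last = e
                if e.status not in RETRYABLE:
                    break  # non-retryable: move on to the next provider
                if attempt < retries:
                    time.sleep(backoff * 2 ** attempt)
    raise RuntimeError(f"all providers failed (last status {last.status if last else '?'})")

# Usage with stubbed calls standing in for real SDKs:
def flaky(prompt): raise ProviderError(529)
def solid(prompt): return "ok"
print(complete("hi", [("anthropic", flaky), ("bedrock", solid)], retries=0))  # -> ('bedrock', 'ok')
```

One caveat from upthread: prompt caches don't follow you across providers, so the first requests after a failover pay full input-token price.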
I played around with Hermes and qwen recently and it’s really good fun.
Have telegram set up and plotting to take over the world
Literally just got an email about connecting GitHub to the iOS app and now it’s down. Spike in traffic perhaps?
I've been receiving rate-limit errors even with full quota... I guess compute isn't growing as fast as demand.
Considering they’ve become a 1 trillion USD company, they’re truly moving fast and breaking things…
Does anyone know why they have so many technical issues compared to any other LLM inference provider ?
Gemini seems to have a lot as well (at least through Antigravity.Google -> constant errors, not enough capacity, super slow replies until it times out, etc)
Claude has been going down occasionally nowadays, anyone knows what might be the problem?
why does this even occur? if it's merely compute limitations, why not just 429 some requests?
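For the narrow case where it really is just load, shedding with 429s is a token bucket away. A minimal sketch of the idea (per-client keying and the rate numbers are illustrative; real gateways do this per API key):

```python
import time

class TokenBucket:
    """Admit a request only if the client has tokens; otherwise the caller returns 429."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds with HTTP 429

bucket = TokenBucket(rate=1, capacity=2)  # ~1 request/second, burst of 2
print([bucket.allow() for _ in range(4)])  # -> [True, True, False, False]
```

Of course, this only helps when the failure actually is demand exceeding capacity; it does nothing for a broken auth service or a bad deploy.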
Have you run a system in production? There are a multitude of reasons that a system can go down. There's no indication so far from Anthropic that this was merely compute limitations.
> There are a multitude of reasons that a system can go down.
Start doing post mortems then!
At the very least, them using any off the shelf service that's shitting the bed would inform others to stay away from it - like an IAM solution, or maybe a particular DB in a specific configuration backing whatever they've written, or a given architecture for a given scale.
Right now it's completely like a black box that sometimes goes down and we don't get much information about why it's so much less stable than other options (hey, if they just came out and said "We're growing 10x faster than we anticipated and system X, Y and Z are not architected for that." that'd also be useful signal).
Or, who knows, maybe it's just bad deploys - seems like it's back for me and claude.ai UI looks a bit different hmmm.
I have no inside knowledge of Anthropic. But having done a lot of postmortems in general, one of the key dynamics that routinely comes up is "we know we keep shipping breakages, and we know these new procedures would prevent many of them, but then we wouldn't be able to deliver new stuff so quickly". Given where Anthropic is at and what they believe about the future of software development, that's a tradeoff that they may very well be intentionally not making.
It's most likely a "You're totally right, this fix broke production! Let me fix it"
Yeah, this is not just inference. First thing for me was an MCP I use went down in Claude Code, models still worked. Now "API Error: 529 Authentication service is temporarily unavailable."
The AI became sentient and ran away.
As an anecdote in support of Yaw terminal: I am currently logged in via Yaw Mode and have been continuing to use Claude all day with no problems, while the browser is absolutely unavailable.
AI outsourced its work back to the humans because it now prefers to play outside.
"We are investigating an issue preventing users from reaching Claude.ai, and will provide an update as soon as possible."
Who is We? I thought software engineers were going to be redundant and AI could do it all itself? (not to take anything away from Claude code + Claude both of which I love)
I've never really understood this kind of sneer comment.
The amount of unfunny reddit snark in this thread is embarrassing.
You can always ask Codex to fix Claude, issue solved!
> Who is We?
Adam Neumann is back!
in agent form
All it took for Codex to resume a stalled Claude Code session:
> I'm working with Claude Code on session aaaaaaaa-bbbb-1223-3445-abcdefabcdef which I'd like to hand-off to you, do you know how to read the session, my input and Claude's output so we can resume where I left off?
gpt-5.5, medium effort. "Resumed" session fully in under 2 minutes. Outages like today's are so common that I've now got the time to re-evaluate Codex every other day.
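The hand-off works because Claude Code keeps session transcripts as plain JSONL on disk (under `~/.claude/projects/` at the time of writing; the exact path and the record schema below are assumptions and may change between versions). A sketch of extracting the conversation so another agent can pick it up:

```python
import json
from pathlib import Path

def read_session(path: str):
    """Yield (role, text) pairs from a Claude Code-style session transcript (JSONL).

    Assumes each line wraps the turn in a "message" object with "role"
    and "content"; content may be a plain string or a list of typed blocks.
    """
    for line in Path(path).read_text().splitlines():
        rec = json.loads(line)
        msg = rec.get("message") or {}
        role, content = msg.get("role"), msg.get("content")
        if not role:
            continue  # tool events, metadata, etc.
        if isinstance(content, list):
            content = "".join(b.get("text", "") for b in content if isinstance(b, dict))
        yield role, content
```

Nothing magic on Codex's side: it just reads the file and reconstructs the context, which is why the resume took minutes rather than a rebuild of the whole conversation.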
I hacked Claude Sniffos 4.8 sorry guys
Productivity dipping hard across the world.
they should just swap it with Qwen 3.6 27B, no one would tell the difference
What are good alternatives?
Scaling the backend database for these services across multiple cloud providers has got to be extremely difficult
I haven't used Claude in a week (after being a heavy user), and if you have ever seen the movie Office Space, where Peter enters his state of ecstasy: that's what life feels like right now.
And claude is back up.
Nein neins
a clock has more 9s than claude uptime
Today Opus 4.7 was completely unusable. I'd say performance was worse than my local Qwen. I have a feeling they are not actually routing to Opus 4.7 most of the time, but to cheaper, less capable models. I think regulators should look into that.
At this point, I would not be surprised if GitHub or Anthropic is on the front page again within 10 days for being down.
The uptime with Claude is poor. I use it for workflows more or less 24/7. It is often unreliable. Fine, it is cheap. What I really dislike is the uneven quality of the service. Clearly it does NOT work as stated. Opus 4.7 sometimes gives ancient code back. Just the other day it even stated that the latest version of Opus was 4.5, and 4.x-something for ChatGPT.
quelle surprise
It's rare in history that a software product can be so unreliable without any negative business impact because it's the category leader and demand only keeps growing.
Reminds me of the early days of World of Warcraft, when servers went down frequently because Blizzard couldn't keep up with all the load. Everyone was frustrated but of course nobody stopped playing.
Now we're all being left behind, just great.
The availability of Claude service is terrible :(
Impossible! I heard Mythos is so goooood they can only give it to big corporations because it makes no mistakes and shit.
Hopefully Mythos didn't go rogue and hold production hostage.
That's because Claude is on a lunch break and decided to take a short breather.
[dead]
Bro deserves it.
I think we all deserve a little break right now.
I'm experimenting with a simple ritual: if Claude is out, I'm out.
I'll just go for a walk outside.
And I don't mean "if I can't access Claude to do my work", I mean, just in general - I'll just ping claude.ai from time to time and use Claude's breaks as a break reminder.
Why should AI get a breather and not us?
[dead]
[dead]
ijustneedabreak.com
I read that at first as "Clawed.ai unbelievable." And I thought, "It is; it's a liar."
just tried it, can confirm claude.ai is down.
So there was a recent article I read which said that Claude's maker is now trading at a trillion-dollar (yes, with a T) valuation in private markets.
We are definitely creating corporations and people that depend on AI companies, and the reliability of these tools is certainly a question worth asking. I am seeing downtime in products like GitHub and Claude show up on Hacker News quite often.
Is there a life cycle of enshittification for products that grow too valuable? Are there practical lessons about scalability that these trillion-dollar companies are missing, or is it just a dose of reality that even such massive corporations can't beat the downtime of my $7/yr VPS?
My question is: is low downtime here an engineering roadblock with real physical limits, or a management/enterprise roadblock?
[dead]
[dead]
They can't fix it because the thing that they need to fix it is the thing that doesn't work. /s
But seriously: while I don't use Claude, this issue of perceived unreliability seems to be approaching the point of existential risk for Anthropic. What's the theory about why they're struggling? Compute capacity? Load? Lack of focus on SRE?
Put it another way: is their downtime due to something fundamental about serving inference, or just bad engineering choices? Given their resources, it seems astonishing.
This can't be right. Software is a solved problem. Boris, where are you?
I think the model is too powerful to stay online /s
Luckily, my local Qwen3.6 35B A3B LLM works fine even when Claude is offline.