A Postmortem of Three Recent Issues

Source: https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues
Author: Sam McAllister
Published: Sep 17, 2025
Company: Anthropic

Summary

A technical report on three infrastructure bugs that intermittently degraded Claude's responses between August and early September 2025.

Key Points

How Anthropic Serves Claude at Scale

  • Claude is served via first-party API, Amazon Bedrock, and Google Cloud's Vertex AI

  • Deployed across multiple hardware platforms: AWS Trainium, NVIDIA GPUs, and Google TPUs

  • Each platform requires specific optimizations while maintaining strict equivalence standards (a hypothetical check is sketched below)
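
The "strict equivalence standards" above imply some cross-platform comparison step. Below is a minimal, hypothetical sketch of such a check in JAX, comparing a reference fp32 path against a platform-optimized bf16 path. The function name, tolerance, and bf16 round-trip are assumptions for illustration, not Anthropic's actual validation suite.

```python
import jax
import jax.numpy as jnp

def equivalent(logits_ref, logits_opt, k=10, atol=5e-2):
    """Hypothetical equivalence check between a reference implementation
    and a platform-optimized one; not Anthropic's actual test suite."""
    # Numeric closeness of the raw logits.
    close = jnp.max(jnp.abs(logits_ref - logits_opt)) <= atol
    # Behavioral closeness: the top-k token identities should match,
    # since those determine which tokens sampling can actually emit.
    _, idx_ref = jax.lax.top_k(logits_ref, k)
    _, idx_opt = jax.lax.top_k(logits_opt, k)
    return bool(close) and bool(jnp.all(idx_ref == idx_opt))

logits = jax.random.normal(jax.random.PRNGKey(0), (50_000,), dtype=jnp.float32)
# Simulate a platform-specific path that computes in bf16 and casts back.
optimized = logits.astype(jnp.bfloat16).astype(jnp.float32)
print(equivalent(logits, optimized))  # True if within tolerance and same top-k
```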

Timeline of Events

  • August 5: First bug introduced (context window routing error)

  • August 25-26: Two more bugs introduced (the output corruption and approximate top-k bugs described below)

  • August 29: Load balancing change increased affected traffic

  • September 2-18: Fixes deployed across platforms

The Three Bugs

1. Context Window Routing Error

  • Some Sonnet 4 requests were misrouted to servers configured for the 1M-token context window

  • Initially affected 0.8% of requests, peaked at 16% on August 31

  • "Sticky" routing meant affected users continued getting degraded responses

2. Output Corruption

  • Misconfiguration on TPU servers caused token generation errors

  • Occasionally assigned high probability to wrong tokens

  • Produced Thai or Chinese characters in English responses and syntax errors in code (illustrated below)
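
A toy illustration of the symptom: one wrongly boosted logit is enough to surface an out-of-place character. The vocabulary, logits, and the boost itself are invented for illustration; the real bug was a TPU server misconfiguration.

```python
import numpy as np

# Tiny made-up vocabulary mixing English, Thai, and Chinese tokens.
vocab = ["Hello", ",", " world", "สวัสดี", "你好"]
logits = np.array([9.0, 7.5, 8.0, -4.0, -4.0])

corrupted = logits.copy()
corrupted[3] += 16.0  # the bug occasionally gives a wrong token high probability

print(vocab[int(np.argmax(logits))])     # "Hello" -- the intended token
print(vocab[int(np.argmax(corrupted))])  # "สวัสดี" -- Thai text mid-sentence
```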

3. Approximate Top-k XLA:TPU Miscompilation

  • Code change triggered latent bug in XLA:TPU compiler

  • Related to mixed precision arithmetic (bf16 vs fp32)

  • Bug behavior was frustratingly inconsistent

XLA Compiler Bug Deep Dive

  • Models calculate probabilities for each possible next word

  • Use "top-p sampling" with threshold of 0.99-0.999

  • A precision mismatch between bf16 and fp32 arithmetic could change which tokens made the cut

  • Approximate top-k operation sometimes returned completely wrong results

  • Fixed by switching from the approximate to the exact top-k operation (both are contrasted in the sketch below)
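
To ground the deep dive, here is a minimal JAX sketch of top-p filtering alongside the exact-versus-approximate top-k choice. It is illustrative rather than Anthropic's serving code: the postmortem only says an approximate top-k operation was replaced with an exact one, and `jax.lax.top_k` / `jax.lax.approx_max_k` are used here as the standard JAX entry points for those operations.

```python
import jax
import jax.numpy as jnp

def top_p_filter(logits, p=0.999):
    """Nucleus (top-p) filtering: keep the smallest set of tokens whose
    cumulative probability reaches p; mask everything else out."""
    probs = jax.nn.softmax(logits)
    order = jnp.argsort(-probs)                  # most probable first
    cum = jnp.cumsum(probs[order])
    keep = (cum - probs[order]) < p              # tokens needed to reach p
    return jnp.full_like(logits, -jnp.inf).at[order].set(
        jnp.where(keep, logits[order], -jnp.inf))

logits = jax.random.normal(jax.random.PRNGKey(0), (50_000,))

# Exact top-k (the fix): always returns the true k largest logits.
exact_vals, exact_idx = jax.lax.top_k(logits, 64)

# Approximate top-k: the faster TPU-friendly path where the latent
# XLA:TPU miscompilation could surface.
approx_vals, approx_idx = jax.lax.approx_max_k(logits, 64)

# Healthy behavior: the approximate op recovers nearly all of the true
# top-64, trading a little recall for speed. The miscompiled version
# instead sometimes returned completely wrong results.
print(jnp.isin(approx_idx, exact_idx).mean())
```

Swapping the call site from the approximate to the exact operation is the shape of the fix the postmortem describes: it removes the approximation, and with it the miscompiled code path.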

Why Detection Was Difficult

  • Evaluations didn't capture the degradation users reported

  • Privacy practices limited engineer access to user interactions

  • Each bug produced different symptoms on different platforms

  • Overlapping bugs created confusing, contradictory reports

What They're Changing

  • More sensitive evaluations

  • Quality evaluations running continuously on production systems

  • Faster debugging tooling

  • Better tools to debug community-sourced feedback while preserving privacy

Key Quote

"To state it plainly: We never reduce model quality due to demand, time of day, or server load. The problems our users reported were due to infrastructure bugs alone."
