Fable 5 is live. Whether its benchmarks hold up under independent scrutiny isn’t settled yet.
Anthropic released Claude Fable 5 on June 9 as the first generally available model from its Mythos capability tier. The announcement claims state-of-the-art performance across software engineering, knowledge work, vision, and scientific research. Specifically, Anthropic reports 80.3% on SWE-bench Pro, 95.5% on SWE-bench Verified, and 29.3% on Cognition’s FrontierCode Diamond benchmark. Every one of those figures comes from Anthropic’s own internal evaluation. Epoch AI’s independent assessment is officially pending as of June 11.
That matters for how you read the numbers.
Alongside Fable 5, Anthropic released Claude Mythos 5, the same underlying model with its safety classifiers removed, available exclusively through Project Glasswing to vetted cyberdefenders and infrastructure providers. Anthropic states that Glasswing has expanded to approximately 200 partner organizations across more than 15 countries; that figure comes from Anthropic and could not be confirmed against the announcement page itself.
The built-in safety governor
Fable 5 ships with a classifier layer that routes sensitive queries in the cybersecurity, chemistry, and biology domains to Claude Opus 4.8 as a fallback. Anthropic states the classifiers trigger in fewer than 5% of sessions on average. That's a design parameter Anthropic has disclosed, not an externally measured operational rate. For teams running automated agentic workflows, it's worth understanding before you commit to production: when the fallback fires, the request goes to a different model with different capabilities and different costs, and developers have no control over when that happens.
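Because the fallback fires without developer-side control, one way to test Anthropic's sub-5% figure against your own traffic is to log which model served each turn and compute the session-level rate yourself. The sketch below assumes you already capture such a per-turn signal and that every classifier trigger results in a fallback turn; the `SessionRecord` type, its `served_by` field, and the model identifier strings are hypothetical placeholders, not part of any Anthropic API.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical model identifiers: replace with whatever strings your own logs record.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"

# Anthropic's disclosed design parameter: classifiers trigger in fewer than 5% of sessions.
DISCLOSED_TRIGGER_RATE = 0.05


@dataclass
class SessionRecord:
    """One logged session. `served_by` is a hypothetical per-turn log of which model answered."""
    session_id: str
    served_by: list[str] = field(default_factory=list)


def fallback_rate(sessions: list[SessionRecord]) -> float:
    """Fraction of sessions in which at least one turn was served by the fallback model."""
    if not sessions:
        return 0.0
    routed = sum(1 for s in sessions if FALLBACK_MODEL in s.served_by)
    return routed / len(sessions)


def turns_by_model(sessions: list[SessionRecord]) -> Counter:
    """Total turns served by each model across all sessions (useful for cost attribution)."""
    return Counter(model for s in sessions for model in s.served_by)


if __name__ == "__main__":
    # Toy sample: one of four sessions hit the fallback.
    sessions = [
        SessionRecord("a", [PRIMARY_MODEL, PRIMARY_MODEL]),
        SessionRecord("b", [PRIMARY_MODEL, FALLBACK_MODEL]),
        SessionRecord("c", [PRIMARY_MODEL]),
        SessionRecord("d", [PRIMARY_MODEL, PRIMARY_MODEL, PRIMARY_MODEL]),
    ]
    rate = fallback_rate(sessions)
    print(f"fallback rate: {rate:.0%}")  # 25% in this toy sample
    print(turns_by_model(sessions))
    if rate >= DISCLOSED_TRIGGER_RATE:
        print("observed rate is at or above Anthropic's disclosed average")
```

Measuring per session rather than per turn keeps the number comparable to Anthropic's stated metric, while the per-turn counts show how much of your spend actually went to the fallback model.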
Pricing context
Fable 5 is priced at $10 per million input tokens and $50 per million output tokens, exactly double Opus 4.8’s standard $5/$25 rate. Anthropic describes this as a 60% reduction from Mythos Preview rates (formerly $25 input / $125 output), though the Mythos Preview baseline can’t be independently confirmed from accessible sources. The model is available via the Claude API, AWS Bedrock, Vertex AI, and Microsoft Foundry. Fable 5 supports a 1-million-token context window according to Anthropic’s technical specifications, with up to 128,000 output tokens per request.
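To see what the doubling means for a concrete workload, here is a minimal list-price sketch using the per-million-token rates quoted above. The workload size (2 million input tokens, 400,000 output tokens) is an arbitrary illustration, the Mythos Preview rates are Anthropic's stated but unconfirmed baseline, and the sketch ignores any discounts or surcharges that may apply in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token list prices in USD."""
    input_per_mtok: float
    output_per_mtok: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """List-price cost in USD for a given token volume (no discounts applied)."""
        return (input_tokens * self.input_per_mtok
                + output_tokens * self.output_per_mtok) / 1_000_000


# Rates as stated by Anthropic; the Mythos Preview baseline is not independently confirmed.
FABLE_5 = Pricing(input_per_mtok=10.0, output_per_mtok=50.0)
OPUS_4_8 = Pricing(input_per_mtok=5.0, output_per_mtok=25.0)
MYTHOS_PREVIEW = Pricing(input_per_mtok=25.0, output_per_mtok=125.0)


def reduction(new: Pricing, old: Pricing) -> tuple[float, float]:
    """Fractional price reduction from `old` to `new`, as (input, output)."""
    return (1 - new.input_per_mtok / old.input_per_mtok,
            1 - new.output_per_mtok / old.output_per_mtok)


if __name__ == "__main__":
    # Arbitrary illustrative workload: 2M input tokens, 400K output tokens.
    tokens_in, tokens_out = 2_000_000, 400_000
    for name, p in [("Fable 5", FABLE_5), ("Opus 4.8", OPUS_4_8),
                    ("Mythos Preview", MYTHOS_PREVIEW)]:
        print(f"{name}: ${p.cost(tokens_in, tokens_out):.2f}")
    # Prints $40.00, $20.00 and $100.00 respectively.
    print(reduction(FABLE_5, MYTHOS_PREVIEW))  # 60% reduction on both input and output
```

Because input and output prices both scale by the same factor, the Fable 5 bill is exactly twice the Opus 4.8 bill for any token mix, and the 60% reduction against Mythos Preview likewise holds regardless of workload shape.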
Why the SWE-bench Pro number is already contested
The 80.3% self-reported SWE-bench Pro figure isn’t clean. The hub’s benchmark dispute coverage from June 10 documents that this figure is contested across multiple evaluators. Teams treating 80.3% as a settled capability claim are reading it wrong.
What to watch
Epoch AI's independent evaluations of flagship models typically arrive weeks to months after a general availability launch. Until that assessment arrives, every Fable 5 capability claim carries a self-reported qualifier. Watch for Epoch AI's research publications at epoch.ai. Its evaluation of the Mythos family's cybersecurity capabilities was flagged as a separate, high-priority assessment as of June 11; if it publishes, it'll be the first independent signal on the Mythos tier.
The practical question isn't whether Fable 5 is capable. Anthropic's announcement states directly that it exceeds every model the company has previously made generally available, and it may well be that good. The question is whether teams deploying it before independent benchmarks arrive are betting on vendor data in a segment (frontier agentic AI) where the gap between vendor claims and independent evaluation has been meaningful before. Don't migrate from Opus 4.8 on the basis of self-reported benchmarks alone. Wait for Epoch.