AI Models News: Claude Fable 5's Benchmarks Are Self-Reported, Epoch AI's Evaluation Is Pending

June 11, 2026 | 3 min read | Anthropic Newsroom | Partial | Strong

Tech Jacks Solutions AI News Coverage

Anthropic launched Claude Fable 5 on June 9, 2026, claiming state-of-the-art performance across nearly every tested benchmark, but every one of those benchmark figures comes from Anthropic's own evaluation, and Epoch AI's independent assessment is officially pending. Teams already building on Fable 5 are making deployment decisions without independent verification in hand.


SWE-bench Pro claim, 80.3% (self-reported, contested)

Key Takeaways

Every Fable 5 benchmark figure (SWE-bench Pro 80.3%, SWE-bench Verified 95.5%, FrontierCode Diamond 29.3%) is self-reported by Anthropic; Epoch AI's independent evaluation is pending as of June 11.
Fable 5's built-in safety classifier routes sensitive queries to Opus 4.8 as a fallback. Anthropic says it triggers in fewer than 5% of sessions on average, but that is a stated design parameter, not an externally confirmed operational rate.
Fable 5 is priced at $10/$50 per million input/output tokens, double Opus 4.8's $5/$25 rate; Anthropic describes this as a 60% reduction from Mythos Preview pricing.
Don't migrate from Opus 4.8 based on self-reported benchmarks alone; wait for Epoch AI's independent evaluation before committing production workloads.

Model Release

Claude Fable 5

Organization: Anthropic PBC

Type: LLM (Flagship)

Parameters: Not disclosed

Benchmark: [SELF-REPORTED] SWE-bench Pro: 80.3% | SWE-bench Verified: 95.5% | FrontierCode Diamond: 29.3%

Availability: Claude API, AWS Bedrock, Vertex AI, Microsoft Foundry

Verification

Partial. Source: Anthropic newsroom (vendor primary source). All benchmark figures are self-reported. Epoch AI's independent evaluation is officially pending as of June 11, 2026.

Fable 5 is live. Whether its benchmarks hold up under independent scrutiny isn’t settled yet.

Anthropic released Claude Fable 5 on June 9 as the first generally available model from its Mythos capability tier. The announcement claims state-of-the-art performance across software engineering, knowledge work, vision, and scientific research. Specifically, Anthropic reports 80.3% on SWE-bench Pro, 95.5% on SWE-bench Verified, and 29.3% on Cognition’s FrontierCode Diamond benchmark. Every one of those figures comes from Anthropic’s own internal evaluation. Epoch AI’s independent assessment is officially pending as of June 11.

That matters for how you read the numbers.

Alongside Fable 5, Anthropic released Claude Mythos 5, the same underlying model with safety classifiers removed, available exclusively through Project Glasswing to vetted cyberdefenders and infrastructure providers. Anthropic states Glasswing has expanded to approximately 200 partner organizations across more than 15 countries, though that figure did not appear in the announcement page content we retrieved and rests on Anthropic’s own stated data.

Disputed Claim

Fable 5 is state-of-the-art on nearly all tested benchmarks, including 80.3% on SWE-bench Pro

All figures come from Anthropic's own evaluation. The SWE-bench Pro score is already contested across multiple evaluators, per June 10 hub coverage. No independent third-party verification is available.

Treat all benchmark figures as vendor claims. Wait for Epoch AI's independent evaluation before drawing capability conclusions for production decisions.

The built-in safety governor

Fable 5 ships with a classifier layer that routes sensitive queries, spanning cybersecurity, chemistry, and biology domains, to Claude Opus 4.8 as a fallback. Anthropic states the classifiers trigger in fewer than 5% of sessions on average. That’s a design parameter Anthropic has disclosed, not an externally measured operational rate. For teams running automated agentic workflows, this is a variable worth understanding before you commit to production: the fallback routes to a different model, with different capabilities and different costs, without developer-side control over when it fires.
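Because the sub-5% figure is a stated design parameter, teams may want to measure the rate on their own traffic. The following is a minimal sketch of that measurement, assuming you log which model served each response. The model identifiers, the log format, and the premise that the serving model is visible to the caller at all are assumptions made for illustration, not details confirmed by Anthropic.

```python
# Hypothetical model identifiers; the real API strings may differ.
PRIMARY_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4-8"


def fallback_rate(session_logs):
    """Fraction of sessions in which at least one response came from the fallback.

    `session_logs` maps a session id to the list of model names that served
    each response in that session (a hypothetical log format).
    """
    if not session_logs:
        return 0.0
    routed = sum(
        1 for models in session_logs.values() if FALLBACK_MODEL in models
    )
    return routed / len(session_logs)


# Illustrative data only: four sessions, one of which hit the fallback.
logs = {
    "s1": [PRIMARY_MODEL, PRIMARY_MODEL],
    "s2": [PRIMARY_MODEL, FALLBACK_MODEL],
    "s3": [PRIMARY_MODEL],
    "s4": [PRIMARY_MODEL, PRIMARY_MODEL, PRIMARY_MODEL],
}
print(f"Observed fallback rate: {fallback_rate(logs):.1%}")
```

Counting per session rather than per request mirrors how Anthropic phrases its figure ("fewer than 5% of sessions"), so the result is comparable to the stated parameter.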

Pricing context

Fable 5 is priced at $10 per million input tokens and $50 per million output tokens, exactly double Opus 4.8’s standard $5/$25 rate. Anthropic describes this as a 60% reduction from Mythos Preview rates (formerly $25 input / $125 output), though the Mythos Preview baseline can’t be independently confirmed from accessible sources. The model is available via the Claude API, AWS Bedrock, Vertex AI, and Microsoft Foundry. Fable 5 supports a 1-million-token context window according to Anthropic’s technical specifications, with up to 128,000 output tokens per request.
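The pricing arithmetic is easy to check. The sketch below uses only the vendor-stated rates quoted above; the dictionary keys and the example request size (200,000 input tokens, 8,000 output tokens) are illustrative assumptions, not official model identifiers or typical usage.

```python
from decimal import Decimal

# USD per million tokens, as quoted in the article (vendor-stated figures).
# The keys are informal labels, not official API model names.
PRICES = {
    "fable-5": {"input": Decimal("10"), "output": Decimal("50")},
    "opus-4.8": {"input": Decimal("5"), "output": Decimal("25")},
    "mythos-preview": {"input": Decimal("25"), "output": Decimal("125")},
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at the listed per-million-token rates."""
    p = PRICES[model]
    total = p["input"] * input_tokens + p["output"] * output_tokens
    return total / Decimal(1_000_000)


def reduction(old, new):
    """Fractional price reduction going from `old` to `new`."""
    return (old - new) / old


# Hypothetical request: 200,000 input tokens, 8,000 output tokens.
for model in ("fable-5", "opus-4.8"):
    print(model, f"${request_cost(model, 200_000, 8_000):.2f}")

# Check the stated 60% reduction from Mythos Preview on both rates.
for kind in ("input", "output"):
    r = reduction(PRICES["mythos-preview"][kind], PRICES["fable-5"][kind])
    print(kind, f"{r:.0%}")
```

On these numbers the example request costs $2.40 on Fable 5 against $1.20 on Opus 4.8, and the reduction from the quoted Mythos Preview rates works out to 60% on both input and output, consistent with Anthropic's description.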

Where the SWE-bench Pro number already stands

The 80.3% self-reported SWE-bench Pro figure isn’t clean. The hub’s benchmark dispute coverage from June 10 documents that this figure is contested across multiple evaluators. Teams treating 80.3% as a settled capability claim are reading it wrong.

What to Watch

Epoch AI independent evaluation of Claude Fable 5 publishes: weeks to months post-GA (typical timeline)

Epoch AI Mythos family cybersecurity capabilities assessment: flagged as pending June 11, 2026

SWE-bench Pro dispute resolution, third-party evaluator consensus: ongoing

What to watch

Epoch AI’s independent evaluation timeline for flagship models typically runs weeks to months after a general availability launch. Until that assessment arrives, every Fable 5 capability claim carries a self-reported qualifier. Watch for Epoch AI’s research publications at epoch.ai. Its evaluation of the Mythos family’s cybersecurity capabilities was flagged as a separate, high-priority assessment as of June 11. If it publishes, it’ll be the first independent signal on the Mythos tier.

The practical question isn’t whether Fable 5 is capable. Anthropic’s announcement describes it as exceeding every model the company has previously made generally available, and it may well be that good. The question is whether teams deploying it before independent benchmarks arrive are betting on vendor data in a segment, frontier agentic AI, where the gap between vendor claims and independent evaluation has been meaningful before. Don’t migrate from Opus 4.8 on the basis of self-reported benchmarks alone. Wait for Epoch.