The evaluation arrived before the suspension did.
Per reports citing Epoch AI’s June 12 independent evaluation, Claude Fable 5 scored 88% on FrontierMath Tier 4 (v2) and 87% on Tiers 1–3. On the same benchmark, GPT-5.5 reportedly scored roughly 75%, putting Fable 5 ahead by approximately 13 percentage points. Three days later, according to multiple published reports, the Commerce Department issued an export control directive requiring Anthropic to suspend access to Claude Fable 5 and Claude Mythos 5 for foreign nationals. Anthropic then suspended global access to both models, per those reports.
The catch: Epoch AI published the only independent technical record of these models the same week they disappeared from general availability.
What the specs confirm, per Anthropic’s official API documentation, is more grounded. Claude Fable 5 and Claude Mythos 5 share the same underlying specifications: a 1 million token default context window, up to 128,000 output tokens per request, and pricing of $10 per million input tokens and $50 per million output tokens. Both are accessible via the API under the model identifier `claude-fable-5`. These figures are T1-confirmed. The suspension status and the benchmark data are not: both rest on T3 reporting and are qualified as such throughout.
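To make the pricing concrete, the following is a minimal sketch of a per-request cost estimate built only from the figures quoted above. The function name `estimate_request_cost` and the constant names are hypothetical, and the sketch assumes the input and output limits are checked independently; it does not call any API.

```python
# Figures as quoted in this brief (USD per million tokens, token limits).
INPUT_USD_PER_MTOK = 10
OUTPUT_USD_PER_MTOK = 50
MAX_CONTEXT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 128_000


def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request.

    Hypothetical helper: it assumes the context and output limits apply
    independently, which the brief does not state either way.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > MAX_CONTEXT_TOKENS:
        raise ValueError("input exceeds the 1 million token context window")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the 128,000 token per-request limit")
    # Sum in integer "micro-dollar" units first, then convert to dollars.
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # A 200,000-token prompt with an 8,000-token reply.
    print(f"${estimate_request_cost(200_000, 8_000):.2f}")
    # A request that uses both stated maximums.
    print(f"${estimate_request_cost(MAX_CONTEXT_TOKENS, MAX_OUTPUT_TOKENS):.2f}")
```

At these rates, output tokens cost five times as much as input tokens, so long generations can dominate the bill even when the prompt is far larger than the reply.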
FrontierMath Tier 4 (v2), per reports citing Epoch AI evaluation
Evidence
Why it matters
Most AI benchmark cycles work like this: a model releases, a third-party evaluator publishes independent results, developers run their own tests, adoption decisions follow. That sequence broke here. The evaluation shipped. The adoption phase didn’t. For teams that were using Fable 5 or Mythos 5 in production, the suspension is operational; for teams evaluating whether to adopt, the Epoch data has become their only external reference point for a model they can’t currently test.
Reports indicate the directive was issued in part over concerns about a potential jailbreak vulnerability that could enable the models to be used without their built-in safety restrictions. That framing comes from media coverage, not from confirmed government directive language, and it should be read accordingly.
Context
The Fable 5 suspension isn’t the first AI governance intervention, but it’s the first reported case of a U.S. government directive removing a commercially deployed frontier model from broad public availability. The prior TJS analysis of the directive covered the legal and national security framing; this brief sits on the technology side of that story. The pattern matters independently of this specific model: a government can issue a directive that removes a production AI system from availability with no technical warning.
What to watch
Anthropic has reportedly cited 10 USC 3252 in contesting the directive. The legal challenge is covered in the regulation pillar. The technology question is simpler: if the suspension lifts, the Epoch evaluation data will have aged by weeks or months, and developer teams will face the same adoption decision with a benchmark record they couldn’t act on in real time. If the suspension holds or extends, the FrontierMath data becomes a historical artifact, useful for competitive analysis, not for deployment planning.
TJS synthesis
Don’t rely on the Epoch benchmark alone to anchor a migration decision. Independent evaluation data is more valuable than vendor claims, but it’s not a substitute for your team’s own testing on your actual workloads. If you were planning to evaluate Fable 5 or Mythos 5, wait for the Epoch AI primary evaluation page to be confirmed and sourced. The figures cited here come from T3 journalism, and the gap between “roughly 75%” for GPT-5.5 and a precise score matters if that comparison is driving your architecture decision. The suspension itself is reason enough to hold evaluation timelines until availability is confirmed.