Technology Daily Brief

UK AISI Evaluation: Claude Mythos Completes "Cooling Tower" Cybersecurity Benchmark in 3 of 10 Attempts

3 min read UK AI Safety Institute (AISI) Qualified
The UK AI Safety Institute's independent evaluation of Claude Mythos reportedly found the model completing a previously unsolved cybersecurity benchmark, dubbed "cooling tower," in 3 out of 10 attempts, according to Guardian reporting. The result signals a step change in autonomous cyber capability, and Anthropic has simultaneously scheduled a briefing with the Financial Stability Board.
Cooling tower completion rate: 30% (single-source)

Key Takeaways

  • The UK AISI independently evaluated Claude Mythos; The Guardian reports a 30% completion rate on the "cooling tower" benchmark, a previously unsolved cybersecurity task. The figure is single-source; treat it as qualified until AISI publishes a formal report.
  • Anthropic has restricted Mythos to institutional partners (reportedly including JP Morgan) for defensive hardening only.
  • Anthropic is scheduled to brief the Financial Stability Board on Mythos cyber capability findings, a signal that AI-native systemic financial risk is now a regulatory agenda item.
  • The AISI evaluation represents independent governance infrastructure for restricted-access AI, distinct from vendor self-assessment.

Verification

Qualified. Source: The Guardian (T3 journalism); both URLs were inaccessible at time of publication. Consistent with established registry coverage of Mythos. The 30% cooling tower benchmark figure is single-source from The Guardian. Treat as qualified reporting until AISI publishes a formal evaluation document.

Three out of ten. That’s the number The Guardian’s reporting cites for Claude Mythos completing the “cooling tower” benchmark, a task previously considered beyond autonomous AI capability, under independent UK AI Safety Institute evaluation. If the figure holds under further scrutiny, it’s the kind of result that changes how regulators, security firms, and enterprise procurement teams think about what restricted-access AI models can actually do.

The primary sources for this specific benchmark result are The Guardian’s May 18 reporting on the AISI evaluation and its April 10 reporting on Anthropic’s restricted access decision. Both URLs are currently inaccessible. This briefing treats the 30% cooling tower figure as single-source, qualified reporting. It’s consistent with everything else in the Mythos coverage record, but the specific benchmark number requires corroboration before it should be treated as settled fact.

What is well established across multiple prior TJS briefs and the accessible UK AI Safety Institute article: Claude Mythos is an Anthropic model with cybersecurity capabilities significant enough to keep it out of public release. Anthropic has restricted access to named institutional partners, reportedly including JP Morgan, to support defensive hardening work. The model is reportedly capable of autonomously identifying and simulating exploitation of software vulnerabilities. That characterization is consistent with the AISI evaluation framing, even if the cooling tower benchmark specifics need additional sourcing.

Claude Mythos Access and Governance Stakeholders

Anthropic (neutral): Restricting public access; proactively engaging FSB on capability findings
UK AI Safety Institute (neutral): Conducting independent capability evaluations; evaluation findings reported via journalism, not yet published formally
Financial Stability Board (neutral): Receiving briefing from Anthropic on Mythos cyber capability and financial system implications
JP Morgan, reported (for): Reportedly among named institutional partners with access for defensive hardening

The FSB briefing is the forward-looking piece of this story. A scheduled Anthropic presentation to the Financial Stability Board, the international body that coordinates global financial system regulation, places Mythos directly in the center of systemic financial risk governance. The Guardian’s May 18 piece framed this as Anthropic proactively sharing findings rather than being compelled to appear. That’s a meaningful distinction for how AI labs and financial regulators are learning to coexist.

Why this matters. The cooling tower benchmark claim, whatever the exact percentage, represents a capability threshold, not just an incremental improvement. Benchmarks that were previously “unsolvable” by AI systems mark genuine discontinuities in the threat landscape. Security teams and compliance officers at financial institutions need to understand that the question isn’t whether advanced AI models can perform sophisticated cyber operations. It’s who controls access and under what governance framework.

Context. The AISI evaluation fits a pattern this hub has tracked since early May: restricted models operating at or beyond elite human capability in offensive security are now a governance challenge, not a theoretical risk. The Mythos access architecture (named partners, defensive use only) is the current industry response. It’s not a permanent solution.

What to Watch

AISI formal evaluation report publication: Unknown, monitor AISI publications
FSB guidance or statement following Anthropic briefing: Q2-Q3 2026
Additional named institutional partners gaining Mythos access: Ongoing

What to watch. The FSB briefing outcome is the near-term signal. If the FSB issues guidance linking advanced AI cyber capability to systemic financial risk, it creates regulatory pressure on banks to develop AI-specific cyber resilience frameworks, and accelerates demand for sovereign, regulated alternatives like Mistral’s reported banking-focused model. Watch also for AISI publishing a formal evaluation report; the Guardian’s reporting is sourced from the evaluation, but a published document would upgrade this from single-source to verifiable record.

TJS synthesis. The cooling tower figure needs corroboration; don’t quote the 30% in a board presentation until the AISI report is published. The broader story is already solid: Mythos operates at a level that requires institutional governance, not just product terms of service. Financial institutions that haven’t yet mapped their AI security risk exposure against restricted-model capability thresholds should treat the FSB briefing as a starting gun.


More from May 19, 2026
