The benchmarks moved. The defenses haven’t caught up yet.
The UK AI Security Institute (AISI) published an appraisal update on May 13 that has drawn renewed attention this week; it reportedly documents two significant threshold crossings. According to reported AISI appraisal data, Claude Mythos completed the “Cooling Tower” industrial control system (ICS) simulation in approximately 3 of 10 attempts, making it, by AISI’s reported description, the first frontier model to complete the scenario at all. On “The Last Ones,” a corporate network compromise simulation, Mythos reportedly succeeded in approximately 6 of 10 attempts, up from 3 of 10 in April, while GPT-5.5 reportedly reached 3 of 10. All of these rates should be treated as unconfirmed: the primary AISI source URL is currently inaccessible, so the figures cannot be verified against readable source text at time of publication.
What the numbers mean operationally matters more than the numbers themselves. “The Last Ones” and “Cooling Tower” aren’t academic exercises. They’re structured simulations of attacks that real threat actors attempt against real enterprise infrastructure and real industrial systems. A 60% success rate on corporate network compromise, if confirmed, means the model succeeds more often than it fails on a scenario whose real-world equivalent would cause significant damage in a production environment. The ICS completion is a different category of concern: industrial control systems govern power grids, water treatment, and manufacturing. Prior TJS coverage has tracked the restriction-versus-deployment tension that these capability milestones intensify.
AISI has previously reported that autonomous cyber capability doubles approximately every 4.7 months across frontier model generations; the figure has appeared in prior appraisal cycles and places the current results on a longer trajectory. If that rate is accurate, the gap between current capability and the threshold at which these models become genuinely operational attack tools is measured in months, not years. R&D World’s May 18 headline, “Defenders Have Months, Not Years,” frames the stakes plainly.
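To make the “months, not years” arithmetic concrete, here is a simple worked calculation. It is an illustration only and rests on two assumptions that do not come from the AISI data: that the reported 4.7-month doubling time stays constant, and that “capability” can be treated as a single quantity C(t), with t measured in months and C_0 as today’s level.

```latex
% Illustrative only: assumes a constant 4.7-month doubling time (hypothetical extrapolation)
\begin{aligned}
C(t) &= C_0 \cdot 2^{t/4.7} \\
\text{12 months:}\quad \frac{C(12)}{C_0} &= 2^{12/4.7} \approx 2^{2.55} \approx 5.9 \\
\text{mid-May to end of 2026 } (t \approx 7.6)\text{:}\quad \frac{C(7.6)}{C_0} &= 2^{7.6/4.7} \approx 2^{1.62} \approx 3.1
\end{aligned}
```

Under those assumptions, a year of progress compounds to roughly a sixfold increase, which is why a doubling time measured in months shortens defender timelines so sharply.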
This appraisal update follows a pattern of escalating AISI findings. The May 11 brief on Mythos identifying a curl vulnerability through Project Glasswing documented targeted capability in constrained environments. The May 13 update indicates that this capability is now generalizing to broader attack-surface simulations. These aren’t two separate stories; they’re a progression.
Warning
If AISI’s reported 4.7-month doubling rate for autonomous cyber capability holds, the next significant threshold crossing arrives before the end of 2026. The ICS completion, if confirmed, represents a category expansion, not just an incremental improvement on prior benchmarks.
The part nobody mentions in coverage of benchmark improvements: what defenders are supposed to do with this information. AISI publishes. Anthropic restricts access. And security teams are left evaluating whether their threat models need updating based on appraisal data that arrives through reporting rather than direct access. The access architecture for Mythos is a governance story as much as a capability story, and it’s directly connected to the FSB briefing covered in today’s related brief.
Wait for primary-source confirmation of specific rates before incorporating these figures into formal threat assessments. The AISI appraisal is the authoritative document, and it’s currently inaccessible. What’s confirmable now: the capability trajectory is real, the benchmark categories are legitimate, and the governance response (detailed separately) suggests regulators are treating these findings as significant. That’s enough to update your awareness. It’s not enough to update a threat model quantitatively until the primary source becomes accessible.