Start with the headline count: three more frontier labs now have formal pre-deployment evaluation agreements with CAISI (formerly the US AI Safety Institute). But the count isn’t what matters here.
What matters is why. According to CNBC’s reporting, the expansion of formal agreements to Google DeepMind, Microsoft, and xAI follows CAISI’s experience evaluating Anthropic’s Mythos model, a restricted system that was never publicly released. Just Security, citing an independent UK AISI assessment, reports that the UK body characterized Mythos as a step up in zero-day vulnerability identification capability. That assessment, the reporting suggests, is what drove the decision to formalize evaluation scope across other frontier labs.
That’s a governance precedent, not just a program update.
The precedent is this: a specific capability finding (not a policy process, not a legislative timeline) demonstrably expanded the scope of what federal oversight agreements require of private companies. If that causal chain holds, capability evaluations aren’t just safety checks. They’re the mechanism by which the government defines what vetting looks like for everyone else.
CAISI Formal Agreement Postures, May 2026
The evaluation scope for the new agreements covers cybersecurity, biosecurity, and chemical weapons risks, the same categories that grounded the original CAISI engagement with Anthropic. Extending that scope to three additional labs isn’t simply adding to a list. It’s the government asserting that the threat profile identified in one model applies broadly enough to govern others.
Prior TJS coverage on Mythos access and control architecture provides context on why this model’s capability profile attracted federal attention in the first place.
One additional claim in the reporting deserves separate treatment. Just Security describes federal agencies reportedly circumventing the Trump administration’s ban on Anthropic tools in order to conduct defensive safety tests. That claim comes from a single report by a credible national security law publication, but a single report is all it is. It’s included here because it’s potentially significant, and readers deserve to know both that it exists and what evidentiary weight it carries. Don’t treat it as confirmed.
The real question is what “formal agreements” actually require, and what happens if a lab declines. These agreements have no publicly confirmed legal enforcement mechanism. The distinction between a formal agreement and a mandatory legal requirement isn’t semantic. Until a statute or executive order establishes enforcement authority, the agreements hold through cooperation, not compulsion.
What to Watch
The forward signal to watch is reporting on the mandatory pre-release review order the White House has reportedly drafted. If that executive order materializes, the current agreements become the template for what mandatory vetting looks like, not a voluntary predecessor to be replaced.
Bottom Line
Here’s what matters for your planning: the legal architecture around “formal” federal evaluation agreements for private AI systems is genuinely unsettled. That won’t stop the agreements from having real operational effects on the labs that signed them.