Forty evaluations before the order existed.
That’s the number that reframes what this EO actually is. CAISI Director Chris Fall confirmed, per Forbes reporting, that NIST’s Center for AI Standards and Innovation had completed more than 40 pre-deployment evaluations of frontier AI models, including models that haven’t been publicly released, before President Trump signed the cybersecurity executive order on May 21. The EO didn’t create the program. It gave it a formal federal home.
The distinction matters for every compliance team at a frontier model developer trying to understand what the order changes for them. The short answer: the testing infrastructure existed, the lab agreements existed, the evaluation criteria existed. What’s new is that they’re now part of a codified federal instrument, and non-participation has shifted from “not yet engaged” to “visibly absent from a named federal program.”
What the EO Actually Establishes
The executive order does three things. First, it formally establishes a voluntary testing program for frontier AI models focused on one thing: cybersecurity risk. Not general safety, not bias, not hallucination. The scope is narrow and deliberate: can the model autonomously identify, exploit, or chain vulnerabilities in state, local, and federal network infrastructure?
Second, it revamps Obama- and Biden-era cybersecurity information-sharing programs, which governed how federal agencies, critical infrastructure operators, and technology vendors share threat intelligence, to explicitly include AI model developers. That’s not a small update. Until now, AI developers were outside that sharing regime entirely. The amendment closes a classification gap that the cybersecurity community had flagged as increasingly untenable as model capabilities advanced.
Third, per Nextgov’s reporting, the order gives the NSA a defined role in the voluntary testing architecture, a significant inclusion given the NSA’s technical depth on offensive cyber capabilities. The agency’s involvement signals that the evaluation criteria will be informed by classified threat intelligence, not just publicly known attack patterns.
None of this is mandatory. The framework is voluntary. No lab faces a legal consequence for declining to participate. That’s accurate, and it’s also a misleading frame once you understand who’s already inside.
Who’s Already in the Framework
All five named frontier labs have CAISI evaluation agreements: OpenAI, Anthropic, Google DeepMind, xAI, and Microsoft. Prior TJS reporting confirmed these agreements were in place before this EO was signed. The labs didn’t wait for a legislative mandate. They engaged because CAISI offered something the labs’ internal red teams couldn’t easily replicate: independent evaluation with access to classified infrastructure attack patterns.
Frontier Model Developer: Post-EO Compliance Actions
- Confirm or initiate CAISI evaluation agreement
- Assess revised information-sharing obligations under cybersecurity EO amendment
- Distinguish cybersecurity EO compliance posture from 90-day pre-launch review obligations
- Monitor CAISI evaluation publication status (results are currently non-public)
What to Watch
What does participation actually require? Based on the evaluation structure CAISI has operated under, labs provide:
- Model access for pre-deployment testing, including unreleased versions
- Technical documentation sufficient for evaluators to construct targeted attack scenarios
- Participation in structured red-team exercises against specified critical infrastructure profiles
- Acceptance that evaluation results may be shared within the federal information-sharing architecture
What participation doesn’t require: mandatory publication of results, mandatory remediation before release, or pre-approval for deployment. The program is an evaluation and intelligence-sharing mechanism, not a certification gate.
The Capability That Put This on the Agenda
The EO’s scope, autonomous cyber exploitation of networked infrastructure, maps precisely to the capability class that Anthropic’s Claude Mythos model demonstrated. Anthropic’s own documentation confirms Mythos autonomously discovered read-and-write primitives in target systems and chained them into multi-stage network exploits. Whether Mythos was the direct trigger for EO drafting isn’t confirmed in public records; that causal link remains an inference cited by security researchers, not a finding in any government document. But the policy response fits the demonstrated capability like a template.
TJS previously mapped the restriction-versus-disclosure tension that emerges when a model crosses this capability threshold. The EO takes a specific position on that tension: voluntary disclosure inside a controlled federal evaluation architecture, rather than either mandatory pre-release review or no federal involvement at all. That’s a policy choice with a short shelf life if capabilities continue to advance at the current pace.
The UK AI Safety Institute evaluated Mythos against cybersecurity benchmarks independently, per prior registry-confirmed reporting. That evaluation and CAISI’s domestic program now operate in parallel. The EO doesn’t integrate the two, and the gap between US voluntary federal testing and UK mandatory safety evaluation is now a structural feature of the transatlantic AI governance landscape, not an oversight.
How This EO Differs From the 90-Day Pre-Launch Review Order
These are two distinct instruments. The 90-Day Pre-Launch Review Order, covered in TJS’s prior brief on the White House pre-launch review architecture, establishes a pre-release review window for certain model deployments. The cybersecurity EO is specifically about ongoing evaluation of cyber-exploitation capabilities, with a scope tied to critical infrastructure risk rather than general model deployment.
Conflating the two is a planning error with real consequences. A compliance team that treats the cybersecurity EO as redundant with the pre-launch review window misses the fact that the CAISI program evaluates released and unreleased models on an ongoing basis; it’s not a one-time gate clearance. And the pre-launch review window doesn’t include the cybersecurity information-sharing revision. The programs have different administrators, different evaluation criteria, and different participation structures.
Analysis
Voluntary compliance frameworks in federal AI governance are following the same pattern as voluntary cybersecurity frameworks in the 2010s: they start with the largest players, establish norms through practice, and eventually create reputational and procurement costs for non-participants that function as de facto mandates. This EO is the NIST Cybersecurity Framework moment for AI: voluntary today, industry-standard in 18 months.
What Compliance Teams Should Do Now
Three actions, in order.
Confirm your CAISI agreement status. If your organization develops frontier AI models and doesn’t have a formal CAISI evaluation agreement, you’re now operating as a visible exception in a named federal program. That changes the procurement and regulatory risk calculation. The engagement path is through NIST’s CAISI program office directly.
Map the information-sharing revision against your current disclosure posture. The EO’s amendment to existing cybersecurity information-sharing programs means your legal team needs to assess whether your AI model development activities now fall within the revised sharing obligations. The scope is narrow: it covers cybersecurity threat intelligence, not general safety data. But the inclusion is new, and its legal implications need to be assessed against your existing posture.
Watch the CAISI evaluation publication question. Forty-plus evaluations have been completed. None of the results are public. That asymmetry won’t hold indefinitely. Whether CAISI publishes aggregate findings, even in anonymized form, is the policy trigger that would shift this from a background compliance consideration to an active reputational and competitive issue for labs whose models underperformed in evaluation.
TJS Synthesis
The real question isn’t whether this EO is voluntary. It’s what voluntary compliance looks like when all five frontier labs are already inside the program and the remaining developers aren’t. The architecture the EO codifies was built through agreements, not mandates, and that process happened faster than the legislative process that might have produced a mandate. What follows from this signing is probably not broader mandatory testing. It’s broader voluntary enrollment, because the cost of being named as the lab that declined federal cybersecurity evaluation has just become harder to absorb. Expect CAISI’s agreement count to grow in the next 90 days.