Five of the world’s leading frontier AI labs now submit models for government review before public release. Google DeepMind, Microsoft, and xAI announced agreements with CAISI, the Commerce Department’s Center for AI Standards and Innovation, on May 5, according to multiple independent reports. They join OpenAI and Anthropic, both of which had prior CAISI evaluation agreements in place. Reporting from The Hill confirms that CAISI has completed more than 40 model evaluations to date.
CAISI is not a standalone agency but an office within the National Institute of Standards and Technology. Its evaluations are government pre-deployment reviews: assessments conducted for national security and safety purposes before a model reaches the public. This isn’t independent third-party evaluation in the academic or benchmark sense; it’s the federal government’s mechanism for understanding what frontier models can do before those capabilities become publicly available. The distinction matters: these agreements give the US government early access and structured assessment, not certification or approval authority.
According to coverage of the announcement, the agreements cover joint safety assessments and research into cybersecurity risk mitigation. The specific scope of each lab’s agreement hasn’t been disclosed in detail. What has been confirmed across multiple sources is the core structure: pre-release access for government evaluators, with the stated goal of national security risk assessment.
Why this matters: the expansion from two labs to five changes the character of these agreements. When only Anthropic and OpenAI participated, the program looked like an arrangement between the government and the two labs with the deepest federal relationships. Five labs, spanning the dominant models in enterprise AI, consumer AI, and the emerging open-weights tier, paint a different picture, one that suggests the program is becoming a condition of operating at the frontier in the US market, whether through formal mandate or through the reputational and contractual dynamics of federal procurement.
The White House executive order restoring Anthropic’s federal access this spring and the CAISI agent standards initiative launched earlier this year are pieces of the same architecture: a federal government actively building evaluation infrastructure for AI systems it considers strategically significant. The CAISI expansion adds commercial pre-deployment review to that infrastructure.
What to watch: whether these agreements formalize into a published framework or remain informal arrangements. The White House has reportedly been drafting mandatory vetting legislation, a separate effort that, if enacted, would convert today’s voluntary agreements into statutory requirements. If that legislation advances, the five labs that entered voluntary agreements will have a procedural head start; labs that didn’t will face a compliance gap they can’t close overnight.
The question compliance teams at frontier labs should be asking isn’t whether CAISI evaluation will become mandatory. It’s whether their model documentation, safety assessment processes, and cybersecurity review protocols are ready for the structured scrutiny that government evaluation requires. The infrastructure for that scrutiny is now in place for five of the most consequential AI development programs in the world.