Key Takeaway
Microsoft, Google DeepMind and xAI have agreed to give the U.S. government early access to new AI models so they can be tested for national security risks before deployment.
US Government AI Model Testing – Key Points
The Story
The U.S. Department of Commerce’s Center for AI Standards and Innovation said the agreements will let government experts evaluate advanced AI models before release and continue targeted research after deployment. The move expands U.S. AI model testing as officials grow more concerned that frontier AI systems could strengthen cyberattacks or be misused in military contexts. The arrangement builds on earlier agreements with OpenAI and Anthropic that began in 2024.
The Facts
Microsoft, Google DeepMind and xAI will share new AI models with the U.S. government before deployment.
The models will be reviewed by the Center for AI Standards and Innovation, known as CAISI, for national security risks.
CAISI will conduct pre-deployment evaluations and targeted research.
The goal is to better assess frontier AI capabilities and advance AI security. CAISI also said it will conduct assessments and research after models are fully deployed.
The agreement fulfills a Trump administration pledge from July 2025.
The administration had said it would partner with technology companies to vet AI models for “national security risks.”
Microsoft will work with U.S. government scientists on deeper AI model testing.
The company said the tests will probe “unexpected behaviors” in AI systems and involve shared datasets and testing workflows.
Microsoft also signed a similar agreement with the UK AI Security Institute.
This suggests the company is extending government-facing safety testing beyond the United States.
Washington is increasingly focused on the security risks of powerful AI systems.
Officials are looking for threats including cyberattacks and potential military misuse before frontier models become widely available.
Anthropic’s Mythos has increased concern around AI-enabled hacking.
Mythos has drawn attention because of its reported ability to identify weaknesses in cybersecurity defenses. Multiple intelligence agencies have begun testing and using the model to find security vulnerabilities.
CAISI says it has already completed more than 40 evaluations.
These included evaluations of cutting-edge models that were not yet publicly available.
Developers sometimes provide less-restricted model versions for testing.
CAISI said companies frequently hand over versions with safety guardrails stripped back so the center can probe national security risks.
The move builds on earlier agreements with OpenAI and Anthropic.
Those agreements began in 2024 under the Biden administration, when CAISI was known as the U.S. Artificial Intelligence Safety Institute. Both companies have also revised their earlier arrangements to reflect the Trump administration’s AI Action Plan.
The Pentagon is also widening its AI partnerships while tensions with Anthropic continue.
The Defense Department reached agreements with Google, Microsoft, Amazon Web Services, Nvidia, OpenAI, Reflection and SpaceX to use their AI systems across classified computer networks. Anthropic was absent after a dispute and legal fight with the Trump administration over safety guardrails and military AI use.
Background
CAISI is the U.S. government’s main hub for AI model testing, collaborative research and the development of best practices. Under the Biden administration, the institute focused on AI tests, definitions and voluntary safety standards. It has since continued work on frontier model evaluations under its new name, though its long-term legal status has not yet been formally established by Congress.
Timeline / What Changed
- In 2024, the U.S. government established model-testing agreements with OpenAI and Anthropic.
- In July 2025, the Trump administration’s AI Action Plan directed CAISI to play a central role in a national AI evaluation ecosystem.
- On May 5, 2026, Microsoft, Google DeepMind and xAI agreed to provide early model access for these reviews.
What to Watch Next
The key question is how these evaluations will influence deployment decisions, safety guardrails and government access to frontier AI systems. Another point to watch is whether the White House moves from voluntary agreements toward a more formal government review process for advanced AI tools.
Why This Matters
The agreement marks another step toward routine AI model testing before public deployment. For end users, the main implication is that the most powerful AI systems may face more structured security checks before release, especially where cyber and military risks are involved.
This article was drafted with the assistance of generative AI. All facts and details were reviewed and confirmed by an editor prior to publication.