US gives federal AI evaluator early access to Google, Microsoft, xAI models
TX_038 · Policy & Regulation

The Center for AI Standards and Innovation has agreements with Google DeepMind, Microsoft, and xAI to evaluate models pre-release. Trump-administration oversight expands. Anthropic's status is unclear.

The Center for AI Standards and Innovation announced agreements with Google DeepMind, Microsoft, and xAI on May 5 [CNBC]. The agreements give the U.S. government early access to evaluate AI models before public release.

── What shipped ──

The Center is the federal entity established under the Trump administration's AI executive orders to evaluate frontier AI systems. The new agreements:

  • Cover pre-release model access for federal evaluators
  • Include Google DeepMind, Microsoft, and xAI as initial signatories [Claims Journal]
  • Are voluntary on the part of model developers; no mandatory disclosure requirement applies
  • As of the announcement, do not explicitly include Anthropic or OpenAI

── Why it matters ──

Three structural points.

One — the US is shifting from regulation to evaluation. The previous administration's AI Executive Order leaned on disclosure mandates and reporting requirements; the current approach favors voluntary pre-release access agreements. The voluntary route reduces vendor friction and is gentler on industry, but it produces less consistent oversight across vendors.

Two — competitive optics. Anthropic's absence from the announced cohort is notable. Anthropic has historically been the most cooperative on safety evaluation (RSP commitments, voluntary Frontier Model Forum participation). The omission likely reflects an existing parallel agreement rather than exclusion, but the optics matter — vendors not on the list look uncooperative regardless of underlying reality.

Three — implications for shipping models. Pre-release federal evaluation adds calendar time to model releases for participating vendors. Expected delays: 2–6 weeks per major release for evaluation cycles, depending on findings.
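To make the scheduling claim concrete, here is a back-of-the-envelope sketch of the annual calendar impact. The release cadence and the overlap assumption are illustrative guesses, not figures from the announcement; only the 2–6 week per-release range comes from the article.

```python
# Illustrative estimate of annual calendar impact from pre-release
# federal evaluation. Cadence and overlap are assumptions, not figures
# from the CAISI announcement; the 2-6 week range is from the article.

RELEASES_PER_YEAR = 4                     # assumed major-release cadence
EVAL_WEEKS_LOW, EVAL_WEEKS_HIGH = 2, 6    # per-release evaluation cycle

def annual_delay_weeks(releases: int, low: float, high: float) -> tuple[float, float]:
    """Total added calendar time per year if evaluation cycles do not
    overlap with the vendor's other pre-release work (worst case)."""
    return releases * low, releases * high

low, high = annual_delay_weeks(RELEASES_PER_YEAR, EVAL_WEEKS_LOW, EVAL_WEEKS_HIGH)
print(f"Worst case: {low:.0f}-{high:.0f} extra weeks/year on the release calendar")

# If half of each evaluation cycle runs concurrently with the vendor's own
# red-teaming (an assumption), the visible slip halves:
print(f"With 50% overlap: {low / 2:.0f}-{high / 2:.0f} extra weeks/year")
```

Under those assumptions, a lab shipping four major releases a year absorbs somewhere between one and six months of cumulative evaluation time annually, which is why the voluntary nature of the agreements matters for vendor participation.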

── Editor's take ──

The voluntary pre-release evaluation regime is the lightest possible version of meaningful oversight. It works only as long as vendors cooperate; it provides limited backstop if a vendor unilaterally pulls out. The credibility test for the Center will be the first time it asks a vendor to delay a release based on safety findings — and what happens when that vendor refuses.
