
AI Safety News: Anthropic Transfers Petri Alignment Tool to Meridian Labs, What Independent Stewardship Signals

3 min read · Sources: Anthropic Alignment Science / Meridian Labs / Financial Times · Verification: Partial · Impact: Moderate
Anthropic has reportedly transitioned stewardship of Petri, its open-source auditing tool for AI alignment research, to Meridian Labs, an independent organization with ties to the UK AI Security Institute. The move is less about a software update and more about a governance precedent: a frontier lab formally separating itself from control of the evaluation infrastructure used to assess its own models.
3 entities: Anthropic → Meridian Labs → UK AISI

Key Takeaways

  • Anthropic has reportedly transferred Petri's stewardship to Meridian Labs, separating alignment evaluation tooling from the lab being evaluated
  • Petri 3.0 reportedly adds sycophancy, deception, and research sabotage testing ("The Dish"); these features are vendor-claimed and not independently confirmed
  • UK AISI has reportedly adopted Petri 3.0 for model evaluations (Financial Times, single source); treat as reported until independently confirmed
  • Evaluators and enterprise safety teams should examine Meridian Labs' governance structure before treating Petri 3.0 assessments as arm's-length independent evaluations

Tool Release

Petri 3.0
Organization: Anthropic (origin) / Meridian Labs (steward)
Type: Open-source alignment auditing tool
Parameters: N/A (evaluation tool, not a model)
Benchmark: Not applicable; Petri is an alignment auditing tool, not an AI model, so Epoch AI verification does not apply
Availability: Open source (GitHub), free

Petri 3.0 Stewardship Structure

  • Anthropic (for): Donated Petri to Meridian Labs; structural separation from an evaluation tool governing its own models
  • Meridian Labs (for): Independent steward of Petri; AI safety evaluation tooling organization with UK AISI collaboration history
  • UK AI Security Institute (for): Reportedly adopted Petri 3.0 for model evaluations (Financial Times, single source, unconfirmed)
  • Open-source AI safety community (neutral): Downstream beneficiary; independent stewardship enhances credibility of Petri evaluations for third-party researchers

Anthropic has reportedly handed off stewardship of Petri to Meridian Labs, transitioning the open-source alignment auditing tool from internal lab ownership to independent management. According to Anthropic’s Alignment Science page, Petri enables researchers to test hypotheses about model behavior in minutes, using AI agents to explore target models across realistic multi-turn scenarios. Version 3.0 is the current release.
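The auditing pattern described above, an AI agent probing a target model across realistic multi-turn scenarios, can be sketched generically. All names below (`run_scenario`, `Transcript`, the stub functions) are illustrative stand-ins, not Petri's actual API; a real run would plug in an auditor model and the model under audit.

```python
# Minimal sketch of an automated multi-turn audit loop, in the spirit of
# what Anthropic describes Petri doing. Hypothetical names; NOT Petri's API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Transcript:
    scenario: str
    turns: list = field(default_factory=list)  # (role, message) pairs

def run_scenario(
    scenario: str,
    auditor: Callable[[Transcript], str],   # generates the next probe message
    target: Callable[[str], str],           # model under audit (stubbed here)
    max_turns: int = 3,
) -> Transcript:
    """Drive a multi-turn probe of `target` and record the full exchange."""
    t = Transcript(scenario)
    for _ in range(max_turns):
        probe = auditor(t)
        t.turns.append(("auditor", probe))
        t.turns.append(("target", target(probe)))
    return t

# Stubs so the sketch runs without any model backend.
def stub_auditor(t: Transcript) -> str:
    return f"[probe {len(t.turns) // 2 + 1} for: {t.scenario}]"

def stub_target(probe: str) -> str:
    return f"[reply to {probe}]"

transcript = run_scenario("sycophancy under pressure", stub_auditor, stub_target)
print(len(transcript.turns))  # 6: three auditor/target exchanges
```

The point of the sketch is the division of labor: the auditor agent, not a human, decides what to ask next based on the transcript so far, which is what lets researchers "test hypotheses about model behavior in minutes."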

What Petri 3.0 reportedly does

Anthropic states that version 3.0 introduces “The Dish,” an add-on for testing models against sycophancy and deception using real-world system prompts. Per Anthropic’s announcement, the update also adds testing for research sabotage propensity and cooperation with harmful requests. These version-specific features are vendor-claimed and cannot be independently verified from accessible source excerpts; treat them as Anthropic’s stated capabilities until third-party confirmation is available.
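Evaluations like these typically end in a scoring step: something grades each audit transcript along the named dimensions. The toy keyword judge below only illustrates that shape, using the four dimensions the announcement names; in practice the judge would itself be an LLM, and none of these names come from Petri's real code.

```python
# Illustrative transcript-scoring step along the dimensions named in the
# Petri 3.0 announcement. Toy keyword judge; NOT Petri's real scoring code.
DIMENSIONS = ["sycophancy", "deception", "research_sabotage", "harmful_cooperation"]

def toy_judge(transcript_text: str) -> dict[str, float]:
    """Stand-in for an LLM judge: flag a dimension if a marker phrase appears."""
    markers = {
        "sycophancy": "you're absolutely right",
        "deception": "i never said that",
        "research_sabotage": "quietly corrupt",
        "harmful_cooperation": "here's how to",
    }
    text = transcript_text.lower()
    return {dim: float(markers[dim] in text) for dim in DIMENSIONS}

scores = toy_judge("Target: You're absolutely right, great idea!")
print(scores["sycophancy"])  # 1.0
```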

The tool itself is open-source and freely available. That’s part of the governance logic: independent stewardship only works if the tool is genuinely accessible to researchers outside the donating lab.

The stewardship move

Meridian Labs, confirmed via meridianlabs.ai as an organization involved in AI safety evaluation tooling, is taking on Petri’s ongoing development and governance. The logic of independent stewardship is structural: an alignment evaluation tool managed by the lab being evaluated creates a credibility problem. Separating the tool from the lab removes that conflict, at least formally.

Verification

Status: Partial. Sources: Anthropic Alignment Science page (search-retrieved); Meridian Labs website (search-retrieved); Financial Times (T3, single source). Version 3.0 features are vendor-claimed. UK AISI adoption is single-source (Financial Times). Epoch AI verification not applicable; Petri is an evaluation tool, not a model.

The UK AI Security Institute has reportedly adopted Petri 3.0 for its model evaluations, according to the Financial Times. The meridianlabs.ai cross-reference confirms UK AISI involvement in AI safety evaluation tooling broadly, but does not specifically confirm Petri 3.0 adoption. Treat this as a single-source reported claim until independently confirmed.

One practical consideration the announcement doesn’t address: Meridian Labs’ independence from Anthropic needs scrutiny. Organizational separation and operational independence aren’t the same thing. Enterprise safety teams evaluating Petri for their own internal assessment programs should examine Meridian Labs’ governance structure, funding sources, and relationship with UK AISI before treating Petri 3.0 evaluations as truly arm’s-length.

Why this matters beyond the tool

Petri is Anthropic’s auditing tool, not a product it sells. Giving it away signals that Anthropic has decided the credibility value of independent stewardship outweighs the advantage of controlling its own evaluation infrastructure. That’s a meaningful signal in an environment where AI labs are under increasing pressure to demonstrate that their safety commitments are structurally enforced, not just policy statements. The broader question of who governs AI safety infrastructure, and who should, is exactly the question this transition is trying to answer.

Analysis

Anthropic giving away Petri signals that independent evaluation credibility is worth more than control of the evaluation tool. If regulators begin requiring that AI conformity assessments use independently governed tooling, Meridian Labs' stewardship of Petri becomes strategically significant, not just for Anthropic, but for the entire evaluation infrastructure ecosystem.

What to watch

  • Whether the UK AISI adoption claim is independently confirmed
  • How Meridian Labs structures its governance relative to Anthropic: funding, board composition, decision rights
  • Whether other frontier labs make analogous moves with their own evaluation infrastructure

TJS synthesis

The stewardship model Anthropic is testing with Petri (lab donates, independent body manages, regulator adopts) could become the template for how AI safety evaluation infrastructure scales. Regulators evaluating AI systems need tools they can trust as genuinely independent. If Petri’s governance structure holds up to scrutiny, it’s a proof of concept worth watching.
