Open Source AI News: Why Anthropic Giving Away Petri Is the Most Important Part of the 3.0 Story

May 8, 2026 5 min read Anthropic Alignment Science / Meridian Labs / Financial Times Partial Moderate

Tech Jacks Solutions AI News Coverage

The version features matter less than the governance move. Anthropic has reportedly transferred Petri, its open-source alignment auditing tool, to Meridian Labs, an independent organization with UK AI Security Institute ties. That transition creates a three-entity stewardship structure, lab donates, independent body manages, regulator adopts, that may become the template for how AI safety evaluation infrastructure is governed as regulatory requirements for independent conformity assessment begin to take shape.

ai-safety-news open-source-ai-news generative-ai-news anthropic petri meridian-labs uk-aisi alignment-tools ai-governance conformity-assessment

Key Takeaways

The governance move, not the version features, is the Petri 3.0 story: Anthropic formally separated itself from control of its own alignment evaluation tool
Three-entity stewardship structure (Anthropic → Meridian Labs → UK AISI) may become a template for independent AI safety evaluation infrastructure
Petri 3.0's version-specific features (sycophancy, deception, research sabotage testing) are vendor-claimed, independent verification now falls to Meridian Labs and its adopters
Enterprise safety teams should audit Meridian Labs' governance structure before treating
Petri 3.0 evaluations as genuinely arm's-length

Petri 3.0, Three-Entity Stewardship Structure

Anthropic

for

Originator and donor, relinquished control to remove structural conflict of interest in self-evaluation

Meridian Labs

for

Independent steward, manages Petri's ongoing development; AI safety evaluation organization with UK AISI collaboration history

UK AI Security Institute

for

Reportedly adopted Petri 3.0 for model evaluations (Financial Times, single source, treat as reported)

Open-source AI safety community

neutral

Downstream beneficiary; independent stewardship and open-source availability enable third-party auditing

Petri Governance: What Changed

Before transition

Anthropic owned, managed, and used Petri to evaluate its own models, structural conflict of interest regardless of intent

→

After transition

Meridian Labs independently manages Petri; open-source availability enables external audit; UK AISI reportedly adopting for regulatory evaluations

The tool is interesting. The transaction is important.

Petri 3.0 adds new testing capabilities for sycophancy, deception, and research sabotage propensity, significant features for alignment researchers, and ones Anthropic states are meaningful advances over version 2.x. But the version update is not the story that will matter in 12 months. The story is that Anthropic no longer controls Petri at all.

The Transaction: What Anthropic Donated and Why

Petri, per Anthropic’s Alignment Science page, enables researchers to test hypotheses about model behavior in minutes, using AI agents to explore target models across realistic multi-turn scenarios. It’s a fast, flexible tool for the kind of behavioral testing that alignment research demands. Anthropic built it. Anthropic used it. And now, reportedly, Anthropic has handed it to Meridian Labs.

The logic is uncomfortable to acknowledge if you’re a frontier lab: a lab evaluating its own models with its own tools has a credibility problem. Not necessarily a dishonesty problem, a structural one. Even well-intentioned internal evaluation creates incentive misalignment. The tool’s designers know what it’s looking for. The model’s trainers know what kinds of behaviors might surface under testing. Independent stewardship is an attempt to break that closed loop.

Anthropic’s decision to donate Petri to Meridian Labs is, in effect, an acknowledgment of that structural problem. The credibility value of saying “an independent organization manages the tool that evaluates our models” outweighs the operational convenience of keeping that tool in-house.

Meridian Labs: What Independent Stewardship Actually Requires

Meridian Labs is confirmed as an organization involved in AI safety evaluation tooling with collaboration history connecting it to the UK AI Security Institute. That combination, independent body, regulator relationship, is precisely the profile that makes Meridian Labs the right kind of steward, at least in theory.

But organizational separation and genuine operational independence are different things. Enterprise safety teams evaluating Petri for their own internal assessment programs should ask several questions before treating Petri 3.0 evaluations as fully arm’s-length. Who funds Meridian Labs? What is the governance structure, board composition, decision rights, conflict-of-interest policies? Does Anthropic retain any influence over Petri’s development roadmap, even informally? These questions aren’t cynical; they’re the standard due diligence that institutional independence requires.

Model Release

Petri 3.0

OrganizationMeridian Labs (steward) / Anthropic (origin)

TypeOpen Source LLM

ParametersN/A, evaluation tool, not a model

BenchmarkN/A, Epoch AI verification not applicable to evaluation tooling (Wire category error corrected)

AvailabilityOpen Source (GitHub), free; independently stewarded

Disputed Claim

UK AISI has adopted Petri 3.0 for its latest model evaluations

Single-source claim (Financial Times, T3). Cross-reference confirms UK AISI collaboration with Meridian Labs on evaluation tooling broadly, but does not specifically confirm Petri 3.0 adoption.

Treat as reported until independently confirmed. Do not use as a legitimacy anchor for internal governance decisions without primary source verification.

Verification

Partial Anthropic Alignment Science (search-retrieved), Meridian Labs (search-retrieved), Financial Times (T3, single source for AISI claim) Version 3.0 features vendor-claimed. UK AISI adoption single-source. Meridian Labs governance structure not independently audited.

The open-source availability of Petri helps here. If the tool’s code is publicly auditable, independent researchers can assess what it tests and what it doesn’t. Open-source governance is imperfect, but it’s substantially more transparent than a proprietary evaluation framework managed by the lab being evaluated.

The UK AISI Dimension: What Regulatory Adoption Signals

The Financial Times has reportedly stated that the UK AI Security Institute has adopted Petri 3.0 for its model evaluations. The Meridian Labs website confirms broader UK AISI involvement in AI safety evaluation tooling, but this cross-reference describes the Inspect evaluation framework and UK AISI collaboration generally, not specifically Petri 3.0 adoption. The UK AISI adoption claim should be treated as single-source until independently confirmed.

If true, the adoption signal matters beyond Petri specifically. The question of who governs AI safety infrastructure is partly a question of whose tools regulators actually use. A regulator adopting an independently stewarded, open-source evaluation tool, rather than building proprietary assessment infrastructure from scratch – sets a precedent for how AI safety evaluation scales without requiring every jurisdiction to reinvent the toolchain.

For compliance teams tracking EU AI Act conformity assessment requirements, this is worth watching closely. The EU AI Act’s high-risk system provisions require robust conformity assessment, but the regulation doesn’t specify which evaluation tools are acceptable. If independent bodies like Meridian Labs, using openly governed tools like Petri, become the de facto infrastructure for regulatory model evaluation, that shapes what “conformity assessment” means in practice. See the existing TJS analysis of why agentic AI is harder to certify under the EU AI Act for the broader conformity assessment context.

Petri 3.0: What the Version Actually Claims

Anthropic states that version 3.0 introduces “The Dish,” an add-on designed to test models against sycophancy and deception using real-world system prompts. Per Anthropic’s announcement, the update also adds testing for research sabotage propensity and cooperation with harmful requests. These are vendor-claimed capabilities, the version-specific features cannot be independently verified from accessible source excerpts at time of publication.

The behavioral categories Petri 3.0 reportedly targets, sycophancy, deception, research sabotage, are meaningful precisely because they’re hard to catch with standard benchmarks. A model can score well on capability benchmarks while exhibiting the kind of subtle deference or strategic misdirection that makes it unreliable in high-stakes research contexts. Petri’s multi-turn, adversarial scenario design is specifically intended to surface those behaviors. Whether version 3.0’s “The Dish” add-on delivers on that intent is a question for independent evaluation, which, notably, is now the responsibility of Meridian Labs and its adopters, not Anthropic.

Unanswered Questions

Who funds Meridian Labs, and does Anthropic retain any formal or informal influence over Petri's development roadmap?
Does Meridian Labs publish a conflict-of-interest policy, and what are its board governance structures?
Has UK AISI independently confirmed Petri 3.0 adoption, or is the Financial Times report the sole source?
Will Petri 3.0's evaluation results for Anthropic's models be published independently, and by whom?

Opportunity

If regulatory frameworks begin requiring that AI conformity assessments use independently governed evaluation tooling, the organizations building that infrastructure now, Meridian Labs, UK AISI, and analogues, become essential to the next phase of AI governance. The Petri stewardship model is small in scale and potentially large in precedent.

What Enterprise AI Safety Teams Should Do Now

Three concrete actions for safety teams evaluating Petri 3.0 for internal use. First, review Meridian Labs’ governance documentation, specifically funding sources, board composition, and Anthropic’s ongoing relationship with the organization. Second, assess Petri’s open-source codebase directly; don’t rely on Anthropic’s characterization of what it tests. Third, treat the UK AISI adoption claim as unconfirmed until independently verified, it matters for legitimacy arguments internally, and building a governance case on a single-source reported claim is a risk.

The Emerging Pattern

The Petri stewardship move fits a pattern that’s visible across this cycle: AI labs separating themselves from control of infrastructure that could be perceived as self-interested. The Anthropic-Pentagon contract structure raised related questions about who controls AI safety guardrails in high-stakes deployments. The CAISI commitments establish voluntary safety infrastructure at the industry level. Petri’s transition to Meridian Labs is the evaluation tool layer of the same structural argument: credible safety requires separation.

Whether this pattern holds, whether independent stewardship produces genuinely more reliable evaluation than internal development, is an empirical question. The alignment research community will tell us, over time, whether Petri under Meridian Labs produces different results than Petri under Anthropic would have.

TJS Synthesis

The lab-donates, independent-body-manages, regulator-adopts structure Anthropic is piloting with Petri and Meridian Labs is worth tracking as a governance template, not just a software update. If regulatory frameworks begin requiring that AI conformity assessments use independently governed evaluation tooling, and the EU AI Act’s trajectory suggests this pressure will build, then the organizations building that independent infrastructure now are positioning themselves as essential infrastructure for the next phase of AI governance. Meridian Labs is a small organization. Petri is a single tool. But the structural precedent they represent is considerably larger.

More coverage of Anthropic

Regulation Deep Dive May 9

Three Labs In, One Reportedly Out: How Each Lab's Posture Maps to the US...

Regulation May 9

US Mandatory AI Vetting: Three Labs In, Anthropic Reportedly in Friction With Pentagon

Markets May 9

Anthropic's Revenue Run Rate Is Reportedly $45B, or $30B. What the Discrepancy Means for...

Technology May 8

AI Safety News: Anthropic Transfers Petri Alignment Tool to Meridian Labs, What Independent Stewardship...

View Source

More Technology intelligence

View all Technology

Gallery

Contacts