What is Anthropic's Responsible Scaling Policy (RSP)?

The Responsible Scaling Policy is Anthropic's voluntary framework for managing catastrophic AI risks. It governs how Anthropic identifies and evaluates risk in its models and how it makes deployment decisions. It is Anthropic's own framework, not an industry standard, and the company publishes it openly.

What are AI Safety Levels (ASL)?

AI Safety Levels are tiers within Anthropic's Responsible Scaling Policy that match a model's capabilities to required safeguards. Higher levels demand stricter protections. ASL-3 requires strict deployment and security mitigations; ASL-4 requires protecting model weights against state-level adversaries plus an affirmative safety case. Lower levels (ASL-1 and ASL-2) are baseline tiers; consult Anthropic's published RSP for their specifics.

Why are Fable 5 and Mythos 5 classified as ASL-3?

Anthropic classifies Fable 5 and Mythos 5 as CB-1: the models can provide actionable information and cross-domain synthesis that saves experts time, which triggers ASL-3 safeguards. They do not cross the CB-2 threshold because they still struggle with open-ended ideation and with recovering from critical scientific errors. Anthropic describes them as near the CB-2 border, which is why the framework matters here.

Is the ASL framework an industry standard?

No. AI Safety Levels and the CB capability thresholds are Anthropic's own framework, defined and applied by Anthropic. They are not a regulatory requirement or a cross-industry standard. They do sit alongside external governance references such as the EU AI Act and the NIST AI Risk Management Framework, but those are separate instruments.

AI Safety Levels (ASL) Explained: Anthropic's Responsible Scaling Policy in 2026

What is the difference between CB-1 and CB-2?

CB-1 and CB-2 are chemical-biological capability thresholds in Anthropic's framework. CB-1 covers non-novel weapons: a model that can significantly help people with basic technical backgrounds create, obtain, or deploy chemical or biological weapons with catastrophic potential. CB-2 covers novel weapons: a model that functionally substitutes for scarce human expertise, where a well-resourced team could conduct end-to-end novel pathogen design and deployment.

AI Safety Levels are the rungs of a governance ladder that Anthropic built for itself. They sit inside the company's Responsible Scaling Policy, and they answer a single question: as a model gets more capable, what safeguards must be in place before Anthropic is willing to train or deploy it? This is not a regulation and not an industry standard. It is one company's voluntary framework, published openly, and it has become one of the most-cited reference points in frontier AI governance. This explainer walks through the framework as Anthropic defines it: the policy itself, the higher safety levels (ASL-3 and ASL-4), the chemical-biological capability thresholds (CB-1 and CB-2), and a concrete worked example using the models that triggered ASL-3 in mid-2026.

Read this if you write AI policy, sit on a governance or risk committee, evaluate frontier models for procurement, or simply want to understand what "ASL-3" means when a vendor cites it. We frame the cybersecurity and biology dimensions strictly at the governance level, with no operational detail, because that is the level at which this framework is meant to be understood.

ASL-3

The safety level Anthropic assigned to its most capable mid-2026 models, triggered by a CB-1 chemical-biological capability classification. ASL-3 is the first tier where strict deployment and security safeguards become mandatory under the Responsible Scaling Policy, which is why it is the level most worth understanding in detail. Source: Anthropic Responsible Scaling Policy.

What the Responsible Scaling Policy Is

The Responsible Scaling Policy, or RSP, is Anthropic's voluntary framework for managing catastrophic risks from its own AI systems. Two words in that sentence carry the weight. "Voluntary" means no regulator imposed it; Anthropic wrote it and committed to it publicly. "Catastrophic" narrows the scope: the RSP is not about everyday content moderation or bias. It is about the small set of capabilities that could plausibly contribute to mass-casualty events, such as helping to build chemical or biological weapons or to conduct high-end cyberattacks.

Mechanically, the RSP does two jobs. First, it governs how Anthropic identifies and evaluates risk in a model, through a set of capability evaluations run before and during training. Second, it governs the deployment decision itself: whether a model can be released, and under what conditions. The bridge between those two jobs is the AI Safety Level. A model's evaluated capabilities determine its ASL, and its ASL determines the safeguards that must be in place before it ships.

It is worth stating plainly that this is Anthropic's framework, not a shared standard. Other labs have their own versions, and external instruments like the EU AI Act and the NIST AI Risk Management Framework operate on different terms entirely. When a vendor or a news report says a model is "ASL-3," they are using Anthropic's vocabulary. That does not make it less useful as a reference point, but a governance professional should be precise about whose definition is in play.

The core idea in one line. Capabilities go up, safeguards go up to match. The ASL is the dial that links the two, so that a more dangerous capability cannot ship without a correspondingly stronger set of protections.

The Safety Level Ladder

Think of AI Safety Levels as rungs on a ladder, modeled loosely on biosafety levels in laboratory science. Each rung adds requirements rather than replacing them, so a model at a higher level inherits everything below it plus new obligations. The framework spans lower baseline tiers up through the higher levels where the most demanding protections live. The cards below summarize the rungs that matter most for any 2026 discussion of frontier models.

ASL-1 and ASL-2

Lower and baseline tiers. Specific requirements are defined in Anthropic's published RSP rather than summarized here.

Scope: Baseline

Reference: RSP doc

ASL-3

Strict deployment and security mitigations against misuse and weight theft.

Trigger: CB-1

Focus: Misuse + theft

ASL-4

State-level weight protection plus an affirmative safety case for misalignment.

Trigger: Higher capability

Focus: State actors

A deliberate note on the lower tiers: the sources behind this article do not spell out the specific requirements of ASL-1 and ASL-2, so we do not invent them. Treat them as the baseline and early-precaution rungs, and read Anthropic's published Responsible Scaling Policy for their exact definitions. The two rungs that carry concrete, documented requirements in 2026 are ASL-3 and ASL-4, and the next two sections take each in turn.
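For readers who think in data structures, the "each rung adds requirements" idea can be sketched in a few lines of Python. This is an illustration only, not anything Anthropic publishes: the names `Rung`, `LADDER`, and `required_safeguards` are our own assumptions, the safeguard strings paraphrase this article, and the ASL-1 and ASL-2 entries are left empty on purpose because their requirements are not summarized here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    """One rung of the ladder and the safeguards it adds (hypothetical model)."""
    name: str
    added_safeguards: tuple[str, ...]


# Ordered lowest to highest. ASL-1 and ASL-2 are intentionally empty:
# the article defers to Anthropic's published RSP for their specifics.
LADDER = (
    Rung("ASL-1", ()),
    Rung("ASL-2", ()),
    Rung("ASL-3", (
        "real-time classifier guards",
        "access controls for guard exemptions",
        "bug bounty",
        "threat intelligence",
        "rapid jailbreak response",
        "weight protection against non-state attackers",
    )),
    Rung("ASL-4", (
        "weight protection against state-level adversaries",
        "affirmative safety case for misalignment",
    )),
)


def required_safeguards(level: str) -> list[str]:
    """Return everything required at `level`, including inherited safeguards."""
    collected: list[str] = []
    for rung in LADDER:
        collected.extend(rung.added_safeguards)
        if rung.name == level:
            return collected
    raise ValueError(f"unknown safety level: {level}")


if __name__ == "__main__":
    for safeguard in required_safeguards("ASL-4"):
        print(safeguard)
```

Asking for ASL-4 returns the ASL-3 safeguards as well as the two ASL-4 additions, which is the inheritance the ladder metaphor describes.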

ASL-3 in Detail

ASL-3 is the first level where the RSP demands a strict, named set of mitigations. It splits into two categories that a governance reader should keep separate in their head: deployment safeguards, which limit what a deployed model will do, and security safeguards, which protect the model itself from being stolen.

Deployment Safeguards

On the deployment side, ASL-3 requires real-time classifier guards: systems that watch requests and responses and intervene when a query heads toward a restricted capability. Because over-broad guards would block legitimate work, the framework pairs them with access controls for guard exemptions, so vetted users with a legitimate need can be granted scoped access rather than being blocked wholesale. Anthropic backs the classifiers with a bug bounty that pays external researchers to find ways around them, ongoing threat intelligence to track how adversaries adapt, and a commitment to rapid jailbreak response so that newly discovered bypasses are closed quickly rather than lingering.

Security Safeguards

On the security side, ASL-3 requires controls specifically aimed at preventing theft of the model weights. The reasoning is direct: a guardrail only protects a model that stays inside Anthropic's control. If an attacker exfiltrates the raw weights, every deployment-side safeguard is moot, because the thief can run the model with no guards at all. ASL-3 weight protection is calibrated against non-state attackers, such as criminal groups and insiders, rather than nation-states. That distinction is precisely what separates ASL-3 from ASL-4.

Why ASL-3 is the practical floor. For any frontier model in 2026 that can meaningfully assist with restricted capabilities, ASL-3 is the level where governance stops being aspirational and becomes a concrete checklist: classifiers live, exemption process defined, bounty open, threat intel running, weights hardened against theft.
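A governance team could track that checklist as a simple release gate. The sketch below is a hypothetical illustration, assuming a status dictionary your own team maintains; the item names mirror the checklist above, and `missing_items` and `ready_to_ship` are our own helper names, not part of any Anthropic tooling.

```python
# The five checklist items named above, as hypothetical status keys.
ASL3_CHECKLIST = (
    "classifiers_live",
    "exemption_process_defined",
    "bug_bounty_open",
    "threat_intel_running",
    "weights_hardened_against_theft",
)


def missing_items(status: dict[str, bool]) -> list[str]:
    """List checklist items that are absent or not yet satisfied."""
    return [item for item in ASL3_CHECKLIST if not status.get(item, False)]


def ready_to_ship(status: dict[str, bool]) -> bool:
    """The gate passes only when every checklist item is satisfied."""
    return not missing_items(status)


if __name__ == "__main__":
    # Example status: everything in place except threat intelligence.
    status = {item: True for item in ASL3_CHECKLIST}
    status["threat_intel_running"] = False
    print(ready_to_ship(status), missing_items(status))
```

Treating an absent key as "not satisfied" is a deliberate choice here: a safeguard nobody has reported on should block the gate rather than pass silently.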

ASL-4 in Detail

ASL-4 raises the bar in two distinct ways: it assumes a far more capable adversary than ASL-3 does, and it demands a far stronger form of assurance.

State-Level Weight Protection

The first ASL-4 requirement is the ASL-4 Security Standard: protecting model weights against state-level adversaries. This is a categorical jump from ASL-3. Defending weights against a criminal group or a malicious insider is hard; defending them against a well-resourced national intelligence service, with the budget, patience, and supply-chain reach that implies, is a different class of problem. The point of the standard is to ensure that the most capable models cannot be quietly exfiltrated by a state actor and then run without any of the safeguards that the public deployment carries.

The Affirmative Safety Case

The second ASL-4 requirement is an affirmative safety case: a positive, evidenced argument that the model's misalignment risks are mitigated. This is a meaningful shift in burden of proof. At lower levels the implicit question is "have we found a problem?" An affirmative safety case flips it to "can we show, with evidence, that the model will not pursue goals counter to its operators' intent in ways that cause catastrophic harm?" Proving a negative about a complex system is genuinely difficult, and that difficulty is the point: a model at this level is meant to be deployed only when a lab can make that case credibly.

ASL-3 versus ASL-4 in one contrast. ASL-3 hardens a model against misuse and against theft by non-state actors. ASL-4 hardens it against theft by state actors and adds a requirement to affirmatively demonstrate that the model is not dangerously misaligned. The first is about keeping bad actors out; the second adds proving the model itself is safe to run.

CB-1 vs CB-2: The Chemical-Biological Thresholds

If AI Safety Levels are the rungs, the CB thresholds are one of the rulers Anthropic uses to decide which rung a model lands on. CB stands for chemical and biological capability, and the framework draws a sharp, governance-relevant line between two levels of concern. We describe these strictly as capability thresholds; there is no operational content here, by design.

CB-1: Non-Novel Weapons

Uplift for people with basic technical backgrounds.

A model crosses CB-1 when it can significantly help people with basic technical backgrounds, on the order of an undergraduate STEM education, to create, obtain, or deploy chemical or biological weapons with catastrophic potential. The key phrase is "non-novel": the weapons in question already exist and are documented somewhere. The concern is that the model lowers the barrier for a wider pool of people to reach known-dangerous outcomes faster.

CB-1 is the threshold that triggered ASL-3 for Anthropic's most capable mid-2026 models. Anthropic-defined framework.

CB-2: Novel Weapons

Functional substitute for scarce human expertise.

A model crosses CB-2 when it functionally substitutes for scarce human expertise, to the point that a well-resourced team could conduct end-to-end novel pathogen design and deployment. "Novel" is the operative distinction: this is not about speeding up access to known threats, it is about enabling the creation of new ones that did not previously exist. CB-2 represents a categorically more serious capability and would carry correspondingly stricter obligations.

CB-2 is the higher threshold. A model that crosses it would substitute for expertise that is currently a natural bottleneck. Anthropic-defined framework.

The gap between these two is not a matter of degree on a single scale; it is a difference in kind. CB-1 is about democratizing access to existing dangers. CB-2 is about manufacturing new dangers. A governance framework that conflated the two would either over-restrict capable-but-not-novel models or dangerously under-restrict genuinely novel ones, so the distinction does real work.

A Worked Example: ASL-3 and CB-1 in Practice

Frameworks are easiest to understand against a real classification. Anthropic's most capable mid-2026 models, released under the dual names Claude Fable 5 and Mythos 5, were classified as ASL-3 because Anthropic treated them as CB-1. Walking through why is the clearest way to see the framework operate.

Why CB-1, Not Lower

Anthropic concluded the models could provide actionable information and cross-domain synthesis that saves experts time in the relevant scientific domains. That uplift, even for people with only basic technical backgrounds, is exactly what the CB-1 threshold describes. So the models crossed CB-1, which under the RSP triggered the ASL-3 safeguards described above: classifier guards on the deployed model, an exemption process for vetted users, a bug bounty, threat intelligence, rapid jailbreak response, and weight protection against non-state theft.

Why Not CB-2

Crucially, Anthropic concluded the same models did not cross CB-2. The stated reasons are specific and worth noting because they show where current frontier capability actually tops out: the models still struggle with open-ended ideation, and they struggle to recover from critical scientific errors without expert steering. In other words, they can accelerate someone who already knows roughly what they are doing, but they do not yet substitute for the scarce expertise that CB-2 describes. That is the line between CB-1 and CB-2, drawn against a real model.
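The decision logic in this worked example can be written down as a small governance-level sketch. Everything here is an assumption for illustration: the two boolean findings stand in for the outcome of Anthropic's evaluations, which are far richer than a pair of flags, and the names `EvalFindings`, `classify`, and `safety_level_for` are hypothetical. The CB-2 to ASL-4 mapping follows this article's description of what crossing CB-2 would invoke; the tier for a model below CB-1 is left unspecified because the article does not state it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CBThreshold(Enum):
    BELOW_CB1 = "below CB-1"
    CB1 = "CB-1"
    CB2 = "CB-2"


@dataclass(frozen=True)
class EvalFindings:
    """Hypothetical summary of evaluation conclusions (governance level only)."""
    significant_uplift_for_basic_backgrounds: bool  # the CB-1 criterion
    substitutes_for_scarce_expertise: bool          # the CB-2 criterion


def classify(findings: EvalFindings) -> CBThreshold:
    """Check the higher threshold first, then fall back to the lower one."""
    if findings.substitutes_for_scarce_expertise:
        return CBThreshold.CB2
    if findings.significant_uplift_for_basic_backgrounds:
        return CBThreshold.CB1
    return CBThreshold.BELOW_CB1


def safety_level_for(threshold: CBThreshold) -> Optional[str]:
    """Map a CB classification to the safety level this article associates with it."""
    mapping = {CBThreshold.CB1: "ASL-3", CBThreshold.CB2: "ASL-4"}
    # None means "not stated in this article"; consult the published RSP.
    return mapping.get(threshold)


if __name__ == "__main__":
    # The mid-2026 models as the article describes them: real uplift,
    # but no substitution for scarce expertise.
    findings = EvalFindings(
        significant_uplift_for_basic_backgrounds=True,
        substitutes_for_scarce_expertise=False,
    )
    threshold = classify(findings)
    print(threshold.value, safety_level_for(threshold))
```

What the sketch cannot capture is the "near the border" judgment: a boolean has no room for nearly true, which is exactly why Anthropic's explicit flag carries information the classification alone does not.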

Near the border

Anthropic's own characterization of where these models sit relative to CB-2: classified CB-1, but close enough to the higher threshold that the company flagged it explicitly. That phrasing is itself a governance signal, telling readers the gap between today's frontier and the more serious threshold is narrowing. Source: Anthropic.

The "near the border" framing is the most important takeaway from the example. A model can be safely below the higher threshold today and still prompt a lab to say, in effect, that the next capability jump may not be. For anyone tracking AI risk, that is the line to watch: not whether a given model is CB-2, but how quickly the frontier is closing the distance to it. The split-release design, where the safeguarded model is generally available and the less-restricted variant is confined to vetted defenders, is the operational expression of taking that nearness seriously.

How ASL Relates to External Governance

The RSP does not exist in a vacuum, and a governance reader should know how it lines up against the external instruments that actually carry legal or regulatory weight. The honest framing is that ASL is a voluntary, vendor-specific commitment that sits alongside, not inside, the formal frameworks.

The EU AI Act imposes transparency and risk-management obligations on providers of general-purpose AI, backed by law. Where the RSP is a self-imposed deployment gate, the AI Act is a binding regulatory regime. The two can be complementary, but they are different in nature: one is a company commitment, the other is enforceable regulation. See our EU AI Act overview.

The NIST AI RMF is a voluntary, widely adopted framework for identifying and managing AI risk across an organization. It is process-oriented and applies broadly, rather than defining capability tiers for a single vendor's models. ASL can be read as one lab's concrete, capability-keyed implementation of the kind of risk management NIST describes in general terms. See the NIST AI RMF.

If you procure or deploy frontier models, the practical use of ASL is as a vendor-disclosure datapoint that feeds your internal governance. Knowing a model is ASL-3 and CB-1 tells you what class of capability and what class of safeguards the vendor claims. Map that into your own risk register rather than treating it as a substitute for your own controls. Start with our AI Governance Hub.
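As a rough illustration of that mapping, the sketch below records a vendor's ASL and CB claims in a risk register entry while keeping your own controls as a separate, required field. The structure is an assumption on our part: `VendorDisclosure`, `RiskRegisterEntry`, and `to_register_entry` are hypothetical names, and a real register will carry many more fields (owner, review date, residual risk rating).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VendorDisclosure:
    """What the vendor claims about a model (hypothetical record shape)."""
    vendor: str
    model: str
    safety_level: str          # e.g. "ASL-3"
    capability_threshold: str  # e.g. "CB-1"


@dataclass
class RiskRegisterEntry:
    """One register row: vendor claims kept apart from internal controls."""
    asset: str
    vendor_claims: dict[str, str]
    internal_controls: list[str] = field(default_factory=list)

    @property
    def status(self) -> str:
        # Vendor claims alone never close the entry; internal controls must exist.
        if self.internal_controls:
            return "controls assigned"
        return "needs internal controls"


def to_register_entry(disclosure, internal_controls=()):
    """Translate a vendor disclosure into a register entry."""
    return RiskRegisterEntry(
        asset=f"{disclosure.vendor} {disclosure.model}",
        vendor_claims={
            "safety_level": disclosure.safety_level,
            "capability_threshold": disclosure.capability_threshold,
        },
        internal_controls=list(internal_controls),
    )


if __name__ == "__main__":
    disclosure = VendorDisclosure("Anthropic", "Claude Fable 5", "ASL-3", "CB-1")
    entry = to_register_entry(disclosure)
    print(entry.asset, entry.vendor_claims, entry.status)
```

The design choice worth copying is the separation: the vendor's classification is stored as a claim, and the entry stays flagged until your organization attaches controls of its own.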

How the Framework Evolved Into Practice

The ASL framework moved from a written policy to a live deployment gate over the course of Anthropic's recent model releases. The timeline below traces that progression at the governance level, without restating model-by-model capabilities, which belong in the model reviews this article links to.

The Policy

RSP Published as a Voluntary Commitment

Anthropic publishes the Responsible Scaling Policy defining AI Safety Levels and tying them to capability evaluations and deployment decisions. The full text lives in Anthropic's published RSP.

Restricted Release Pattern

Gated Access for High-Capability Models

Anthropic establishes a pattern of confining its most capable, dual-use models to vetted partners through restricted-access programs, the governance precursor to the 2026 split-release design. See our Project Glasswing and Claude Mythos breakdowns.

Mid 2026

ASL-3 Triggered by a CB-1 Classification

Anthropic's most capable models are classified CB-1, triggering ASL-3 safeguards and shipping under a dual-name design: a safeguarded version generally available, a less-restricted version confined to vetted defenders. The worked example above details this case.

What to Watch

The Distance to CB-2 and ASL-4

With current models described as near the CB-2 border, the governance question is how soon a future model crosses it, invoking the stricter ASL-4 standards: state-level weight protection and an affirmative safety case.

Video Resources

These are live YouTube searches rather than fixed video links, so they surface current explainers and stay accurate as new coverage lands. Each opens a search for the topic on YouTube.

Anthropic's Responsible Scaling Policy Explained

Walkthroughs of the RSP and how AI Safety Levels gate model deployment.

ASL-3 and ASL-4 Safety Standards

Discussion of what the higher safety levels require and why state-level weight protection matters.

Frontier Model Bio and Chem Risk Evaluation

Coverage of how labs evaluate capability thresholds and the governance debate around them.

Go Deeper

Resources from across Tech Jacks Solutions

AI Governance Hub

Frameworks, policies, and controls for governing AI in your org

AI Governance Charter (free)

Establish your organization's AI principles in one document

EU AI Act Overview

What the EU AI Act requires of general-purpose AI providers

AI Glossary

Definitions for AI governance terms used in this article

Fact-checked against vendor documentation and official sources, June 2026. AI Safety Levels and CB thresholds are Anthropic's own framework; consult Anthropic's published RSP for authoritative definitions.

Claude, Claude Fable 5, Claude Mythos 5, Project Glasswing, the Responsible Scaling Policy, AI Safety Levels, and the CB capability thresholds are frameworks and trademarks of Anthropic PBC. The EU AI Act is an instrument of the European Union. The AI Risk Management Framework is published by the US National Institute of Standards and Technology (NIST). This article is editorial and is not sponsored, reviewed, or endorsed by Anthropic.
