AI Safety Levels (ASL) Explained: Anthropic's Responsible Scaling Policy in 2026
AI Safety Levels are the rungs of a governance ladder that Anthropic built for itself. They sit inside the company's Responsible Scaling Policy, and they answer a single question: as a model gets more capable, what safeguards must be in place before Anthropic is willing to train or deploy it? This is not a regulation and not an industry standard. It is one company's voluntary framework, published openly, and it has become one of the most-cited reference points in frontier AI governance. This explainer walks through the framework as Anthropic defines it: the policy itself, the higher safety levels (ASL-3 and ASL-4), the chemical-biological capability thresholds (CB-1 and CB-2), and a concrete worked example using the models that triggered ASL-3 in mid-2026.
Read this if you write AI policy, sit on a governance or risk committee, evaluate frontier models for procurement, or simply want to understand what "ASL-3" means when a vendor cites it. We frame the cybersecurity and biology dimensions strictly at the governance level, with no operational detail, because that is the level at which this framework is meant to be understood.
What the Responsible Scaling Policy Is
The Responsible Scaling Policy, or RSP, is Anthropic's voluntary framework for managing catastrophic risks from its own AI systems. Two words in that sentence carry the weight. "Voluntary" means no regulator imposed it; Anthropic wrote it and committed to it publicly. "Catastrophic" narrows the scope: the RSP is not about everyday content moderation or bias. It is about the small set of capabilities that could plausibly contribute to mass-casualty events, such as helping to build chemical or biological weapons or to conduct high-end cyberattacks.
Mechanically, the RSP does two jobs. First, it governs how Anthropic identifies and evaluates risk in a model, through a set of capability evaluations run before and during training. Second, it governs the deployment decision itself: whether a model can be released, and under what conditions. The bridge between those two jobs is the AI Safety Level. A model's evaluated capabilities determine its ASL, and its ASL determines the safeguards that must be in place before it ships.
It is worth stating plainly that this is Anthropic's framework, not a shared standard. Other labs have their own versions, and external instruments like the EU AI Act and the NIST AI Risk Management Framework operate on different terms entirely. When a vendor or a news report says a model is "ASL-3," they are using Anthropic's vocabulary. That does not make it less useful as a reference point, but a governance professional should be precise about whose definition is in play.
The core idea in one line. Capabilities go up, safeguards go up to match. The ASL is the dial that links the two, so that a more dangerous capability cannot ship without a correspondingly stronger set of protections.
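As a rough illustration of that dial, the sketch below treats safety levels as an ordered scale and blocks release whenever the safeguards in place fall short of the level the model's evaluated capabilities require. The names `ASL` and `may_ship` are hypothetical and chosen for this example only; this is not Anthropic's tooling, and the real determination is a documented review process rather than a single comparison.

```python
from enum import IntEnum


class ASL(IntEnum):
    """Safety levels as an ordered scale (illustrative; higher is stricter)."""
    ASL_1 = 1
    ASL_2 = 2
    ASL_3 = 3
    ASL_4 = 4


def may_ship(required: ASL, implemented: ASL) -> bool:
    """Return True only if the safeguards in place meet the required level.

    `required` stands for the level implied by the model's evaluated
    capabilities; `implemented` stands for the level of safeguards that
    are actually operating. Both inputs are assumptions of this sketch.
    """
    return implemented >= required


print(may_ship(required=ASL.ASL_3, implemented=ASL.ASL_2))  # False
print(may_ship(required=ASL.ASL_3, implemented=ASL.ASL_3))  # True
```

The comparison runs in one direction only: safeguards stronger than required are acceptable, weaker ones are not.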
The Safety Level Ladder
Think of AI Safety Levels as rungs on a ladder, modeled loosely on biosafety levels in laboratory science. Each rung adds requirements rather than replacing them, so a model at a higher level inherits everything below it plus new obligations. The framework spans lower baseline tiers up through the higher levels where the most demanding protections live.
A deliberate note on the lower tiers: the sources behind this article do not spell out the specific requirements of ASL-1 and ASL-2, so we do not invent them. Treat them as the baseline and early-precaution rungs, and read Anthropic's published Responsible Scaling Policy for their exact definitions. The two rungs that carry concrete, documented requirements in 2026 are ASL-3 and ASL-4, and the next two sections take each in turn.
ASL-3 in Detail
ASL-3 is the first level where the RSP demands a strict, named set of mitigations. It splits into two categories that a governance reader should keep separate in their head: deployment safeguards, which limit what a deployed model will do, and security safeguards, which protect the model itself from being stolen.
Deployment Safeguards
On the deployment side, ASL-3 requires real-time classifier guards: systems that watch requests and responses and intervene when a query heads toward a restricted capability. Because over-broad guards would block legitimate work, the framework pairs them with access controls for guard exemptions, so vetted users with a legitimate need can be granted scoped access rather than being blocked wholesale. Anthropic backs the classifiers with a bug bounty that pays external researchers to find ways around them, ongoing threat intelligence to track how adversaries adapt, and a commitment to rapid jailbreak response so that newly discovered bypasses are closed quickly rather than lingering.
Security Safeguards
On the security side, ASL-3 requires controls specifically aimed at preventing theft of the model weights. The reasoning is direct: a guardrail only protects a model that stays inside Anthropic's control. If an attacker exfiltrates the raw weights, every deployment-side safeguard is moot, because the thief can run the model with no guards at all. ASL-3 weight protection is calibrated against non-state attackers, such as criminal groups and insiders, rather than nation-states. That distinction is precisely what separates ASL-3 from ASL-4.
Why ASL-3 is the practical floor. For any frontier model in 2026 that can meaningfully assist with restricted capabilities, ASL-3 is the level where governance stops being aspirational and becomes a concrete checklist: classifiers live, exemption process defined, bounty open, threat intel running, weights hardened against theft.
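To make the checklist concrete, here is a minimal sketch that tracks the ASL-3 items named in this section and reports which are still outstanding. The item identifiers and the `missing_items` helper are hypothetical labels invented for this example; they paraphrase the article's list and are not terms taken from the policy text.

```python
from typing import List, Mapping

# Items paraphrased from the ASL-3 section of this article.
# The identifiers are illustrative labels, not official policy terms.
ASL3_CHECKLIST = (
    "real_time_classifier_guards",
    "guard_exemption_access_controls",
    "bug_bounty",
    "threat_intelligence",
    "rapid_jailbreak_response",
    "weight_theft_protection_non_state",
)


def missing_items(status: Mapping[str, bool]) -> List[str]:
    """Return checklist items that are absent or not yet marked done."""
    return [item for item in ASL3_CHECKLIST if not status.get(item, False)]


# Hypothetical status report: only two items are in place so far.
status = {"real_time_classifier_guards": True, "bug_bounty": True}
print(missing_items(status))
```

An empty result would mean every item in this simplified checklist is in place. Treating an unlisted item as missing is a deliberate choice in the sketch: a safeguard nobody has reported on should not count as done.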
ASL-4 in Detail
ASL-4 raises the bar in two distinct ways, and both involve adversaries and assurances far more demanding than those at ASL-3.
State-Level Weight Protection
The first ASL-4 requirement is the ASL-4 Security Standard: protecting model weights against state-level adversaries. This is a categorical jump from ASL-3. Defending weights against a criminal group or a malicious insider is hard; defending them against a well-resourced national intelligence service, with the budget, patience, and supply-chain reach that implies, is a different class of problem. The point of the standard is to ensure that the most capable models cannot be quietly exfiltrated by a state actor and then run without any of the safeguards that the public deployment carries.
The Affirmative Safety Case
The second ASL-4 requirement is an affirmative safety case: a positive, evidenced argument that the model's misalignment risks are mitigated. This is a meaningful shift in burden of proof. At lower levels the implicit question is "have we found a problem?" An affirmative safety case flips it to "can we show, with evidence, that the model will not pursue goals counter to its operators' intent in ways that cause catastrophic harm?" Proving a negative about a complex system is genuinely difficult, and that difficulty is the point: ASL-4 is meant to be reached only when a lab can make that case credibly.
ASL-3 versus ASL-4 in one contrast. ASL-3 hardens a model against misuse and against theft by non-state actors. ASL-4 hardens it against theft by state actors and adds a requirement to affirmatively demonstrate that the model is not dangerously misaligned. The first is about keeping bad actors out; the second adds proving the model itself is safe to run.
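The cumulative nature of the ladder, where each rung adds to the ones beneath it, can be sketched as a running union of requirement sets. The requirement labels below are coarse, hypothetical summaries of this article's descriptions, and the ASL-1 and ASL-2 entries are left empty because the article does not specify them; an empty set here means "not described", not "no requirements".

```python
from typing import FrozenSet, List, Tuple

# Requirements ADDED at each rung, in ascending order (illustrative labels).
# ASL-1 and ASL-2 are empty placeholders: this article does not define them.
LADDER: List[Tuple[str, FrozenSet[str]]] = [
    ("ASL-1", frozenset()),
    ("ASL-2", frozenset()),
    ("ASL-3", frozenset({"deployment_safeguards",
                         "weight_protection_non_state"})),
    ("ASL-4", frozenset({"weight_protection_state_level",
                         "affirmative_safety_case"})),
]


def cumulative_requirements(level: str) -> FrozenSet[str]:
    """Union of everything added at `level` and at every rung below it."""
    collected = set()
    for name, added in LADDER:
        collected |= added
        if name == level:
            return frozenset(collected)
    raise ValueError(f"unknown level: {level}")


# What ASL-4 adds on top of ASL-3 in this simplified model.
delta = cumulative_requirements("ASL-4") - cumulative_requirements("ASL-3")
print(sorted(delta))
# ['affirmative_safety_case', 'weight_protection_state_level']
```

The set difference at the end reproduces the contrast in prose: the ASL-3 obligations stay in force at ASL-4, and the two new items are layered on top.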
CB-1 vs CB-2: The Chemical-Biological Thresholds
If AI Safety Levels are the rungs, the CB thresholds are one of the rulers Anthropic uses to decide which rung a model lands on. CB stands for chemical and biological capability, and the framework draws a sharp, governance-relevant line between two levels of concern. We describe these strictly as capability thresholds; there is no operational content here, by design.
A model crosses CB-1 when it can significantly help people with basic technical backgrounds, on the order of an undergraduate STEM education, to create, obtain, or deploy chemical or biological weapons with catastrophic potential. The key qualifier is "non-novel": the weapons in question already exist and are documented somewhere. The concern is that the model lowers the barrier for a wider pool of people to reach known-dangerous outcomes faster.
A model crosses CB-2 when it functionally substitutes for scarce human expertise, to the point that a well-resourced team could conduct end-to-end novel pathogen design and deployment. "Novel" is the operative distinction: this is not about speeding up access to known threats, it is about enabling the creation of new ones that did not previously exist. CB-2 represents a categorically more serious capability and would carry correspondingly stricter obligations.
The gap between these two is not a matter of degree on a single scale; it is a difference in kind. CB-1 is about democratizing access to existing dangers. CB-2 is about manufacturing new dangers. A governance framework that conflated the two would either over-restrict capable-but-not-novel models or dangerously under-restrict genuinely novel ones, so the distinction does real work.
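Because the two thresholds ask different questions, one way to picture them is as two separate findings rather than one score. The sketch below is a hypothetical decision function: the boolean inputs stand in for the conclusions of a lab's capability evaluations, which this article does not describe, and the names `CBLevel` and `classify_cb` are invented for illustration.

```python
from enum import Enum


class CBLevel(Enum):
    BELOW_CB1 = "below CB-1"
    CB1 = "CB-1"
    CB2 = "CB-2"


def classify_cb(uplifts_basic_background_on_known_threats: bool,
                substitutes_for_expertise_on_novel_design: bool) -> CBLevel:
    """Map two separate evaluation findings to a threshold label.

    The two inputs are distinct questions, not points on one scale:
    the first concerns wider access to existing dangers, the second
    concerns enabling new ones. Both are assumed inputs in this sketch.
    """
    if substitutes_for_expertise_on_novel_design:
        return CBLevel.CB2
    if uplifts_basic_background_on_known_threats:
        return CBLevel.CB1
    return CBLevel.BELOW_CB1


print(classify_cb(True, False).value)  # CB-1
```

Checking the CB-2 finding first reflects the stricter obligations attached to it: if both findings hold, the more serious label governs.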
A Worked Example: ASL-3 and CB-1 in Practice
Frameworks are easiest to understand against a real classification. Anthropic's most capable mid-2026 models, released under the dual names Claude Fable 5 and Mythos 5, were classified as ASL-3 because Anthropic treated them as CB-1. Walking through why is the clearest way to see the framework operate.
Why CB-1, Not Lower
Anthropic concluded the models could provide actionable information and cross-domain synthesis that saves experts time in the relevant scientific domains. That uplift, even for people with only basic technical backgrounds, is exactly what the CB-1 threshold describes. So the models crossed CB-1, which under the RSP triggered the ASL-3 safeguards described above: classifier guards on the deployed model, an exemption process for vetted users, a bug bounty, threat intelligence, rapid jailbreak response, and weight protection against non-state theft.
Why Not CB-2
Crucially, Anthropic concluded the same models did not cross CB-2. The stated reasons are specific and worth noting because they show where current frontier capability actually tops out: the models still struggle with open-ended ideation, and they struggle to recover from critical scientific errors without expert steering. In other words, they can accelerate someone who already knows roughly what they are doing, but they do not yet substitute for the scarce expertise that CB-2 describes. That is the line between CB-1 and CB-2, drawn against a real model.
The "near the border" framing is the most important takeaway from the example. A model can be safely below the higher threshold today and still prompt a lab to say, in effect, that the next capability jump may not be. For anyone tracking AI risk, that is the line to watch: not whether a given model is CB-2, but how quickly the frontier is closing the distance to it. The split-release design, where the safeguarded model is generally available and the less-restricted variant is confined to vetted defenders, is the operational expression of taking that nearness seriously.
How ASL Relates to External Governance
The RSP does not exist in a vacuum, and a governance reader should know how it lines up against the external instruments that actually carry legal or regulatory weight. The honest framing is that ASL is a voluntary, vendor-specific commitment that sits alongside, not inside, the formal frameworks.
How the Framework Evolved Into Practice
The ASL framework moved from a written policy to a live deployment gate over the course of Anthropic's recent model releases. The timeline below traces that progression at the governance level, without restating model-by-model capabilities, which belong in the model reviews this article links to.