AI Safety Levels (ASL) Explained: Anthropic's Responsible Scaling Policy in 2026

AI Safety Levels are the rungs of a governance ladder that Anthropic built for itself. They sit inside the company's Responsible Scaling Policy, and they answer a single question: as a model gets more capable, what safeguards must be in place before Anthropic is willing to train or deploy it? This is not a regulation and not an industry standard. It is one company's voluntary framework, published openly, and it has become one of the most-cited reference points in frontier AI governance. This explainer walks through the framework as Anthropic defines it: the policy itself, the higher safety levels (ASL-3 and ASL-4), the chemical-biological capability thresholds (CB-1 and CB-2), and a concrete worked example using the models that triggered ASL-3 in mid-2026.

Read this if you write AI policy, sit on a governance or risk committee, evaluate frontier models for procurement, or simply want to understand what "ASL-3" means when a vendor cites it. We frame the cybersecurity and biology dimensions strictly at the governance level, with no operational detail, because that is the level at which this framework is meant to be understood.


ASL-3
The safety level Anthropic assigned to its most capable mid-2026 models, triggered by a CB-1 chemical-biological capability classification. ASL-3 is the first tier where strict deployment and security safeguards become mandatory under the Responsible Scaling Policy, which is why it is the level most worth understanding in detail. Source: Anthropic Responsible Scaling Policy.

What the Responsible Scaling Policy Is

The Responsible Scaling Policy, or RSP, is Anthropic's voluntary framework for managing catastrophic risks from its own AI systems. Two words in that sentence carry the weight. "Voluntary" means no regulator imposed it; Anthropic wrote it and committed to it publicly. "Catastrophic" narrows the scope: the RSP is not about everyday content moderation or bias. It is about the small set of capabilities that could plausibly contribute to mass-casualty events, such as helping to build chemical or biological weapons or to conduct high-end cyberattacks.

Mechanically, the RSP does two jobs. First, it governs how Anthropic identifies and evaluates risk in a model, through a set of capability evaluations run before and during training. Second, it governs the deployment decision itself: whether a model can be released, and under what conditions. The bridge between those two jobs is the AI Safety Level. A model's evaluated capabilities determine its ASL, and its ASL determines the safeguards that must be in place before it ships.

It is worth stating plainly that this is Anthropic's framework, not a shared standard. Other labs have their own versions, and external instruments like the EU AI Act and the NIST AI Risk Management Framework operate on different terms entirely. When a vendor or a news report says a model is "ASL-3," they are using Anthropic's vocabulary. That does not make it less useful as a reference point, but a governance professional should be precise about whose definition is in play.

The core idea in one line. Capabilities go up, safeguards go up to match. The ASL is the dial that links the two, so that a more dangerous capability cannot ship without a correspondingly stronger set of protections.


The Safety Level Ladder

Think of AI Safety Levels as rungs on a ladder, modeled loosely on biosafety levels in laboratory science. Each rung adds requirements rather than replacing them, so a model at a higher level inherits everything below it plus new obligations. The framework spans lower baseline tiers up through the higher levels where the most demanding protections live. The cards below summarize the rungs that matter most for any 2026 discussion of frontier models.

ASL-1 and ASL-2
Lower and baseline tiers. Specific requirements are defined in Anthropic's published RSP rather than summarized here.
Scope: Baseline
Reference: Anthropic's published RSP

ASL-4
State-level weight protection plus an affirmative safety case for misalignment.
Trigger: Higher capability
Focus: State actors

A deliberate note on the lower tiers: the sources behind this article do not spell out the specific requirements of ASL-1 and ASL-2, so we do not invent them. Treat them as the baseline and early-precaution rungs, and read Anthropic's published Responsible Scaling Policy for their exact definitions. The two rungs that carry concrete, documented requirements in 2026 are ASL-3 and ASL-4, and the next two sections take each in turn.
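To make the inheritance rule concrete, here is a minimal Python sketch under stated assumptions. The safeguard names paraphrase the ASL-3 and ASL-4 sections of this article; the `LADDER` table and both functions are hypothetical illustrations, not Anthropic's tooling. ASL-1 and ASL-2 are deliberately left empty because their requirements are not summarized here. The point is the shape of the rule: a higher rung's checklist is the union of every rung below it, and a model ships only when nothing on that checklist is missing.

```python
# Hypothetical model of the ASL "ladder": each rung lists only what it adds.
# Safeguard names paraphrase this article; ASL-1 and ASL-2 are intentionally
# empty because their requirements live in Anthropic's published RSP.
LADDER: list[tuple[str, set[str]]] = [
    ("ASL-1", set()),
    ("ASL-2", set()),
    ("ASL-3", {
        "real-time classifier guards",
        "access controls for guard exemptions",
        "bug bounty",
        "threat intelligence",
        "rapid jailbreak response",
        "weight protection against non-state attackers",
    }),
    ("ASL-4", {
        "weight protection against state-level attackers",
        "affirmative safety case for misalignment",
    }),
]


def cumulative_requirements(level: str) -> set[str]:
    """Everything required at `level`: its own additions plus all rungs below."""
    required: set[str] = set()
    for name, added in LADDER:
        required |= added
        if name == level:
            return required
    raise ValueError(f"unknown safety level: {level}")


def missing_safeguards(level: str, in_place: set[str]) -> set[str]:
    """Safeguards still outstanding; an empty result means the gate is clear."""
    return cumulative_requirements(level) - in_place


if __name__ == "__main__":
    asl3 = cumulative_requirements("ASL-3")
    # Meeting ASL-3 in full still leaves the two ASL-4 additions outstanding.
    print(sorted(missing_safeguards("ASL-4", asl3)))
```

Calling `missing_safeguards("ASL-4", cumulative_requirements("ASL-3"))` returns only the two ASL-4 additions, which is the ladder's inheritance in one line: meeting ASL-3 in full still leaves state-level weight protection and the affirmative safety case outstanding.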


ASL-3 in Detail

ASL-3 is the first level where the RSP demands a strict, named set of mitigations. It splits into two categories that a governance reader should keep separate in their head: deployment safeguards, which limit what a deployed model will do, and security safeguards, which protect the model itself from being stolen.

Deployment Safeguards

On the deployment side, ASL-3 requires real-time classifier guards: systems that watch requests and responses and intervene when a query heads toward a restricted capability. Because over-broad guards would block legitimate work, the framework pairs them with access controls for guard exemptions, so vetted users with a legitimate need can be granted scoped access rather than being blocked wholesale. Anthropic backs the classifiers with a bug bounty that pays external researchers to find ways around them, ongoing threat intelligence to track how adversaries adapt, and a commitment to rapid jailbreak response so that newly discovered bypasses are closed quickly rather than lingering.
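The exemption pattern can be sketched in a few lines of Python. This assumes an upstream classifier has already labeled each request; the category labels, the `User` type, and the `guard` function are hypothetical placeholders chosen for illustration, not Anthropic's classifier design. What the sketch shows is the governance property the paragraph describes: exemptions are scoped per category, so a vetted user gets access to what they were vetted for and nothing more.

```python
from dataclasses import dataclass, field

# Hypothetical category labels; an upstream classifier is assumed to assign one
# label per request. Real guard taxonomies are not described in this article.
RESTRICTED_CATEGORIES = {"restricted-category-a", "restricted-category-b"}


@dataclass
class User:
    user_id: str
    # Exemptions are scoped: each entry unlocks exactly one restricted category.
    exemptions: set[str] = field(default_factory=set)


def guard(category: str, user: User) -> str:
    """Return the guard's decision for a request the classifier labeled `category`."""
    if category not in RESTRICTED_CATEGORIES:
        return "allow"
    if category in user.exemptions:
        return "allow (scoped exemption)"
    return "block"


if __name__ == "__main__":
    vetted = User("researcher-1", exemptions={"restricted-category-a"})
    print(guard("restricted-category-a", vetted))  # exempt for this category
    print(guard("restricted-category-b", vetted))  # still blocked elsewhere
```

A design note: granting exemptions per category rather than per user is what keeps the control from collapsing into a blanket allowlist. A researcher exempted for one restricted category is still blocked on another.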

Security Safeguards

On the security side, ASL-3 requires controls specifically aimed at preventing theft of the model weights. The reasoning is direct: a guardrail only protects a model that stays inside Anthropic's control. If an attacker exfiltrates the raw weights, every deployment-side safeguard is moot, because the thief can run the model with no guards at all. ASL-3 weight protection is calibrated against non-state attackers, such as criminal groups and insiders, rather than nation-states. That distinction is precisely what separates ASL-3 from ASL-4.

Why ASL-3 is the practical floor. For any frontier model in 2026 that can meaningfully assist with restricted capabilities, ASL-3 is the level where governance stops being aspirational and becomes a concrete checklist: classifiers live, exemption process defined, bounty open, threat intel running, weights hardened against theft.


ASL-4 in Detail

ASL-4 raises the bar in two distinct ways, and both involve adversaries and assurances that are substantially harder to address than anything ASL-3 requires.

State-Level Weight Protection

The first ASL-4 requirement is the ASL-4 Security Standard: protecting model weights against state-level adversaries. This is a categorical jump from ASL-3. Defending weights against a criminal group or a malicious insider is hard; defending them against a well-resourced national intelligence service, with the budget, patience, and supply-chain reach that implies, is a different class of problem. The point of the standard is to ensure that the most capable models cannot be quietly exfiltrated by a state actor and then run without any of the safeguards that the public deployment carries.

The Affirmative Safety Case

The second ASL-4 requirement is an affirmative safety case: a positive, evidenced argument that the model's misalignment risks are mitigated. This is a meaningful shift in burden of proof. At lower levels the implicit question is "have we found a problem?" An affirmative safety case flips it to "can we show, with evidence, that the model will not pursue goals counter to those of its operators in ways that cause catastrophic harm?" Proving a negative about a complex system is genuinely difficult, and that difficulty is the point: ASL-4 is meant to be reached only when a lab can make that case credibly.
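The shift in burden of proof is easiest to see as two gate functions with opposite defaults. This is a hypothetical Python sketch; `Claim`, `lower_level_gate`, and `affirmative_case_gate` are illustrative names, and a real safety case rests on expert judgment rather than a boolean check.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    statement: str
    evidence: list[str]  # pointers to supporting evidence; empty = unsupported


def lower_level_gate(problems_found: list[str]) -> bool:
    """Default-allow: proceed unless evaluations surfaced a problem."""
    return not problems_found


def affirmative_case_gate(claims: list[Claim]) -> bool:
    """Default-deny: proceed only if a case exists and every claim is evidenced."""
    return bool(claims) and all(claim.evidence for claim in claims)


if __name__ == "__main__":
    # With nothing on record, the two gates reach opposite conclusions.
    print(lower_level_gate([]), affirmative_case_gate([]))
```

With no information at all, `lower_level_gate([])` returns `True` while `affirmative_case_gate([])` returns `False`. That asymmetry is the whole point: under an affirmative safety case, silence is not approval.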

ASL-3 versus ASL-4 in one contrast. ASL-3 hardens a model against misuse and against theft by non-state actors. ASL-4 hardens it against theft by state actors and adds a requirement to affirmatively demonstrate that the model is not dangerously misaligned. The first is about keeping bad actors out; the second adds proving the model itself is safe to run.


CB-1 vs CB-2: The Chemical-Biological Thresholds

If AI Safety Levels are the rungs, the CB thresholds are one of the rulers Anthropic uses to decide which rung a model lands on. CB stands for chemical and biological capability, and the framework draws a sharp, governance-relevant line between two levels of concern. We describe these strictly as capability thresholds; there is no operational content here, by design.

CB-1: Non-Novel Weapons
Uplift for people with basic technical backgrounds.

A model crosses CB-1 when it can significantly help people with basic technical backgrounds, on the order of an undergraduate STEM education, to create, obtain, or deploy chemical or biological weapons with catastrophic potential. The key phrase is "non-novel": the weapons in question already exist and are documented somewhere. The concern is that the model lowers the barrier for a wider pool of people to reach known-dangerous outcomes faster.

CB-1 is the threshold that triggered ASL-3 for Anthropic's most capable mid-2026 models.
Anthropic-defined framework.

CB-2: Novel Weapons
Functional substitute for scarce human expertise.

A model crosses CB-2 when it functionally substitutes for scarce human expertise, to the point that a well-resourced team could conduct end-to-end novel pathogen design and deployment. "Novel" is the operative distinction: this is not about speeding up access to known threats, it is about enabling the creation of new ones that did not previously exist. CB-2 represents a categorically more serious capability and would carry correspondingly stricter obligations.

CB-2 is the higher threshold. A model that crosses it would substitute for expertise that is currently a natural bottleneck.
Anthropic-defined framework.

The gap between these two is not a matter of degree on a single scale; it is a difference in kind. CB-1 is about democratizing access to existing dangers. CB-2 is about manufacturing new dangers. A governance framework that conflated the two would either over-restrict capable-but-not-novel models or dangerously under-restrict genuinely novel ones, so the distinction does real work.
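The two thresholds can be expressed as a classification that checks the stricter test first. The Python below is a minimal sketch that works only at the level of an evaluation team's summary judgments, never raw evaluation content; `CBFindings`, `cb_level`, `ASL_FOR_CB`, and `triggered_asl` are hypothetical names. The CB-to-ASL mapping follows this article: CB-1 triggered ASL-3, and crossing CB-2 is described as invoking the stricter ASL-4 standards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CBFindings:
    # Hypothetical summary judgments from an evaluation team, not raw results.
    significant_uplift_basic_background: bool  # CB-1 test (non-novel weapons)
    substitutes_for_scarce_expertise: bool  # CB-2 test (end-to-end novel design)


def cb_level(findings: CBFindings) -> str:
    # Check the stricter threshold first so a model lands on its highest rung.
    if findings.substitutes_for_scarce_expertise:
        return "CB-2"
    if findings.significant_uplift_basic_background:
        return "CB-1"
    return "below CB-1"


# Per this article: CB-1 triggered ASL-3; crossing CB-2 invokes ASL-4.
ASL_FOR_CB = {"CB-1": "ASL-3", "CB-2": "ASL-4"}


def triggered_asl(findings: CBFindings) -> Optional[str]:
    """ASL triggered by the CB classification alone, or None below CB-1."""
    return ASL_FOR_CB.get(cb_level(findings))


if __name__ == "__main__":
    mid_2026 = CBFindings(
        significant_uplift_basic_background=True,
        substitutes_for_scarce_expertise=False,
    )
    print(cb_level(mid_2026), triggered_asl(mid_2026))
```

Encoding the classification Anthropic reported for its mid-2026 models, uplift present but no substitute for scarce expertise, yields `CB-1` and therefore `ASL-3`.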


A Worked Example: ASL-3 and CB-1 in Practice

Frameworks are easiest to understand against a real classification. Anthropic's most capable mid-2026 models, released under the dual names Claude Fable 5 and Claude Mythos 5, were classified as ASL-3 because Anthropic treated them as CB-1. Walking through why is the clearest way to see the framework operate.

Why CB-1, Not Lower

Anthropic concluded the models could provide actionable information and cross-domain synthesis that saves experts time in the relevant scientific domains. That uplift, even for people with only basic technical backgrounds, is exactly what the CB-1 threshold describes. So the models crossed CB-1, which under the RSP triggered the ASL-3 safeguards described above: classifier guards on the deployed model, an exemption process for vetted users, a bug bounty, threat intelligence, rapid jailbreak response, and weight protection against non-state theft.

Why Not CB-2

Crucially, Anthropic concluded the same models did not cross CB-2. The stated reasons are specific and worth noting because they show where current frontier capability actually tops out: the models still struggle with open-ended ideation, and they struggle to recover from critical scientific errors without expert steering. In other words, they can accelerate someone who already knows roughly what they are doing, but they do not yet substitute for the scarce expertise that CB-2 describes. That is the line between CB-1 and CB-2, drawn against a real model.

Near the border
Anthropic's own characterization of where these models sit relative to CB-2: classified CB-1, but close enough to the higher threshold that the company flagged it explicitly. That phrasing is itself a governance signal, telling readers the gap between today's frontier and the more serious threshold is narrowing. Source: Anthropic.

The "near the border" framing is the most important takeaway from the example. A model can be safely below the higher threshold today and still prompt a lab to say, in effect, that the next capability jump may carry it across. For anyone tracking AI risk, that is the line to watch: not whether a given model is CB-2, but how quickly the frontier is closing the distance to it. The split-release design, where the safeguarded model is generally available and the less-restricted variant is confined to vetted defenders, is the operational expression of taking that nearness seriously.


How ASL Relates to External Governance

The RSP does not exist in a vacuum, and a governance reader should know how it lines up against the external instruments that actually carry legal or regulatory weight. The honest framing is that ASL is a voluntary, vendor-specific commitment that sits alongside, not inside, the formal frameworks.

EU AI Act
The EU AI Act imposes transparency and risk-management obligations on providers of general-purpose AI, backed by law. Where the RSP is a self-imposed deployment gate, the AI Act is a binding regulatory regime. The two can be complementary, but they are different in nature: one is a company commitment, the other is enforceable regulation. See our EU AI Act overview.
NIST AI Risk Management Framework
The NIST AI RMF is a voluntary, widely adopted framework for identifying and managing AI risk across an organization. It is process-oriented and applies broadly, rather than defining capability tiers for a single vendor's models. ASL can be read as one lab's concrete, capability-keyed implementation of the kind of risk management NIST describes in general terms. See the NIST AI RMF.
Your Own Governance Program
If you procure or deploy frontier models, the practical use of ASL is as a vendor-disclosure datapoint that feeds your internal governance. Knowing a model is ASL-3 and CB-1 tells you what class of capability and what class of safeguards the vendor claims. Map that into your own risk register rather than treating it as a substitute for your own controls. Start with our AI Governance Hub.
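For procurement teams, a hypothetical risk-register row shows how to record the vendor's ASL claim without letting it stand in for internal controls. The field names and the `BASELINE_CONTROLS` set are assumptions for illustration, drawn from this article's own guidance on privacy review and keeping a human in the loop; substitute your program's actual control catalog.

```python
from dataclasses import dataclass, field

# Hypothetical baseline controls, drawn from this article's guidance on
# privacy review and human oversight. Replace with your own control catalog.
BASELINE_CONTROLS = {
    "vendor privacy policy reviewed",
    "human in the loop for consequential decisions",
}


@dataclass
class VendorModelRisk:
    """One row of a hypothetical internal AI risk register."""
    model: str
    vendor_claimed_asl: str  # recorded as a vendor claim, not a verified control
    vendor_claimed_cb: str
    internal_controls: set[str] = field(default_factory=set)

    def gaps(self) -> set[str]:
        # The vendor's ASL does not satisfy any internal control on its own.
        return BASELINE_CONTROLS - self.internal_controls


if __name__ == "__main__":
    row = VendorModelRisk("example-model", vendor_claimed_asl="ASL-3", vendor_claimed_cb="CB-1")
    print(sorted(row.gaps()))  # both baseline controls are still open
```

The design choice is that the vendor's ASL and CB fields never feed into `gaps()`: a model disclosed as ASL-3 starts with exactly the same open controls as any other.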

How the Framework Evolved Into Practice

The ASL framework moved from a written policy to a live deployment gate over the course of Anthropic's recent model releases. The timeline below traces that progression at the governance level, without restating model-by-model capabilities, which belong in the model reviews this article links to.

The Policy
RSP Published as a Voluntary Commitment
Anthropic publishes the Responsible Scaling Policy defining AI Safety Levels and tying them to capability evaluations and deployment decisions. The full text lives in Anthropic's published RSP.

Restricted Release Pattern
Gated Access for High-Capability Models
Anthropic establishes a pattern of confining its most capable, dual-use models to vetted partners through restricted-access programs, the governance precursor to the 2026 split-release design. See our Project Glasswing and Claude Mythos breakdowns.

Mid-2026
ASL-3 Triggered by a CB-1 Classification
Anthropic's most capable models are classified CB-1, triggering ASL-3 safeguards and shipping under a dual-name design: a safeguarded version generally available, a less-restricted version confined to vetted defenders. The worked example above details this case.

What to Watch
The Distance to CB-2 and ASL-4
With current models described as near the CB-2 border, the governance question is how soon a future model crosses it, invoking the stricter ASL-4 standards: state-level weight protection and an affirmative safety case.


Fact-checked against vendor documentation and official sources, June 2026. AI Safety Levels and CB thresholds are Anthropic's own framework; consult Anthropic's published RSP for authoritative definitions.
Claude, Claude Fable 5, Claude Mythos 5, Project Glasswing, the Responsible Scaling Policy, AI Safety Levels, and the CB capability thresholds are trademarks or frameworks of Anthropic PBC. The EU AI Act is an instrument of the European Union. The AI Risk Management Framework is published by the US National Institute of Standards and Technology (NIST). This article is editorial and is not sponsored, reviewed, or endorsed by Anthropic.
Before You Use AI
Your Privacy

A model's safety level describes its capabilities and the vendor's safeguards; it does not describe how your data is handled. Anthropic's commercial API and business plans do not use customer data to train models, while conversations on consumer plans (Free, Pro, and Max) may be used for training unless you opt out. Enterprise and Team agreements carry custom data-retention terms. Review the vendor's privacy policy before submitting sensitive code, customer data, or regulated information to any model, regardless of its ASL classification.

Mental Health & AI Dependency

Safety frameworks like the RSP address catastrophic misuse risks, not the everyday risk of over-relying on AI for judgment. Keep a human in the loop for any consequential or irreversible decision, and treat model output as input to your own reasoning rather than a substitute for it. If you or someone you know is experiencing a mental health crisis:

  • 988 Suicide & Crisis Lifeline -- Call or text 988 (US)
  • SAMHSA Helpline -- 1-800-662-4357
  • Crisis Text Line -- Text HOME to 741741

AI systems can produce plausible-sounding but incorrect guidance. For mental health, medical, legal, or financial decisions, always consult a qualified professional.

Your Rights & Our Transparency

Under GDPR and CCPA you have the right to access, correct, and delete your personal data. The EU AI Act imposes transparency and risk obligations on general-purpose AI providers, separate from any vendor's voluntary safety framework. Tech Jacks Solutions maintains editorial independence from all vendors, including Anthropic. This explainer was not sponsored, reviewed, or approved by Anthropic, and we receive no affiliate commission. The AI Safety Level and CB threshold definitions described here are Anthropic's own framework, drawn from its published policy and system-card disclosures.