What Is Amazon Bedrock Guardrails? AWS's AI Safety Layer Explained

Last verified: May 14, 2026 · Format: Breakdown

6

Content filter categories with configurable sensitivity thresholds (hate, insults, sexual, violence, misconduct, prompt attack)

Source: AWS Bedrock Guardrails documentation

99%

Accuracy AWS claims (up to) for Automated Reasoning checks used in hallucination detection

Source: AWS blog, Aug 2025

$0.15

Per 1,000 text units for content filters (80% price reduction, Dec 2024)

Source: AWS Bedrock pricing page

6

Configurable safeguard policy types across text, image, and code

Source: AWS documentation

85%

Maximum price reduction across guardrail filters announced December 2024 (denied topics down 85%, content filters down 80%)

Source: AWS What's New, Dec 2024

Your compliance team asks a question that should be simple: "How do we stop our customer-facing chatbot from discussing competitor products, leaking PII, or hallucinating a refund policy that doesn't exist?" Amazon Bedrock Guardrails is AWS's answer, and it operates as a standalone safety layer that sits between any foundation model and your end users.

The critical distinction: Guardrails is not a model. It is a configurable enforcement pipeline that evaluates every input and output against policies you define, then blocks, redacts, or flags content that violates those rules. It works natively with Bedrock-hosted models (Claude, Llama, Mistral) and, through the ApplyGuardrail API, with self-hosted models and third-party APIs including OpenAI and Google Gemini. That model-agnostic architecture means you write your safety rules once and apply them everywhere.

What Is Amazon Bedrock Guardrails

Amazon Bedrock Guardrails is a managed AI safety service within Amazon Bedrock that provides configurable safeguards to detect and filter harmful content, redact sensitive information, prevent topic drift, and validate model responses against factual policies. It became generally available on April 23, 2024, alongside the broader Bedrock platform expansion.

The service exposes six safeguard policy types: content filters (text and image), denied topics, word filters, sensitive information filters (PII detection and regex), contextual grounding checks, and Automated Reasoning checks. Each policy operates independently. You enable only what you need, and you pay only for the filters you activate.

Guardrails integrates natively with Bedrock foundation models, Bedrock Agents, Bedrock Knowledge Bases, and Bedrock Flows. For models outside the Bedrock ecosystem, the ApplyGuardrail API provides the same policy enforcement without requiring model invocation through Bedrock at all. AWS describes this as "model-independent safety measures," and in practice it means a single guardrail configuration can protect a Claude deployment on Bedrock, a self-hosted Llama instance on SageMaker, and an OpenAI GPT call from your own application server.

12+

Programming languages supported through the AWS SDK for guardrail integration, enabling enforcement across Bedrock-hosted, self-hosted, and third-party model deployments

Source: AWS Bedrock Guardrails documentation

For context on where Bedrock Guardrails fits in the broader AI tools landscape, visit the AI Tools Hub and the AWS sub-hub.

How It Works: The Processing Pipeline

Bedrock Guardrails operates as an inline evaluation layer with two enforcement points: one before the prompt reaches the model (input guardrail) and one after the model generates a response (output guardrail). Both run the same policy stack, but they apply independently.

Input Flow

When a user sends a prompt, Guardrails evaluates it against every enabled policy in parallel. If the prompt contains a denied topic, harmful content, blocked words, or sensitive information that should be redacted, the guardrail intervenes before the model ever sees the request. Blocked prompts return a configurable message (you write the response, not AWS). Redacted prompts continue to the model with PII replaced by placeholders.

Output Flow

After the model generates a response, the same policy stack evaluates the output. Content filters catch harmful generations, contextual grounding checks flag hallucinations against your reference source, and Automated Reasoning checks validate factual claims against formal logic policies. Violations trigger configurable actions: block the entire response, redact specific content, or flag for human review.

ApplyGuardrail API (Model-Independent)

The ApplyGuardrail API decouples the evaluation engine from model invocation entirely. You send text (or images) directly to the API, and it returns the evaluation result without calling any foundation model. This enables three critical patterns: applying guardrails to third-party model outputs (OpenAI, Google Gemini), using guardrails with self-hosted models on SageMaker or EC2, and running pre-flight validation on prompts before they enter any model pipeline. For teams running autonomous agents in AWS, our AWS DevOps Agent article covers how it pairs with Guardrails for safe infrastructure automation, and the step-by-step DevOps Agent setup guide walks through configuring it under least-privilege access.
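The input and output enforcement points described above can be sketched around an arbitrary model call. This is a minimal sketch, not a reference implementation: `guarded_call` and `model_fn` are hypothetical names, and `client` is assumed to expose the `apply_guardrail` operation of the Bedrock runtime client with the response fields `action` and `outputs`. For simplicity it stops at any intervention; separating anonymized text from a full block would require inspecting the `assessments` field of the response.

```python
from __future__ import annotations

from typing import Any, Callable

INTERVENED = "GUARDRAIL_INTERVENED"  # response["action"] when any policy fires


def apply_guardrail(client: Any, guardrail_id: str, version: str,
                    source: str, text: str) -> tuple[bool, str]:
    """Evaluate text against a guardrail; return (intervened, text_to_use)."""
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=version,
        source=source,  # "INPUT" for prompts, "OUTPUT" for model responses
        content=[{"text": {"text": text}}],
    )
    intervened = response["action"] == INTERVENED
    outputs = response.get("outputs", [])
    # On intervention, outputs holds the configured message or the redacted text.
    replacement = outputs[0]["text"] if intervened and outputs else text
    return intervened, replacement


def guarded_call(client: Any, guardrail_id: str, version: str,
                 model_fn: Callable[[str], str], prompt: str) -> str:
    """Run input guardrail, then the model, then output guardrail."""
    stopped, checked = apply_guardrail(client, guardrail_id, version, "INPUT", prompt)
    if stopped:
        return checked  # the model never sees the prompt
    answer = model_fn(prompt)  # any model: Bedrock, self-hosted, or third-party
    _, final = apply_guardrail(client, guardrail_id, version, "OUTPUT", answer)
    return final
```

With boto3, `client` would typically be `boto3.client("bedrock-runtime")`, and `model_fn` can wrap whichever SDK your model provider supplies. Passing the client in as a parameter also makes the wrapper easy to test with a stub.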

6

Configurable safeguard policy types (content filters, denied topics, word filters, PII detection, contextual grounding, and automated reasoning), each with independent threshold controls

Source: AWS Bedrock Guardrails documentation

Policy Types: The Six Safeguard Categories

Each policy type addresses a distinct risk category. You configure them independently through the Bedrock console, AWS CLI, CloudFormation, or Terraform. Here is what each one does and when to use it.

1. Content Filters

Detect and filter harmful text or image content across six categories: Hate, Insults, Sexual, Violence, Misconduct, and Prompt Attack (jailbreak and prompt injection attempts). You set the sensitivity threshold for each category independently on a scale from low to high. Image content filtering evaluates uploaded images against the same categories. The Standard tier extends detection to code elements, identifying harmful content embedded in comments, variable names, function names, and string literals.
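As a rough illustration of per-category thresholds, the sketch below builds the content-filter portion of a guardrail definition as a plain dictionary. The field names (`filtersConfig`, `inputStrength`, `outputStrength`) and the strength values follow the CreateGuardrail request shape as we understand it; treat them as assumptions and check them against the current API reference. `content_policy` is a hypothetical helper name.

```python
from __future__ import annotations

CATEGORIES = ("HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT", "PROMPT_ATTACK")
STRENGTHS = ("NONE", "LOW", "MEDIUM", "HIGH")


def content_policy(strengths: dict[str, str], default: str = "MEDIUM") -> dict:
    """Build a content-filter config with one entry per category."""
    unknown = set(strengths) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    filters = []
    for category in CATEGORIES:
        strength = strengths.get(category, default)
        if strength not in STRENGTHS:
            raise ValueError(f"invalid strength for {category}: {strength}")
        filters.append({
            "type": category,
            "inputStrength": strength,
            # Assumption: prompt attacks are evaluated on input only,
            # so the output strength for that category is set to NONE.
            "outputStrength": "NONE" if category == "PROMPT_ATTACK" else strength,
        })
    return {"filtersConfig": filters}


# Stricter on hate and prompt attacks, default sensitivity elsewhere.
policy = content_policy({"HATE": "HIGH", "PROMPT_ATTACK": "HIGH"})
```

The resulting dictionary is what you would pass as the content policy when creating a guardrail through the SDK, or mirror in a CloudFormation or Terraform definition.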

2. Denied Topics

Define topics your application must never discuss. You describe each denied topic in natural language (for example, "investment advice" or "competitor product recommendations"), and the guardrail classifies incoming prompts and model responses against those definitions. This is topic-level control, not keyword matching: the classifier evaluates semantic intent, so a user rephrasing "stock tips" as "what should I invest in" still triggers the filter.
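A denied topic is essentially a name, a natural-language definition, and a few optional example phrases. The sketch below assembles that structure; the keys (`topicsConfig`, `name`, `definition`, `examples`, `type`) reflect our understanding of the CreateGuardrail request and should be verified against the API reference, and `denied_topic` and `topic_policy` are hypothetical helper names.

```python
from __future__ import annotations

from typing import Iterable


def denied_topic(name: str, definition: str, examples: Iterable[str] = ()) -> dict:
    """Describe one topic the application must refuse to discuss."""
    topic = {"name": name, "definition": definition, "type": "DENY"}
    example_list = list(examples)
    if example_list:  # omit the key entirely when no examples are given
        topic["examples"] = example_list
    return topic


def topic_policy(topics: list[dict]) -> dict:
    """Wrap individual topics in the policy structure."""
    return {"topicsConfig": topics}


policy = topic_policy([
    denied_topic(
        "Investment advice",
        "Recommendations about buying, selling, or holding financial assets.",
        examples=["What should I invest in?", "Any stock tips for this year?"],
    ),
    denied_topic(
        "Competitor recommendations",
        "Suggesting or comparing products sold by competing companies.",
    ),
])
```

Example phrases give the classifier rephrasings to generalize from, which is what separates this policy type from exact-match word filters.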

3. Word Filters

Exact-match keyword blocking for profanity, competitor names, internal project codenames, or any custom word list you define. This is the simplest policy type: if the exact word or phrase appears in the input or output, it triggers the configured action. Word filters are free; no per-unit charge applies.

Free

Cost for word filters and regex-based sensitive information filters: exact-match keyword blocking and custom pattern matching carry no per-unit charge

Source: AWS Bedrock Guardrails pricing page, May 2026

4. Sensitive Information Filters (PII)

Detect and redact personally identifiable information including names, email addresses, phone numbers, Social Security numbers, credit card numbers, and IP addresses. You choose between two actions per PII type: block (reject the entire request) or anonymize (replace with a placeholder like {EMAIL}). Custom regex patterns are also supported for organization-specific identifiers like employee IDs or account numbers. Regex-based filters are free.
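To make the block-versus-anonymize choice concrete, here is a minimal sketch of a sensitive-information policy with two built-in PII types and one custom regex. The keys (`piiEntitiesConfig`, `regexesConfig`) and the entity type names are taken from the CreateGuardrail request shape as we understand it; the employee ID format `EMP-` plus six digits is purely hypothetical. The local `re` check only confirms the pattern behaves as intended in Python, and the service's regex engine may differ.

```python
from __future__ import annotations

import re

ALLOWED_ACTIONS = {"BLOCK", "ANONYMIZE"}


def sensitive_info_policy(pii_actions: dict[str, str], regexes: list[dict]) -> dict:
    """Build a PII policy: one action per PII type, plus custom regex filters."""
    for action in list(pii_actions.values()) + [r["action"] for r in regexes]:
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unsupported action: {action}")
    for regex in regexes:
        re.compile(regex["pattern"])  # fail early on a malformed pattern
    return {
        "piiEntitiesConfig": [
            {"type": pii_type, "action": action}
            for pii_type, action in pii_actions.items()
        ],
        "regexesConfig": regexes,
    }


# Hypothetical organization-specific identifier.
EMPLOYEE_ID = {
    "name": "employee-id",
    "description": "Internal employee identifier",
    "pattern": r"\bEMP-\d{6}\b",
    "action": "ANONYMIZE",
}

policy = sensitive_info_policy(
    {"EMAIL": "ANONYMIZE", "US_SOCIAL_SECURITY_NUMBER": "BLOCK"},
    [EMPLOYEE_ID],
)
```

Anonymizing email addresses keeps the conversation usable, while blocking Social Security numbers rejects the request outright; which action fits each type depends on your compliance requirements.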

5. Contextual Grounding Checks

The hallucination detection layer for RAG (Retrieval-Augmented Generation) applications. Contextual grounding evaluates whether the model's response is grounded in the provided reference source and relevant to the user's query. You set a grounding threshold (0 to 1) and a relevance threshold independently. Responses that fall below either threshold are blocked. This is critical for knowledge base applications where factual accuracy is non-negotiable: legal research, medical information, financial advice, and compliance documentation.
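The two-threshold rule can be expressed in a few lines. This sketch models only the decision logic described above (block when either score falls below its threshold); the scores themselves come from the service. `GroundingThresholds` is a hypothetical name, and the `GROUNDING` and `RELEVANCE` filter types in `to_policy` reflect the CreateGuardrail request shape as we understand it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundingThresholds:
    grounding: float  # minimum score for "supported by the reference source"
    relevance: float  # minimum score for "answers the user's query"

    def __post_init__(self) -> None:
        for value in (self.grounding, self.relevance):
            if not 0.0 <= value <= 1.0:
                raise ValueError("thresholds must be between 0 and 1")

    def to_policy(self) -> dict:
        """Render the thresholds as a contextual grounding policy structure."""
        return {"filtersConfig": [
            {"type": "GROUNDING", "threshold": self.grounding},
            {"type": "RELEVANCE", "threshold": self.relevance},
        ]}

    def is_blocked(self, grounding_score: float, relevance_score: float) -> bool:
        """A response is blocked if it falls below either threshold."""
        return grounding_score < self.grounding or relevance_score < self.relevance


thresholds = GroundingThresholds(grounding=0.75, relevance=0.5)
```

Raising the grounding threshold trades more blocked responses for fewer unsupported claims, so it is worth tuning against a sample of real question-and-answer pairs before going to production.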

6. Automated Reasoning Checks

The mathematically verifiable hallucination prevention system. Unlike contextual grounding (which uses probabilistic comparison), Automated Reasoning translates your policies into formal logic rules and verifies model outputs against them with provable correctness. Covered in detail in the next section.

Pricing note: Each policy type is billed independently. You pay only for the filters you enable. Word filters and regex-based sensitive information filters are free. Content filters and denied topics cost $0.15/1K text units. Sensitive information filters cost $0.10/1K text units. Contextual grounding costs $0.10/1K text units. Automated Reasoning costs $0.17/1K text units per policy. See the Pricing section for the full breakdown.

Automated Reasoning: The 99% Accuracy Claim

Automated Reasoning checks launched in preview at re:Invent 2024 (December 2024) and reached general availability in August 2025 across US East (Ohio, N. Virginia), US West (Oregon), and Europe (Frankfurt, Ireland, Paris). AWS claims it is the first and only major cloud provider to integrate automated reasoning into its generative AI guardrails.

The mechanism works differently from every other hallucination detection approach. Instead of comparing model output to a reference document probabilistically, Automated Reasoning translates your policy documents into formal logical rules. When a model generates a response, the system performs mathematical verification against those rules and produces a provable result: the statement is either verified, contradicted, or indeterminate. There is no confidence score and no probabilistic threshold. The answer is deterministic.

How It Works in Practice

You provide a policy document in natural language (for example, your HR leave policy, refund terms, or compliance requirements). Bedrock Guardrails automatically generates an Automated Reasoning Policy from that document. When a model response touches topics covered by the policy, the system verifies each claim against the formal rules. Verified claims pass. Contradicted claims are flagged or blocked. Indeterminate claims (not covered by the policy) are explicitly marked as such.
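How an application reacts to the three outcomes is a design decision on your side. The sketch below shows one possible mapping from per-claim results to a response-level action. The `Outcome` enum and `decide` function are hypothetical and use this article's terms (verified, contradicted, indeterminate), not the identifiers the API returns.

```python
from __future__ import annotations

from enum import Enum


class Outcome(Enum):
    VERIFIED = "verified"            # claim is consistent with the policy rules
    CONTRADICTED = "contradicted"    # claim conflicts with the policy rules
    INDETERMINATE = "indeterminate"  # claim is outside the policy's scope


def decide(outcomes: list[Outcome], strict: bool = False) -> str:
    """Map per-claim outcomes to 'pass', 'flag', or 'block' for the response."""
    if Outcome.CONTRADICTED in outcomes:
        return "block"  # any contradicted claim blocks the whole response
    if Outcome.INDETERMINATE in outcomes:
        # Unverifiable claims are flagged for review, or blocked in strict mode.
        return "block" if strict else "flag"
    return "pass"
```

A refund-policy assistant might run in strict mode, since an unverifiable promise is nearly as risky as a wrong one, while an internal HR helper could flag indeterminate answers for review instead.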

The 99% Figure

AWS states that Automated Reasoning checks "deliver up to 99% accuracy at detecting correct responses from LLMs." This refers specifically to the verification accuracy of claims that fall within the scope of a defined policy. It does not mean the model's outputs are 99% accurate in general. The distinction matters: Automated Reasoning can only verify what the policy covers. Claims outside the policy scope receive no verification.

99%

Verification accuracy (up to) that AWS claims for Automated Reasoning checks, providing mathematically provable hallucination detection within defined policy scope

Source: AWS blog, "Minimize AI hallucinations," Aug 2025

November 2025 Update: Natural Language Test Q&A Generation

In November 2025, AWS added automatic test Q&A generation for Automated Reasoning policies. The system now generates test question-and-answer pairs from each policy document, allowing you to validate and refine your policy before deploying it to production. This reduces the iteration cycle from "write policy, deploy, discover edge cases in production" to "write policy, review generated tests, fix gaps, deploy."

Bedrock Guardrails Evolution

Apr 2024

General Availability

Bedrock Guardrails launches with content filters, denied topics, word filters, and sensitive information filters. ApplyGuardrail API enables model-independent evaluation.

Sep 2024

Contextual Grounding + Image Filters

Contextual grounding checks GA for RAG hallucination detection. Multimodal support adds image content filtering.

Dec 2024

Automated Reasoning Preview + 85% Price Cut

Automated Reasoning checks launch in preview at re:Invent. Pricing reduced up to 85% across all filter types.

Jun 2025

Standard and Classic Tiers

Safeguard tiers introduced for content filters and denied topics. Standard tier adds 15%+ recall improvement, 60-language support, and code protection.

Aug 2025

Automated Reasoning GA

Automated Reasoning reaches general availability with 99% verification accuracy claim. Available in US East, US West, and Europe regions.

Nov 2025

Natural Language Test Q&A

Automated Reasoning adds auto-generated test Q&A pairs from policy documents, reducing the policy-to-production validation cycle.

Apr 2026

Cross-Account Safeguards GA

Centralized guardrail enforcement across AWS Organizations. Single policy applies to all accounts and OUs. Available in all commercial and GovCloud regions.

Pricing

Bedrock Guardrails uses pay-per-use pricing with no upfront commitments or minimum fees. You are billed per 1,000 text units, where one text unit equals up to 1,000 characters. Content exceeding 1,000 characters is divided into multiple units (5,600 characters = 6 text units). Each filter type is priced independently, and charges apply only to the filters you enable. Pricing is the same for both Standard and Classic tiers.
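The text-unit rule above is a ceiling division. A minimal sketch, assuming only what the paragraph states (1,000 characters per unit, partial units rounded up); `text_units` is a hypothetical helper name:

```python
CHARS_PER_TEXT_UNIT = 1000


def text_units(char_count: int) -> int:
    """Number of billable text units for content of the given length."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    # Ceiling division using integers only: a partial unit counts as a full one.
    return -(-char_count // CHARS_PER_TEXT_UNIT)


units = text_units(5600)  # the 5,600-character example from the text
```

Because rounding happens per evaluated piece of content, many short messages can cost more than the same number of characters sent in fewer, longer requests.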

Policy Type	Price per 1K Text Units	Notes
Content Filters (text)	$0.15	Hate, insults, sexual, violence, misconduct, prompt attack
Content Filters (image)	$0.00075 / image	Per-image charge, not text units
Denied Topics	$0.15	Semantic topic classification
Sensitive Info Filters	$0.10	PII detection and anonymization
Sensitive Info (regex)	Free	Custom regex pattern matching
Word Filters	Free	Exact-match keyword blocking
Contextual Grounding	$0.10	Units = source + query + response characters combined
Automated Reasoning	$0.17 per policy	Charged per 1K text units for each policy evaluated

All charges billed monthly. Pricing verified from the AWS Bedrock pricing page, May 2026. December 2024 price reduction: content filters down 80%, denied topics down 85%.

Standard vs Classic Tiers

In June 2025, AWS introduced Standard and Classic safeguard tiers for content filters and denied topics. Both tiers cost the same. The difference is in detection quality and capabilities:

Standard tier: 15%+ improvement in harmful content filtering recall, 7%+ gain in balanced accuracy, support for up to 60 languages, code-level protection (detects harmful content in comments, variable names, string literals), and more robust prompt attack defense that distinguishes jailbreaks from injection
Classic tier: Lower latency, simpler evaluations, suitable for straightforward content moderation without code or multilingual requirements

Standard tier requires opting in to cross-region inference. For most production use cases, AWS recommends Standard as the default choice.

85%

Maximum price reduction announced in December 2024 - denied topics dropped from $1.00 to $0.15 per 1K text units, content filters from $0.75 to $0.15

Source: AWS re:Invent 2024 pricing announcement

Cost Estimation Example

A customer service chatbot processing 100,000 interactions per month, with an average of 3,000 characters per interaction (3 text units), using content filters + denied topics + PII redaction:

Content filters: 300,000 text units × $0.15/1K = $45.00
Denied topics: 300,000 text units × $0.15/1K = $45.00
PII filters: 300,000 text units × $0.10/1K = $30.00
Total: $120.00/month for 100K interactions with three active policy types
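The same estimate can be reproduced in code so you can substitute your own volumes. The prices are the per-1K-text-unit figures quoted in this article; the dictionary keys and the `monthly_cost` function are hypothetical names. `Decimal` is used to avoid floating-point rounding in currency arithmetic.

```python
from __future__ import annotations

from decimal import Decimal

# Price per 1,000 text units, as listed in the pricing table above.
PRICE_PER_1K_UNITS = {
    "content_filters": Decimal("0.15"),
    "denied_topics": Decimal("0.15"),
    "pii_filters": Decimal("0.10"),
}


def monthly_cost(interactions: int, units_per_interaction: int,
                 policies: list[str]) -> dict[str, Decimal]:
    """Cost per enabled policy for one month of traffic."""
    total_units = interactions * units_per_interaction
    return {
        policy: PRICE_PER_1K_UNITS[policy] * total_units / 1000
        for policy in policies
    }


costs = monthly_cost(100_000, 3, ["content_filters", "denied_topics", "pii_filters"])
total = sum(costs.values(), Decimal("0"))
print(f"Total: ${total:.2f}/month")
```

Each enabled policy is billed on the full volume of text units, so cost scales with the number of active policy types as well as with traffic.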

Ready to configure these policies? Our step-by-step How to Use Bedrock Guardrails guide walks through console setup, CLI configuration, Terraform provisioning, and production deployment patterns.

Who Should Use Bedrock Guardrails

Who Gets the Most Value

🏥

Regulated Industries (Healthcare, Finance, Legal)

PII redaction prevents patient data, account numbers, and case details from leaking through model outputs. Automated Reasoning verifies that responses comply with documented policies and regulations. Contextual grounding ensures medical or financial advice is sourced from approved reference materials.

Best fit: Content + PII + Contextual Grounding + Automated Reasoning

🛡️

Enterprise Security & Compliance Teams

Cross-account safeguards (GA April 2026) let central security teams enforce a single guardrail policy across every AWS account in the organization. No per-account configuration needed. Audit-ready logs for every evaluation decision.

Best fit: Cross-account safeguards + all policy types

🔧

Platform Engineers Running Multi-Model Stacks

The ApplyGuardrail API works with any foundation model: Bedrock-hosted, self-hosted on SageMaker, or third-party (OpenAI, Gemini). One guardrail configuration protects your entire model portfolio. The safety layer is not tied to any single model vendor, although the guardrail itself is still configured and billed in AWS.

Best fit: ApplyGuardrail API + Content + Denied Topics

💬

Product Teams Shipping Customer-Facing AI

Denied topics keep the chatbot on-script. Word filters block competitor mentions and internal codenames. Content filters prevent harmful outputs that damage brand reputation. All configurable without model retraining.

Best fit: Denied Topics + Word Filters + Content Filters

Limitations

Bedrock Guardrails is among the most comprehensive managed guardrail services available from a major cloud provider. That does not mean it is without gaps. Enterprise buyers should evaluate these limitations against their specific requirements.

Key Limitations

AWS Ecosystem Lock-In

While the ApplyGuardrail API works with any model, the guardrail configuration, management, and billing all live within AWS. Organizations running multi-cloud strategies (Azure + AWS + GCP) must either run separate safety solutions per cloud or accept a dependency on AWS for centralized guardrail management. Cross-account safeguards work only within AWS Organizations.

Regional Availability Gaps

Automated Reasoning is available only in US East (Ohio, N. Virginia), US West (Oregon), and three European regions (Frankfurt, Ireland, Paris) as of May 2026. Organizations with data residency requirements in Asia-Pacific, Middle East, or South America cannot use Automated Reasoning locally. Standard tier requires cross-region inference opt-in, which may conflict with data sovereignty policies.

No Filter Is Perfect

Content filter effectiveness depends on sensitivity threshold configuration and content type. No automated moderation system catches everything. For high-stakes applications (child safety, healthcare, crisis response), AWS recommends combining Guardrails with application-level validation and human review. Test each filter category at your chosen threshold against representative inputs before going to production.

Latency Overhead

Standard tier trades latency for accuracy. Each enabled policy adds evaluation time to every request and response. For real-time conversational applications with strict latency budgets, the cumulative overhead of multiple policy types may require careful testing. Classic tier is available as a lower-latency alternative at the cost of reduced detection accuracy.

Additional considerations: Automated Reasoning only verifies claims within the scope of the defined policy. Out-of-scope hallucinations receive no verification. No voice modality support as of May 2026 (text and image only). The service does not provide model-level explainability: it tells you whether a response violated a rule, but not why the model generated that response in the first place.

Frequently Asked Questions

Does Bedrock Guardrails work with non-AWS models like OpenAI or Gemini?

Yes. The ApplyGuardrail API evaluates any text or image content independently of the model that generated it. You send the content directly to the API, and it returns the evaluation result. This works with OpenAI, Google Gemini, self-hosted open-source models, or any other foundation model, regardless of where it runs.

What is the difference between contextual grounding and Automated Reasoning?

Contextual grounding uses probabilistic comparison to check if a model response is grounded in a reference source. It returns a confidence score. Automated Reasoning translates policies into formal logic and performs mathematical verification, returning a deterministic result (verified, contradicted, or indeterminate) backed by a formal proof. Contextual grounding is better for RAG applications. Automated Reasoning is better for policy compliance where you need auditable, deterministic verification.

How much does Bedrock Guardrails cost?

Pricing is pay-per-use per 1,000 text units (1 text unit = up to 1,000 characters). Content filters and denied topics cost $0.15/1K units. Sensitive information filters cost $0.10/1K units. Contextual grounding costs $0.10/1K units. Automated Reasoning costs $0.17/1K units per policy. Word filters and regex-based sensitive info filters are free. No minimum fees or upfront commitments.

What are cross-account safeguards?

Cross-account safeguards (GA April 2026) allow a central security team to define a guardrail policy in a management account and automatically enforce it across all member accounts and organizational units in AWS Organizations. This eliminates per-account configuration and ensures uniform safety baselines across the entire organization. Multiple guardrails can be layered: organization-wide, department-specific, and application-specific policies are all enforced together.

What prerequisites are needed to set up Bedrock Guardrails?

You need an AWS account with access to Amazon Bedrock in a supported region. No special hardware or software is required. Configuration is done through the Bedrock console, AWS CLI, CloudFormation, or Terraform (via the aws_bedrock_guardrail resource). For cross-account safeguards, you need AWS Organizations configured. For Automated Reasoning, your account must be in a supported region (US East, US West, or select European regions). SDK support is available in 12+ programming languages through the AWS SDK.
