What Is Amazon Bedrock Guardrails? AWS's AI Safety Layer Explained
Last verified: May 14, 2026 · Format: Breakdown
Your compliance team asks a question that should be simple: "How do we stop our customer-facing chatbot from discussing competitor products, leaking PII, or hallucinating a refund policy that doesn't exist?" Amazon Bedrock Guardrails is AWS's answer, and it operates as a standalone safety layer that sits between any foundation model and your end users.
The critical distinction: Guardrails is not a model. It is a configurable enforcement pipeline that evaluates every input and output against policies you define, then blocks, redacts, or flags content that violates those rules. It works with Bedrock-hosted models (Claude, Llama, Mistral), self-hosted models, and third-party APIs including OpenAI and Google Gemini through the ApplyGuardrail API. That model-agnostic architecture means you write your safety rules once and apply them everywhere.
What Is Amazon Bedrock Guardrails
Amazon Bedrock Guardrails is a managed AI safety service within Amazon Bedrock that provides configurable safeguards to detect and filter harmful content, redact sensitive information, prevent topic drift, and validate model responses against factual policies. It became generally available on April 23, 2024, alongside the broader Bedrock platform expansion.
The service exposes six safeguard policy types: content filters (text and image), denied topics, word filters, sensitive information filters (PII detection and regex), contextual grounding checks, and Automated Reasoning checks. Each policy operates independently. You enable only what you need, and you pay only for the filters you activate.
Guardrails integrates natively with Bedrock foundation models, Bedrock Agents, Bedrock Knowledge Bases, and Bedrock Flows. For models outside the Bedrock ecosystem, the ApplyGuardrail API provides the same policy enforcement without requiring model invocation through Bedrock at all. AWS describes this as "model-independent safety measures," and in practice it means a single guardrail configuration can protect a Claude deployment on Bedrock, a self-hosted Llama instance on SageMaker, and an OpenAI GPT call from your own application server.
For context on where Bedrock Guardrails fits in the broader AI tools landscape, visit the AI Tools Hub and the AWS sub-hub.
How It Works: The Processing Pipeline
Bedrock Guardrails operates as an inline evaluation layer with two enforcement points: one before the prompt reaches the model (input guardrail) and one after the model generates a response (output guardrail). Both run the same policy stack, but they apply independently.
Input Flow
When a user sends a prompt, Guardrails evaluates it against every enabled policy in parallel. If the prompt contains a denied topic, harmful content, blocked words, or sensitive information that should be redacted, the guardrail intervenes before the model ever sees the request. Blocked prompts return a configurable message (you write the response, not AWS). Redacted prompts continue to the model with PII replaced by placeholders.
Output Flow
After the model generates a response, the same policy stack evaluates the output. Content filters catch harmful generations, contextual grounding checks flag hallucinations against your reference source, and Automated Reasoning checks validate factual claims against formal logic policies. Violations trigger configurable actions: block the entire response, redact specific content, or flag for human review.
ApplyGuardrail API (Model-Independent)
The ApplyGuardrail API decouples the evaluation engine from model invocation entirely. You send text (or images) directly to the API, and it returns the evaluation result without calling any foundation model. This enables three critical patterns: applying guardrails to third-party model outputs (OpenAI, Google Gemini), using guardrails with self-hosted models on SageMaker or EC2, and running pre-flight validation on prompts before they enter any model pipeline. For teams running autonomous agents in AWS, see how the AWS DevOps Agent pairs with Guardrails for safe infrastructure automation.
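The model-independent pattern described above can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes boto3's `bedrock-runtime` client and its `apply_guardrail` operation, and the helper names (`build_apply_guardrail_request`, `resolve_text`, `check_with_guardrail`) and the guardrail ID are hypothetical. The two pure helpers run offline; only `check_with_guardrail` needs AWS credentials.

```python
from typing import Any


def build_apply_guardrail_request(
    guardrail_id: str, version: str, text: str, source: str = "OUTPUT"
) -> dict[str, Any]:
    """Build keyword arguments for the bedrock-runtime apply_guardrail call."""
    if source not in ("INPUT", "OUTPUT"):
        raise ValueError("source must be 'INPUT' or 'OUTPUT'")
    return {
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": version,
        "source": source,  # INPUT for user prompts, OUTPUT for model responses
        "content": [{"text": {"text": text}}],
    }


def resolve_text(original: str, response: dict[str, Any]) -> str:
    """Return the guardrail's replacement text if it intervened, else the original."""
    if response.get("action") == "GUARDRAIL_INTERVENED":
        outputs = response.get("outputs", [])
        if outputs:
            return outputs[0]["text"]
    return original


def check_with_guardrail(text: str, guardrail_id: str, version: str) -> str:
    """Evaluate text produced by any model. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("bedrock-runtime")
    request = build_apply_guardrail_request(guardrail_id, version, text)
    return resolve_text(text, client.apply_guardrail(**request))
```

In this sketch, a response from a third-party or self-hosted model would be passed through `check_with_guardrail` before it is shown to the user; the same request builder with `source="INPUT"` covers pre-flight validation of prompts.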
Policy Types: The Six Safeguard Categories
Each policy type addresses a distinct risk category. You configure them independently through the Bedrock console, AWS CLI, CloudFormation, or Terraform. Here is what each one does and when to use it.
1. Content Filters
Detect and filter harmful text or image content across six categories: Hate, Insults, Sexual, Violence, Misconduct, and Prompt Attack (jailbreak and prompt injection attempts). You set the sensitivity threshold for each category independently on a scale from low to high. Image content filtering evaluates uploaded images against the same categories. The Standard tier extends detection to code elements, identifying harmful content embedded in comments, variable names, function names, and string literals.
2. Denied Topics
Define topics your application must never discuss. You describe each denied topic in natural language (for example, "investment advice" or "competitor product recommendations"), and the guardrail classifies incoming prompts and model responses against those definitions. This is topic-level control, not keyword matching: the guardrail classifies by semantic intent, so a user rephrasing "stock tips" as "what should I invest in" still triggers the filter.
3. Word Filters
Exact-match keyword blocking for profanity, competitor names, internal project codenames, or any custom word list you define. This is the simplest policy type: if the exact word or phrase appears in the input or output, it triggers the configured action. Word filters are free; no per-unit charge applies.
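As a rough illustration of the first three policy types, the sketch below builds the policy portion of a guardrail definition as a plain dictionary. It assumes the field names used by boto3's `create_guardrail` operation on the `bedrock` client; the function name `build_policy_config`, the topic definition, and the codename are hypothetical. A real `create_guardrail` call also needs a name and the blocked-input and blocked-output messages, which are omitted here.

```python
from typing import Any


def build_policy_config(
    denied_topics: dict[str, str], blocked_words: list[str], strength: str = "HIGH"
) -> dict[str, Any]:
    """Build content filter, denied topic, and word filter settings."""
    # Prompt Attack is left out of this sketch; it is configured separately.
    categories = ["HATE", "INSULTS", "SEXUAL", "VIOLENCE", "MISCONDUCT"]
    return {
        "contentPolicyConfig": {
            "filtersConfig": [
                {"type": c, "inputStrength": strength, "outputStrength": strength}
                for c in categories
            ]
        },
        "topicPolicyConfig": {
            "topicsConfig": [
                # Each topic is described in natural language, not as keywords.
                {"name": name, "definition": definition, "type": "DENY"}
                for name, definition in denied_topics.items()
            ]
        },
        "wordPolicyConfig": {
            # Exact-match words and phrases.
            "wordsConfig": [{"text": word} for word in blocked_words]
        },
    }


config = build_policy_config(
    {"Investment advice": "Recommendations about buying or selling financial assets."},
    ["ProjectFalcon"],  # hypothetical internal codename
)
```

Keeping the policy block as data makes it easy to review in a pull request before it is passed to the API, the CLI, or an infrastructure template.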
4. Sensitive Information Filters (PII)
Detect and redact personally identifiable information including names, email addresses, phone numbers, Social Security numbers, credit card numbers, and IP addresses. You choose between two actions per PII type: block (reject the entire request) or anonymize (replace with a placeholder like {EMAIL}). Custom regex patterns are also supported for organization-specific identifiers like employee IDs or account numbers. Regex-based filters are free.
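The block-versus-anonymize choice and a custom regex can be sketched as follows. This assumes the `sensitiveInformationPolicyConfig` shape used by boto3's `create_guardrail`; the employee ID format and the function name are hypothetical, and the local `re` check only approximates how the service would evaluate the pattern.

```python
import re
from typing import Any

# Hypothetical employee ID format such as "EMP-004211"; not taken from AWS docs.
EMPLOYEE_ID_PATTERN = r"EMP-\d{6}"


def build_sensitive_info_config() -> dict[str, Any]:
    """Build PII and custom regex settings with a per-type action."""
    return {
        "piiEntitiesConfig": [
            # ANONYMIZE replaces the value with a placeholder.
            {"type": "EMAIL", "action": "ANONYMIZE"},
            # BLOCK rejects the entire request.
            {"type": "US_SOCIAL_SECURITY_NUMBER", "action": "BLOCK"},
        ],
        "regexesConfig": [
            {
                "name": "employee-id",
                "description": "Internal employee identifier",
                "pattern": EMPLOYEE_ID_PATTERN,
                "action": "ANONYMIZE",
            }
        ],
    }


# Sanity-check the pattern locally before deploying it.
sample = "Ticket opened by EMP-004211"
matched = re.search(EMPLOYEE_ID_PATTERN, sample) is not None
```

Testing a custom pattern against representative strings first is a cheap way to catch over-matching or under-matching before it affects live traffic.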
5. Contextual Grounding Checks
The hallucination detection layer for RAG (Retrieval-Augmented Generation) applications. Contextual grounding evaluates whether the model's response is grounded in the provided reference source and relevant to the user's query. You set a grounding threshold (0 to 1) and a relevance threshold independently. Responses that fall below either threshold are blocked. This is critical for knowledge base applications where factual accuracy is non-negotiable: legal research, medical information, financial advice, and compliance documentation.
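A grounding check needs three pieces of text: the reference source, the user's query, and the response to evaluate. The sketch below assumes the `qualifiers` values accepted by the ApplyGuardrail content blocks (`grounding_source`, `query`, `guard_content`). The second function is only a local illustration of the two-threshold rule described above; the scores themselves are computed by the service, and both function names are hypothetical.

```python
from typing import Any


def build_grounding_content(source: str, query: str, response: str) -> list[dict[str, Any]]:
    """Label each text block so the guardrail knows its role in the check."""
    return [
        {"text": {"text": source, "qualifiers": ["grounding_source"]}},
        {"text": {"text": query, "qualifiers": ["query"]}},
        {"text": {"text": response, "qualifiers": ["guard_content"]}},
    ]


def passes_thresholds(
    grounding_score: float,
    relevance_score: float,
    grounding_threshold: float,
    relevance_threshold: float,
) -> bool:
    """A response passes only if it meets both thresholds."""
    return (
        grounding_score >= grounding_threshold
        and relevance_score >= relevance_threshold
    )


content = build_grounding_content(
    "Refunds are accepted within 30 days of purchase.",
    "What is the refund window?",
    "You can request a refund within 30 days.",
)
```

Because the two thresholds are independent, a response that is well grounded but off-topic fails just as one that is relevant but unsupported does.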
6. Automated Reasoning Checks
The mathematically verifiable hallucination prevention system. Unlike contextual grounding (which uses probabilistic comparison), Automated Reasoning translates your policies into formal logic rules and verifies model outputs against them with provable correctness. Covered in detail in the next section.
Pricing note: Each policy type is billed independently. You pay only for the filters you enable. Word filters and regex-based sensitive information filters are free. Content filters and denied topics cost $0.15/1K text units. Sensitive information filters cost $0.10/1K text units. Contextual grounding costs $0.10/1K text units. Automated Reasoning costs $0.17/1K text units per policy. See the Pricing section for the full breakdown.
Automated Reasoning: The 99% Accuracy Claim
Automated Reasoning checks launched in preview at re:Invent 2024 (December 2024) and reached general availability in August 2025 across US East (Ohio, N. Virginia), US West (Oregon), and Europe (Frankfurt, Ireland, Paris). AWS claims it is the first and only major cloud provider to integrate automated reasoning into its generative AI guardrails.
The mechanism works differently from every other hallucination detection approach. Instead of comparing model output to a reference document probabilistically, Automated Reasoning translates your policy documents into formal logical rules. When a model generates a response, the system performs mathematical verification against those rules and produces a provable result: the statement is either verified, contradicted, or indeterminate. There is no confidence score and no probabilistic threshold. Each claim receives exactly one of those three verdicts.
How It Works in Practice
You provide a policy document in natural language (for example, your HR leave policy, refund terms, or compliance requirements). Bedrock Guardrails automatically generates an Automated Reasoning Policy from that document. When a model response touches topics covered by the policy, the system verifies each claim against the formal rules. Verified claims pass. Contradicted claims are flagged or blocked. Indeterminate claims (not covered by the policy) are explicitly marked as such.
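How an application might act on the three outcomes can be sketched as below. This is purely illustrative: the `Verdict` enum and `handle_claim` function are hypothetical names that mirror the terms used in this article, not the response schema of the Automated Reasoning API.

```python
from enum import Enum


class Verdict(Enum):
    """The three outcomes described above (hypothetical labels)."""

    VERIFIED = "verified"
    CONTRADICTED = "contradicted"
    INDETERMINATE = "indeterminate"


def handle_claim(claim: str, verdict: Verdict, block_on_contradiction: bool = True) -> str:
    """Pass verified claims, block or flag contradictions, mark the rest."""
    if verdict is Verdict.VERIFIED:
        return claim
    if verdict is Verdict.CONTRADICTED:
        return "[blocked]" if block_on_contradiction else f"[flagged] {claim}"
    # Indeterminate: the policy does not cover this claim, so say so explicitly.
    return f"[unverified] {claim}"
```

The point of the third branch is that an out-of-scope claim is never silently treated as verified.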
The 99% Figure
AWS states that Automated Reasoning checks "deliver up to 99% accuracy at detecting correct responses from LLMs." This refers specifically to the verification accuracy of claims that fall within the scope of a defined policy. It does not mean the model's outputs are 99% accurate in general. The distinction matters: Automated Reasoning can only verify what the policy covers. Claims outside the policy scope receive no verification.
November 2025 Update: Natural Language Test Q&A Generation
In November 2025, AWS added automatic test Q&A generation for Automated Reasoning policies. The system now generates test question-and-answer pairs from each policy document, allowing you to validate and refine your policy before deploying it to production. This reduces the iteration cycle from "write policy, deploy, discover edge cases in production" to "write policy, review generated tests, fix gaps, deploy."
Pricing
Bedrock Guardrails uses pay-per-use pricing with no upfront commitments or minimum fees. You are billed per 1,000 text units, where one text unit equals up to 1,000 characters. Content exceeding 1,000 characters is divided into multiple units (5,600 characters = 6 text units). Each filter type is priced independently, and charges apply only to the filters you enable. Pricing is the same for both Standard and Classic tiers.
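The text unit rule above is a ceiling division, which can be sketched in a few lines. The function name `text_units` is hypothetical.

```python
def text_units(char_count: int) -> int:
    """Number of billable text units: one per started block of 1,000 characters."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    # Integer ceiling division avoids floating-point rounding.
    return (char_count + 999) // 1000


units = text_units(5600)  # 6, matching the example above
```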
| Policy Type | Price per 1K Text Units | Notes |
|---|---|---|
| Content Filters (text) | $0.15 | Hate, insults, sexual, violence, misconduct, prompt attack |
| Content Filters (image) | $0.00075 / image | Per-image charge, not text units |
| Denied Topics | $0.15 | Semantic topic classification |
| Sensitive Info Filters | $0.10 | PII detection and anonymization |
| Sensitive Info (regex) | Free | Custom regex pattern matching |
| Word Filters | Free | Exact-match keyword blocking |
| Contextual Grounding | $0.10 | Units = source + query + response characters combined |
| Automated Reasoning | $0.17 / policy | Per-policy charge on each evaluation |
All charges billed monthly. Pricing verified from the AWS Bedrock pricing page, May 2026. December 2024 price reduction: content filters down 80%, denied topics down 85%.
Standard vs Classic Tiers
In June 2025, AWS introduced Standard and Classic safeguard tiers for content filters and denied topics. Both tiers cost the same. The difference is in detection quality and capabilities:
- Standard tier: 15%+ improvement in harmful content filtering recall, 7%+ gain in balanced accuracy, support for up to 60 languages, code-level protection (detects harmful content in comments, variable names, string literals), and more robust prompt attack defense that distinguishes jailbreaks from injection
- Classic tier: Lower latency, simpler evaluations, suitable for straightforward content moderation without code or multilingual requirements
Standard tier requires opting in to cross-region inference. For most production use cases, AWS recommends Standard as the default choice.
Cost Estimation Example
A customer service chatbot processing 100,000 interactions per month, with an average of 3,000 characters per interaction (3 text units), using content filters + denied topics + PII redaction:
- Content filters: 300,000 text units × $0.15/1K = $45.00
- Denied topics: 300,000 text units × $0.15/1K = $45.00
- PII filters: 300,000 text units × $0.10/1K = $30.00
- Total: $120.00/month for 100K interactions with three active policy types
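The same arithmetic can be reproduced with a small calculator, which is handy for trying other volumes or policy combinations. This is a sketch using the prices quoted in this article; the dictionary keys and the function name `monthly_cost` are hypothetical, and `Decimal` is used to keep the currency math exact.

```python
from decimal import Decimal

# Prices per 1,000 text units, as listed in the pricing table above.
PRICE_PER_1K_UNITS = {
    "content_filters": Decimal("0.15"),
    "denied_topics": Decimal("0.15"),
    "pii_filters": Decimal("0.10"),
}


def monthly_cost(interactions: int, chars_per_interaction: int, policies: list[str]) -> Decimal:
    """Estimate monthly cost for the given volume and enabled policies."""
    units_per_interaction = (chars_per_interaction + 999) // 1000  # ceiling
    total_units = interactions * units_per_interaction
    return sum(
        (PRICE_PER_1K_UNITS[p] * total_units / 1000 for p in policies),
        Decimal("0"),
    )


estimate = monthly_cost(
    100_000, 3000, ["content_filters", "denied_topics", "pii_filters"]
)
```

With the inputs from the example, `estimate` equals 120, matching the total above.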
Ready to configure these policies? Our step-by-step How to Use Bedrock Guardrails guide walks through console setup, CLI configuration, Terraform provisioning, and production deployment patterns.
Who Should Use Bedrock Guardrails
PII redaction prevents patient data, account numbers, and case details from leaking through model outputs. Automated Reasoning verifies that responses comply with documented policies and regulations. Contextual grounding ensures medical or financial advice is sourced from approved reference materials.
Best fit: Content + PII + Contextual Grounding + Automated Reasoning
Cross-account safeguards (GA April 2026) let central security teams enforce a single guardrail policy across every AWS account in the organization. No per-account configuration needed. Audit-ready logs for every evaluation decision.
Best fit: Cross-account safeguards + all policy types
The ApplyGuardrail API works with any foundation model: Bedrock-hosted, self-hosted on SageMaker, or third-party (OpenAI, Gemini). One guardrail configuration protects your entire model portfolio. No vendor lock-in on the safety layer.
Best fit: ApplyGuardrail API + Content + Denied Topics
Denied topics keep the chatbot on-script. Word filters block competitor mentions and internal codenames. Content filters prevent harmful outputs that damage brand reputation. All configurable without model retraining.
Best fit: Denied Topics + Word Filters + Content Filters
Limitations
Bedrock Guardrails is the most comprehensive managed guardrail service available from a major cloud provider. That does not mean it is without gaps. Enterprise buyers should evaluate these limitations against their specific requirements.
While the ApplyGuardrail API works with any model, the guardrail configuration, management, and billing all live within AWS. Organizations running multi-cloud strategies (Azure + AWS + GCP) must either run separate safety solutions per cloud or accept a dependency on AWS for centralized guardrail management. Cross-account safeguards work only within AWS Organizations.
Automated Reasoning is available only in US East (Ohio, N. Virginia), US West (Oregon), and three European regions (Frankfurt, Ireland, Paris) as of May 2026. Organizations with data residency requirements in Asia-Pacific, Middle East, or South America cannot use Automated Reasoning locally. Standard tier requires cross-region inference opt-in, which may conflict with data sovereignty policies.
Content filter effectiveness depends on sensitivity threshold configuration and content type. No automated moderation system catches everything. For high-stakes applications (child safety, healthcare, crisis response), AWS recommends combining Guardrails with application-level validation and human review. Test each filter category at your chosen threshold against representative inputs before going to production.
Standard tier trades latency for accuracy. Each enabled policy adds evaluation time to every request and response. For real-time conversational applications with strict latency budgets, the cumulative overhead of multiple policy types may require careful testing. Classic tier is available as a lower-latency alternative at the cost of reduced detection accuracy.
Additional considerations: Automated Reasoning only verifies claims within the scope of the defined policy. Out-of-scope hallucinations receive no verification. No voice modality support as of May 2026 (text and image only). The service does not provide model-level explainability: it tells you whether a response violated a rule, but not why the model generated that response in the first place.