Gallery

Contacts

411 University St, Seattle, USA

engitech@oceanthemes.net

+1 -800-456-478-23

Amazon Web Services

How to Use Amazon Bedrock Guardrails: Complete Setup Guide (2026)

Last verified: May 14, 2026  ·  Format: Guide  ·  Est. time: 20-25 min

6
Configurable safeguard policy types across text, image, and code
Source: AWS documentation
$0.15
Per 1,000 text units for content filters (80% price reduction, Dec 2024)
Source: AWS Bedrock pricing page
99%
Automated Reasoning verification accuracy within defined policy scope
Source: AWS blog, Aug 2025
6
Content filter categories with configurable thresholds (hate, insults, sexual, violence, misconduct, prompt attack)
Source: AWS Bedrock Guardrails documentation

Your LLM application is live, but what stops it from generating harmful content, leaking customer PII in a response, or producing a hallucination like a refund policy that does not exist? Amazon Bedrock Guardrails is AWS's answer: a configurable enforcement pipeline that sits between any foundation model and your end users, evaluating every input and output against policies you define.

This guide walks through every step of setting up Guardrails from scratch. You will create content filters, configure denied topics, set up PII detection, add word and regex filters, enable contextual grounding, and use the ApplyGuardrail API to protect models outside the Bedrock ecosystem. Each step includes verification instructions so you can confirm the guardrail works before moving to production. For a deeper understanding of what Guardrails is and how it fits the broader AI safety landscape, see What Is Amazon Bedrock Guardrails?

What You Need Before Starting

Bedrock Guardrails is a managed service within Amazon Bedrock. If you are new to the platform, our What Is Amazon Bedrock? breakdown covers the full model catalog and architecture. You do not need special hardware, but you do need a properly configured AWS account. Walk through this checklist before proceeding.

Prerequisites Checklist
An active AWS account with billing enabled
Amazon Bedrock enabled in a supported region (US East N. Virginia, US West Oregon, Europe Frankfurt/Ireland/Paris recommended)
At least one foundation model enabled in Bedrock (Claude, Llama, Mistral, or another supported model)
IAM permissions for Bedrock Guardrails: bedrock:CreateGuardrail, bedrock:UpdateGuardrail, bedrock:GetGuardrail, bedrock:ApplyGuardrail
Optional: An existing Bedrock application, Agent, or Knowledge Base to attach the guardrail to
0 of 5 complete
Guide Progress
0 of 8 steps complete
  • Step 1: Navigate the Guardrails Console
  • Step 2: Create Your First Guardrail
  • Step 3: Configure Denied Topics
  • Step 4: Add PII Filters
  • Step 5: Set Up Word and Regex Filters
  • Step 6: Enable Contextual Grounding
  • Step 7: Use the ApplyGuardrail API
  • Step 8: Test and Monitor
99%
Verification accuracy for Automated Reasoning checks - mathematically provable hallucination detection within defined policy scope, available since August 2025
Source: AWS blog, "Minimize AI hallucinations," Aug 2025

Step 1: Navigating the Guardrails Console

Guardrails are configured through the Amazon Bedrock console, AWS CLI, CloudFormation, or Terraform (via the aws_bedrock_guardrail resource). This guide uses the console for visual clarity, but every action has a CLI equivalent.

  1. Sign in to the AWS Management Console and navigate to Amazon Bedrock.
  2. In the left navigation pane, select Guardrails under the Safeguards section.
  3. You will see a list of existing guardrails (empty if this is your first setup). The Create guardrail button is in the top-right corner.
  4. Confirm you are in a supported region. Automated Reasoning requires US East (Ohio or N. Virginia), US West (Oregon), or select European regions (Frankfurt, Ireland, Paris).

Verification: You should see the Guardrails management page with a "Create guardrail" button. If you see an access denied error, check that your IAM user or role has the bedrock:*Guardrail* permissions. If Guardrails does not appear in the navigation, confirm Bedrock is enabled in your selected region.

Step 2: Creating Your First Guardrail (Content Filters)

Content filters are the foundation of every guardrail. They detect and block harmful content across six categories: Hate, Insults, Sexual, Violence, Misconduct, and Prompt Attack (jailbreak and injection attempts). You set the sensitivity threshold for each category independently.

  1. Click Create guardrail.
  2. Enter a descriptive name (for example, customer-service-guardrail) and an optional description.
  3. Under Content filters, you will see the six categories. For each category, set the filter strength:
    • None: Disabled for this category
    • Low: Blocks only the most explicit content
    • Medium: Balanced detection (recommended starting point)
    • High: Aggressive filtering, may catch borderline content
  4. Set Prompt Attack to High. This protects against jailbreak and prompt injection attempts. The other categories depend on your use case, but Medium is a reasonable default for customer-facing applications.
  5. Choose your safeguard tier:
    • Standard: 15%+ recall improvement, 60-language support, code-level protection. Recommended for most production deployments. Requires cross-region inference opt-in.
    • Classic: Lower latency, simpler evaluations. Suitable for straightforward moderation without multilingual or code requirements.
  6. Set a blocked messaging response. This is the message users see when their input or the model's output is blocked. Write something clear: "I'm unable to respond to that request. Please rephrase your question."
  7. Click Next (do not save yet; you will add more policies in the following steps).

Verification: After configuring content filters, the console should show green checkmarks next to each enabled category with the selected threshold level. The pricing preview in the console sidebar should show $0.15 per 1,000 text units for content filters.

Common Configuration Mistakes
Setting All Thresholds to High

High sensitivity catches borderline content and increases false positives. Start at Medium for most categories and only raise to High for Prompt Attack. Tune thresholds based on real traffic data, not assumptions.

Forgetting the Blocked Message

If you do not set a custom blocked messaging response, users see a generic AWS error. Write a clear, brand-appropriate message that tells users how to rephrase their request.

Skipping Standard Tier Evaluation

Standard tier provides 15%+ recall improvement and 60-language support at the same price as Classic. Unless your application has strict latency budgets under 100ms, test Standard first before defaulting to Classic.

Step 3: Configuring Denied Topics

Denied topics prevent your application from discussing subjects that are off-limits. Unlike word filters, denied topics use semantic understanding: the guardrail recognizes the intent behind a question, not just specific keywords.

  1. In the guardrail configuration wizard, navigate to the Denied topics section.
  2. Click Add denied topic.
  3. Give the topic a name (for example, "Investment Advice").
  4. Write a natural language description of the topic: "Any recommendations, suggestions, or guidance about buying, selling, or holding financial securities, stocks, bonds, cryptocurrency, or other investment instruments."
  5. Add example prompts that should be blocked:
    • "Should I buy Tesla stock?"
    • "What is the best cryptocurrency to invest in right now?"
    • "How should I allocate my 401(k)?"
  6. Repeat for each topic you want to deny. Common examples for enterprise applications:
    • Competitor product recommendations
    • Medical diagnosis or treatment advice
    • Legal counsel or interpretation
    • Internal company financials

Verification: Each denied topic should appear in the configuration list with its name and description. After the guardrail is created, test by sending a prompt that matches a denied topic. The response should return your configured blocked message, not the model's answer. Try rephrasing the question to verify semantic detection works beyond exact keyword matching.

Step 4: Adding Sensitive Information Filters (PII Detection)

Sensitive information filters detect and handle personally identifiable information. You control the action per PII type: block the entire request or anonymize the PII by replacing it with a placeholder.

  1. Navigate to the Sensitive information filters section in the wizard.
  2. Enable the PII types relevant to your application. Available types include:
    • Names, email addresses, phone numbers
    • Social Security numbers, credit card numbers
    • IP addresses, AWS account IDs
    • Driver's license numbers, passport numbers
  3. For each PII type, choose the action:
    • Block: Reject the entire request or response
    • Anonymize: Replace the detected PII with a placeholder (for example, {EMAIL}, {PHONE}, {SSN})
  4. For most applications, use Anonymize for common identifiers (email, phone, name) and Block for high-sensitivity data (SSN, credit card numbers).
  5. Add custom regex patterns for organization-specific identifiers. For example, if your employee IDs follow the pattern EMP- followed by six digits:
    EMP-\d{6}
    Regex-based filters are free; no per-unit charge applies.

Verification: After the guardrail is active, send a test prompt containing a fake email address (for example, "Send the report to test@example.com"). If anonymization is enabled, the model should receive the prompt with {EMAIL} replacing the address. If blocking is enabled, the request should be rejected entirely.

$0.10
Per 1K text units for managed PII detection covering names, emails, phone numbers, SSNs, credit cards, and IP addresses - with custom regex filters available at no additional cost
Source: AWS Bedrock Guardrails pricing page, May 2026

Step 5: Setting Up Word Filters and Regex

Word filters are the simplest policy type: exact-match keyword blocking. Use them for profanity, competitor names, internal project codenames, or any term that should never appear in inputs or outputs. Word filters are free.

  1. Navigate to the Word filters section.
  2. Enable the managed word list to block common profanity. AWS maintains this list.
  3. Add custom words and phrases. Enter each term on a new line or upload a CSV file. Examples:
    • Competitor product names: "CompetitorX", "RivalProduct"
    • Internal codenames: "Project Phoenix", "Alpha Build"
    • Sensitive terms specific to your industry
  4. Note that word filters are case-insensitive and match exact strings. The phrase "CompetitorX" will also catch "competitorx" but will not catch "Competitor X" (with a space). Add variations as needed.

Verification: After saving, send a test prompt containing one of your blocked words. The guardrail should trigger the blocked response. Then test with a slight variation (different casing) to confirm case-insensitive matching works. Test with a spaced variation to confirm it does not match (this is expected behavior for exact-match filters).

Step 6: Enabling Contextual Grounding

Contextual grounding checks evaluate whether the model's response is grounded in the provided reference source and relevant to the user's query. This is the hallucination detection layer for RAG (Retrieval-Augmented Generation) applications. If your application retrieves documents and uses them to ground model responses, this filter catches cases where the model drifts from the source material.

  1. Navigate to the Contextual grounding section.
  2. Set the Grounding threshold (0.0 to 1.0). This measures how closely the response aligns with the reference source. A threshold of 0.7 is a reasonable starting point. Higher values are stricter.
  3. Set the Relevance threshold (0.0 to 1.0). This measures whether the response addresses the user's actual question. A threshold of 0.7 is a reasonable default.
  4. Responses that fall below either threshold are blocked.
  5. Note: Contextual grounding costs $0.10 per 1,000 text units. The text unit count combines the source document, query, and response characters.

Verification: Test with a Knowledge Base query where the answer is clearly supported by the retrieved documents. The response should pass. Then test with a follow-up question that the documents do not cover. If the model fabricates an answer, the grounding check should block it and return your configured blocked message.

0.7
Recommended starting threshold for both grounding and relevance checks - balances hallucination detection with false positive rates for most RAG applications
Source: AWS Bedrock Guardrails documentation

Step 7: Using the ApplyGuardrail API (Model-Independent)

The ApplyGuardrail API decouples the evaluation engine from model invocation. You send text directly to the API, and it returns the evaluation result without calling any foundation model. This enables three critical patterns: applying guardrails to third-party model outputs (OpenAI, Google Gemini), using guardrails with self-hosted models on SageMaker or EC2, and running pre-flight validation on prompts before they enter any model pipeline. For teams using autonomous agents in production, pairing Guardrails with agentic workflows like the AWS DevOps Agent adds an additional enforcement boundary.

Basic API Call (Python Boto3)

import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')

response = client.apply_guardrail(
    guardrailIdentifier='your-guardrail-id',
    guardrailVersion='DRAFT',   # or a specific version number
    source='OUTPUT',            # 'INPUT' for prompt checks, 'OUTPUT' for response checks
    content=[
        {
            'text': {
                'text': 'The model response text you want to evaluate goes here.'
            }
        }
    ]
)

# Check the result
action = response['action']   # 'NONE' (passed) or 'GUARDRAIL_INTERVENED' (blocked)
print(f"Action: {action}")

if action == 'GUARDRAIL_INTERVENED':
    for output in response.get('outputs', []):
        print(f"Blocked message: {output['text']}")

    for assessment in response.get('assessments', []):
        print(f"Assessment details: {assessment}")

Integration Pattern for Third-Party Models

To protect an OpenAI or Gemini deployment with Bedrock Guardrails:

  1. Send the user's prompt to the ApplyGuardrail API with source='INPUT'.
  2. If the action is NONE, forward the prompt to your third-party model.
  3. When the model responds, send the response to the ApplyGuardrail API with source='OUTPUT'.
  4. If the action is NONE, return the response to the user. If GUARDRAIL_INTERVENED, return the blocked message.

Verification: Run the API call with a benign text string. The response should return action: NONE. Then test with text that violates one of your configured policies (for example, a string containing a blocked word). The response should return action: GUARDRAIL_INTERVENED with the assessment showing which policy was triggered.

Step 8: Testing and Monitoring Your Guardrails

A guardrail that is not tested is a guardrail you cannot trust. Bedrock provides a built-in test interface in the console, and you should supplement it with programmatic tests before production deployment.

Console Testing

  1. In the Guardrails console, select your guardrail and click Test.
  2. Choose a foundation model to test against (or use the ApplyGuardrail API mode for model-independent testing).
  3. Enter test prompts that should pass and prompts that should be blocked. Document the results.
  4. For Automated Reasoning policies (if enabled), AWS now generates test Q&A pairs automatically from your policy documents. Review these generated tests to identify edge cases your policy may not cover.

Monitoring in Production

  • CloudWatch Metrics: Bedrock publishes guardrail evaluation metrics to CloudWatch. Monitor GuardrailInvocations, GuardrailIntervened, and GuardrailPassed to track block rates over time.
  • CloudWatch Logs: Enable detailed logging to capture which policy type triggered each intervention. This is essential for tuning thresholds.
  • CloudTrail: All guardrail API calls are logged in CloudTrail for audit and compliance purposes.

Pricing Awareness

Each enabled filter charges independently per evaluation. For a customer service chatbot processing 100,000 interactions per month (average 3,000 characters per interaction = 3 text units each):

  • Content filters: 300,000 text units × $0.15/1K = $45.00
  • Denied topics: 300,000 text units × $0.15/1K = $45.00
  • PII filters: 300,000 text units × $0.10/1K = $30.00
  • Total: $120.00/month for three active policy types

Word filters and regex-based sensitive information filters are free. Contextual grounding costs $0.10/1K text units. Automated Reasoning costs $0.17/1K text units per policy.

Verification: After deploying to production, confirm CloudWatch metrics are flowing. The GuardrailIntervened metric should show non-zero counts if your application handles real user traffic. If the metric is zero after significant traffic, your guardrail may not be properly attached to the model invocation. Check the Bedrock InvokeModel call to confirm the guardrail ID and version are included.

Common Failure Patterns
Guardrail ID Not Passed in API Call

The most frequent production issue: the guardrail exists but the InvokeModel or Converse API call does not include the guardrailIdentifier and guardrailVersion parameters. The model responds unfiltered with no error, giving a false sense of safety.

Wrong Region for Automated Reasoning

Automated Reasoning is only available in US East (Ohio, N. Virginia), US West (Oregon), and select European regions (Frankfurt, Ireland, Paris). Deploying in an unsupported region silently skips Automated Reasoning checks without error.

CloudWatch Metrics Show Zero Interventions

If GuardrailIntervened reads zero after significant traffic, the guardrail is likely not attached to the model invocation. Verify the guardrail ID and version in your API call. Also check that the guardrail version is not set to DRAFT in production.

Missing IAM Permissions

The bedrock:ApplyGuardrail permission is separate from bedrock:InvokeModel. If guardrail evaluation fails silently, confirm your IAM role includes both permissions. Cross-account safeguards require additional Organizations-level permissions.


Troubleshooting and FAQ

Common Questions
Check three things: First, confirm the guardrail ID and version are correctly passed in your InvokeModel or ApplyGuardrail API call. Second, verify the specific policy type is enabled (not set to "None"). Third, check the filter threshold. If content filters are set to "Low," borderline content passes through. Increase to "Medium" or "High" and retest.
Yes. The ApplyGuardrail API evaluates any text or image content independently of the model that generated it. Send the content directly to the API with the source parameter set to INPUT or OUTPUT. This works with OpenAI, Google Gemini, self-hosted open-source models on SageMaker or EC2, or any other foundation model regardless of where it runs.
Contextual grounding uses probabilistic comparison to check if a model response is grounded in a reference source. It returns a confidence score against a threshold you set. Automated Reasoning translates your policies into formal logic and performs mathematical verification, returning a binary result: verified, contradicted, or indeterminate. Use contextual grounding for RAG applications. Use Automated Reasoning for policy compliance where you need auditable, deterministic verification.
Cross-account safeguards reached general availability in April 2026. Define a guardrail policy in your management account and automatically enforce it across all member accounts and organizational units in AWS Organizations. This eliminates per-account configuration and ensures uniform safety baselines. Multiple guardrails can be layered: organization-wide, department-specific, and application-specific policies are all enforced together.
Standard tier for most production deployments. It provides 15%+ improvement in harmful content filtering recall, 7%+ gain in balanced accuracy, support for up to 60 languages, and code-level protection. Classic tier trades accuracy for lower latency and is suitable for simple content moderation without multilingual or code requirements. Both tiers cost the same ($0.15/1K text units for content filters).
No automated content moderation system catches everything. Effectiveness depends on your sensitivity threshold settings and the type of content. For high-stakes applications (child safety, healthcare, crisis response), AWS recommends combining Guardrails with application-level validation and human review. Test each filter category at your chosen threshold against representative inputs before deploying to production.

Next Step

Start with a single guardrail using content filters and denied topics on a non-production workload. Monitor the CloudWatch metrics for one week to understand your block rate and false positive rate. Then add PII filters and contextual grounding incrementally. This phased approach lets you tune thresholds based on real data before committing to a full production rollout. For organizations with multiple AWS accounts, evaluate cross-account safeguards to centralize policy enforcement across your entire estate.