How to Use AWS DevOps Agent: Complete Setup Guide (2026)

Last verified: May 14, 2026  ·  Format: Guide  ·  Est. time: 25-35 min

It is 3 AM and your pager fires. A payment service is returning 500 errors across three availability zones. By the time you open your laptop, the AWS DevOps Agent has already correlated CloudWatch metrics with Datadog traces, identified that a recent deployment changed a database connection pool setting, built a topology graph of every affected downstream service, and drafted a mitigation plan with rollback steps waiting for your approval. That investigation took it four minutes. Manually, you would have needed forty.

This guide walks you through setting up AWS DevOps Agent from scratch: creating your first Agent Space, configuring IAM permissions, connecting your observability and ticketing tools, running your first investigation, and setting up proactive evaluations that catch problems before they become outages. Every step includes a verification checkpoint so you know the configuration is working before moving on. The entire setup takes about 90 minutes for a single application, and the 2-month free trial means you can validate the tool against your own environment before spending a dollar.

  • 75%: MTTR reduction reported by early adopters (mean time to resolution). Source: AWS / customer case studies
  • $0.498: Per agent-minute ($0.0083/second), uniform across all operational modes. Source: AWS pricing page
  • 6: AWS regions at general availability (US East/West, EU, APAC). Source: AWS What's New, March 2026
  • 94%: Root cause accuracy rate across autonomous investigations. Source: AWS DevOps Agent documentation

What You Need Before Starting

AWS DevOps Agent is an autonomous SRE agent built on Amazon Bedrock AgentCore (per the official docs), generally available since March 31, 2026. It operates in a dedicated console experience separate from Amazon Q Developer. Before you begin setup, confirm you have the following items in place. Missing any of these will block you at a specific step and force you to backtrack.

Prerequisites Checklist
  • An active AWS account with billing enabled (the 2-month free trial starts automatically)
  • IAM permissions to create roles and policies (or access to an administrator who can)
  • At least one observability tool configured: CloudWatch (native), Datadog, Dynatrace, Splunk, New Relic, or Grafana
  • At least one application deployed on AWS generating metrics and logs (the agent needs telemetry to analyze)
  • Workloads in a supported region: US East (N. Virginia), US West (Oregon), EU (Frankfurt, Ireland), or APAC (Sydney, Tokyo)
  • Optional: Slack workspace or ServiceNow instance for notification integration
Guide Progress
  • Step 1: Create Your Agent Space
  • Step 2: Configure IAM Roles
  • Step 3: Connect Observability
  • Step 4: Application Topology
  • Step 5: Incident Response
  • Step 6: Collaboration Tools
  • Step 7: On-Demand SRE Tasks
  • Step 8: Set Up Evaluations
55 Hours/Month
Free trial includes 20 hours of investigations, 15 hours of evaluations, and 20 hours of on-demand SRE tasks per month for 2 months across 10 Agent Spaces
Source: AWS DevOps Agent pricing page

Step 1: Creating Your First Agent Space

An Agent Space is the foundational unit of AWS DevOps Agent. It defines the scope of what the agent can see and act on: which AWS accounts, which third-party integrations, which operators have access, and what operational boundaries apply. You can run up to 100 Agent Spaces per account per region (adjustable via service quotas). Each space is an isolated boundary, meaning the agent working on your payment service cannot access the topology of your marketing platform unless you explicitly configure cross-space visibility.

  1. Open the AWS Management Console and navigate to the DevOps Agent service. It has its own console entry separate from Amazon Q Developer.
  2. Click Create Agent Space.
  3. Enter a descriptive name that maps to a logical application or service boundary. Examples: payments-prod, checkout-service-us-east, platform-staging.
  4. Select the AWS accounts you want this Agent Space to monitor. You can associate multiple accounts with a single space for cross-account visibility.
  5. Choose the supported region closest to your workloads: US East (N. Virginia), US West (Oregon), Europe (Frankfurt or Ireland), or Asia Pacific (Sydney or Tokyo).
  6. Click Create. The Agent Space initializes within 30-60 seconds.
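
Before picking a region in step 5, it can help to confirm that every workload region is on the supported list. The sketch below maps the six regions named in this guide to their standard AWS region codes; the helper name `unsupported_regions` is illustrative and not part of any AWS SDK.

```python
# Standard AWS region codes for the six regions this guide lists as supported.
SUPPORTED_REGIONS = {
    "us-east-1": "US East (N. Virginia)",
    "us-west-2": "US West (Oregon)",
    "eu-central-1": "Europe (Frankfurt)",
    "eu-west-1": "Europe (Ireland)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "ap-northeast-1": "Asia Pacific (Tokyo)",
}


def unsupported_regions(workload_regions):
    """Return the workload regions that are not on the supported list, sorted."""
    return sorted(set(workload_regions) - set(SUPPORTED_REGIONS))


if __name__ == "__main__":
    # Hypothetical inventory: one supported region, one unsupported.
    print(unsupported_regions(["us-east-1", "sa-east-1"]))
```

An empty result means every workload region can be served by an Agent Space in one of the listed regions.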

Verification: After creation, the Agent Space should appear in your console list with a status of "Active." Click into it and confirm the associated AWS accounts are listed correctly. If the status shows "Initializing" for more than 2 minutes, check your region selection and IAM permissions.

Step 2: Configuring IAM Roles and Permissions

AWS DevOps Agent needs IAM roles with specific permissions to read telemetry, access CloudWatch logs, query your infrastructure, and interact with connected tools. The principle of least privilege applies here: grant only what the agent needs, nothing more.

  1. Navigate to IAM in the AWS Management Console.
  2. Create a new IAM role for the DevOps Agent service. Select AWS Service as the trusted entity type and choose DevOps Agent from the service list.
  3. Attach the AWS-managed policy AWSDevOpsAgentServiceRolePolicy. This grants the baseline permissions the agent needs for CloudWatch metrics, logs, X-Ray traces, and CloudFormation stack discovery.
  4. If you are connecting multiple AWS accounts, create a cross-account IAM role in each target account that trusts the agent's primary account. The agent uses sts:AssumeRole to access cross-account resources.
  5. For operator access, configure IAM Identity Center (SSO) or individual IAM users with permissions to the DevOps Agent Web App. Operators interact with the agent through this separate web application, not the AWS Management Console.
  6. Review the trust policy to confirm only the DevOps Agent service principal can assume the role. Do not use wildcard principals.
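
The trust-policy review in step 6 can be sketched as a small check. This is a minimal illustration, not AWS tooling: the service principal string is a placeholder (this guide does not state the real value, so copy it from the AWS documentation), and `wildcard_principals` is a hypothetical helper that flags Allow statements with a `*` principal.

```python
import json

# ASSUMPTION: placeholder only. Replace with the real DevOps Agent service
# principal from the AWS documentation.
AGENT_SERVICE_PRINCIPAL = "<devops-agent-service-principal>"


def build_trust_policy(service_principal):
    """Build a trust policy that lets exactly one service principal assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def wildcard_principals(policy):
    """Return indexes of Allow statements whose Principal contains a wildcard."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM also accepts a single statement object
        statements = [statements]
    flagged = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        if principal == "*":
            flagged.append(index)
        elif isinstance(principal, dict):
            for value in principal.values():
                values = [value] if isinstance(value, str) else value
                if "*" in values:
                    flagged.append(index)
                    break
    return flagged


if __name__ == "__main__":
    policy = build_trust_policy(AGENT_SERVICE_PRINCIPAL)
    print(json.dumps(policy, indent=2))
    print("wildcard statements:", wildcard_principals(policy))
```

An empty list from `wildcard_principals` means no Allow statement uses a wildcard principal. It does not prove the policy is otherwise minimal.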

Verification: In the Agent Space settings, navigate to the Permissions tab. The console should show a green checkmark next to each required permission. If any show a red X, the IAM role is missing that specific policy. Use the console's built-in permission analyzer to identify the exact missing action.

Step 3: Connecting Observability Tools

The agent needs access to your monitoring and logging data to perform investigations. AWS DevOps Agent natively supports CloudWatch (built-in), Datadog, Dynatrace, New Relic, Splunk, and Grafana with Prometheus. Any tool without a native integration can be connected via MCP (Model Context Protocol) servers.

CloudWatch (Native, No Extra Setup)

If your workloads already emit metrics and logs to CloudWatch, the agent picks them up automatically through the IAM role you configured in Step 2. No additional integration step is required. CloudWatch is the default telemetry source.

Third-Party Tools (Datadog, Dynatrace, Splunk, New Relic, Grafana)

  1. In the Agent Space settings, navigate to Integrations.
  2. Click Add Integration and select your observability provider.
  3. Enter the required credentials: API key and application key for Datadog, API token for Dynatrace, HEC token for Splunk, or API key for New Relic.
  4. Configure the data scope. Limit the integration to specific services or environments rather than granting access to your entire observability tenant.
  5. Test the connection. The console runs a connectivity check that confirms the agent can query metrics and logs from the connected tool.
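
The credential requirements in step 3 can be captured as a pre-flight check before you open the console. The provider keys and field names below are illustrative assumptions, not console field names, and Grafana is omitted because this guide does not list its credentials.

```python
# Credentials per provider as listed in step 3. Field names are illustrative.
REQUIRED_CREDENTIALS = {
    "datadog": {"api_key", "application_key"},
    "dynatrace": {"api_token"},
    "splunk": {"hec_token"},
    "new_relic": {"api_key"},
}


def missing_credentials(provider, supplied):
    """Return the required credential fields that are absent or empty, sorted."""
    required = REQUIRED_CREDENTIALS.get(provider)
    if required is None:
        raise ValueError(f"no credential list recorded for provider: {provider}")
    present = {name for name, value in supplied.items() if value}
    return sorted(required - present)


if __name__ == "__main__":
    # Hypothetical input: a Datadog integration with only the API key filled in.
    print(missing_credentials("datadog", {"api_key": "example"}))
```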

MCP Servers (Custom Tools)

For tools without native integration, connect via MCP (Model Context Protocol) servers using Streamable HTTP transport. The agent supports OAuth 2.0, API Key, and SigV4 authentication methods. MCP servers are registered at the AWS account level and shared across all Agent Spaces in that account. Private and on-premises MCP servers connect through VPC. AWS Labs maintains 56+ open-source MCP servers on GitHub for common operational tools.
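
Before registering a custom server, it may be worth confirming that its endpoint speaks Streamable HTTP. The sketch below builds, but does not send, the JSON-RPC `initialize` request defined by the MCP specification. The endpoint URL, client name, and bearer-token header are assumptions for illustration, and this shows a generic MCP client handshake rather than how DevOps Agent itself connects.

```python
import json
import urllib.request


def build_initialize_request(endpoint_url, auth_headers=None):
    """Build (without sending) an MCP initialize request for a Streamable HTTP endpoint."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity used only for this pre-flight check.
            "clientInfo": {"name": "preflight-check", "version": "0.1.0"},
        },
    }
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP clients accept both plain JSON and SSE responses.
        "Accept": "application/json, text/event-stream",
    }
    headers.update(auth_headers or {})
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    request = build_initialize_request(
        "https://mcp.example.internal/mcp",  # hypothetical endpoint
        {"Authorization": "Bearer example-token"},  # auth scheme depends on your server
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen(request, timeout=10)` sends it. A server that implements the protocol should answer with its own capabilities and server info.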

Verification: After connecting each tool, go to the Agent Space dashboard and look for the integration status indicator. Each connected source should show "Connected" with a recent timestamp. Ask the agent an on-demand question like "Show me the top 5 metrics from [tool name] for the last hour" to confirm data is flowing.

Common Setup Mistakes (Steps 2-3)
Wildcard IAM Principals

Never use wildcard principals in the DevOps Agent trust policy. Only the DevOps Agent service principal should be allowed to assume the role. Overly permissive trust policies create a privilege escalation path.

Missing Cross-Account Roles

If you are monitoring multiple AWS accounts, each target account needs its own IAM role that trusts the agent's primary account via sts:AssumeRole. A missing cross-account role causes silent gaps in visibility.

Observability Scope Too Broad

When connecting third-party tools like Datadog or Splunk, limit the integration to specific services or environments. Granting full-tenant access exposes telemetry the agent does not need and increases your attack surface.
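
The cross-account gap described above can be caught with a simple set comparison. In this sketch the account IDs are placeholders, the helper names are hypothetical, and the trust policy uses the standard IAM pattern of trusting another account's root principal. Adapt the principal to whatever your security team requires.

```python
def cross_account_trust_policy(primary_account_id):
    """Trust policy for a role in a target account that the primary account may assume."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{primary_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def accounts_missing_role(associated_accounts, accounts_with_role):
    """Return accounts associated with the Agent Space that lack the cross-account role."""
    return sorted(set(associated_accounts) - set(accounts_with_role))


if __name__ == "__main__":
    # Placeholder account IDs for illustration.
    associated = ["111122223333", "444455556666", "777788889999"]
    with_role = ["111122223333", "777788889999"]
    print(accounts_missing_role(associated, with_role))
```

Any account ID in the result is a candidate for the silent visibility gap, so create the role there before relying on investigation results.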

Step 4: Setting Up Application Topology

The topology engine is what makes AWS DevOps Agent more than a log search tool. It builds a continuously updated graph of your services, dependencies, and communication patterns. When something breaks, the agent already knows what depends on what, which is how it can identify root causes in minutes instead of hours.

  1. The agent begins building topology automatically once your Agent Space has connected AWS accounts and observability tools. No manual setup is required for the initial discovery.
  2. The agent discovers services from multiple sources: CloudFormation stacks, resource tags, CI/CD pipeline configurations, and actual traffic patterns observed through background learning.
  3. To accelerate topology accuracy, tag your AWS resources consistently. The agent uses tags like service, environment, team, and application to group related resources.
  4. Connect your source control (GitHub, GitLab, or Azure DevOps) through the Integrations panel. This gives the agent visibility into deployment events, which it uses to correlate incidents with recent code changes.
  5. Review the topology graph in the DevOps Agent Web App. You can view services, dependencies, and traffic flow visually. The agent updates this graph continuously as your environment changes.
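
The tagging advice in step 3 can be checked offline against an inventory export. This sketch assumes you already have each resource's tags as a dictionary; the resource IDs and the `missing_tags` helper are illustrative, and the four tag keys are the ones this guide names.

```python
# Tag keys this guide says the agent uses to group related resources.
REQUIRED_TAGS = ("service", "environment", "team", "application")


def missing_tags(resources):
    """Map each resource ID to the required tag keys that are absent or empty."""
    report = {}
    for resource_id, tags in resources.items():
        missing = [key for key in REQUIRED_TAGS if not tags.get(key)]
        if missing:
            report[resource_id] = missing
    return report


if __name__ == "__main__":
    # Hypothetical inventory: one fully tagged resource, one partially tagged.
    inventory = {
        "i-0123456789abcdef0": {
            "service": "payments",
            "environment": "prod",
            "team": "core",
            "application": "checkout",
        },
        "i-0fedcba9876543210": {"service": "payments"},
    }
    print(missing_tags(inventory))
```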

Verification: Open the Topology view in the DevOps Agent Web App. You should see your services and their dependencies mapped out. If the graph appears sparse, give the agent 24-48 hours of background learning time. The topology engine improves its accuracy as it observes more traffic patterns and incidents. If services are missing, check that the corresponding AWS accounts are associated with the Agent Space and that resource tags are consistent.

77%
MTTR improvement at Western Governors University: from 120 minutes down to 28 minutes per incident after deploying AWS DevOps Agent. Source: AWS customer case study.
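
The percentage above follows directly from the before and after figures, and the same arithmetic works for your own incident data. A minimal sketch (the function name is illustrative):

```python
def mttr_reduction_pct(before_minutes, after_minutes):
    """Percentage reduction in mean time to resolution."""
    if before_minutes <= 0:
        raise ValueError("before_minutes must be positive")
    return (before_minutes - after_minutes) / before_minutes * 100


if __name__ == "__main__":
    # Figures quoted in the case study above: 120 minutes down to 28 minutes.
    print(round(mttr_reduction_pct(120, 28)))
```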

Step 5: Configuring Incident Response

Investigations are the primary operational mode. When an alert fires or an anomaly is detected, the agent launches an autonomous investigation that follows a five-phase cycle: detection, hypothesis generation, telemetry analysis, root cause analysis, and mitigation planning. The agent runs 24/7, which means incidents at 3 AM on a Saturday receive the same investigation quality as those during business hours.

  1. In the Agent Space settings, navigate to Investigation Settings.
  2. Configure alert sources. Connect PagerDuty, CloudWatch Alarms, or any alerting tool that can send webhooks. When an alert triggers, the agent automatically opens an investigation.
  3. Set the concurrency limit. The default is 3 concurrent investigations per Agent Space (adjustable). For high-incident environments, consider creating multiple Agent Spaces or increasing this quota.
  4. Configure the mitigation approval workflow. The agent produces mitigation plans with rollback procedures, but human approval is always required before execution. Set up notification channels (Slack, email, or the Web App) where operators receive approval requests.
  5. Confirm the immutable audit journal is enabled. It is on by default and should not be disabled. Every investigation phase is recorded in a tamper-resistant log that integrates with AWS CloudTrail for compliance purposes, and the agent cannot modify journal entries after they are written. For broader AI safety controls on the Bedrock platform, see our Bedrock Guardrails breakdown.

Verification: Trigger a test investigation. The simplest way is to create a CloudWatch Alarm with a deliberately low threshold that your application will breach. When the alarm fires, the agent should begin an investigation within 60 seconds. Check the Investigations panel in the Web App for a new entry with status "In Progress." Once it completes, review the root cause analysis and mitigation plan to confirm the agent correctly identified the source of the alarm.
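
One way to create the deliberately low-threshold alarm is through the CloudWatch `put_metric_alarm` API. The sketch below only assembles the parameters and takes the client as an argument, so it runs without AWS credentials. The alarm name and load balancer dimension value are placeholders, and choosing the Application Load Balancer `RequestCount` metric is an assumption; use any metric your application reliably emits.

```python
def build_test_alarm_params(alarm_name, load_balancer, threshold=1.0):
    """Parameters for an alarm that fires as soon as the load balancer sees traffic."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "RequestCount",
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
    }


def create_test_alarm(cloudwatch_client, params):
    """Create the alarm using a boto3 CloudWatch client supplied by the caller."""
    cloudwatch_client.put_metric_alarm(**params)


if __name__ == "__main__":
    # Placeholder values; replace with your own alarm name and load balancer.
    params = build_test_alarm_params(
        "devops-agent-test-alarm", "app/my-load-balancer/50dc6c495c0c9188"
    )
    print(params)
```

With boto3 installed and credentials configured, `create_test_alarm(boto3.client("cloudwatch"), params)` creates the alarm. Delete it after the test with `delete_alarms(AlarmNames=[...])` so it does not keep triggering investigations.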

Step 6: Connecting Collaboration Tools

The agent can push investigation results, mitigation approval requests, and evaluation findings to your team's existing communication channels. This keeps the feedback loop tight and prevents operators from needing to constantly check the Web App.

Slack Integration

  1. In the Agent Space settings, go to Integrations and click Add Integration.
  2. Select Slack and authorize the DevOps Agent Slack app in your workspace.
  3. Choose the channels where investigation summaries and approval requests should be posted. Use a dedicated channel like #devops-agent-alerts rather than a general operations channel.
  4. Configure notification preferences: all investigations, only critical severity, or only when human approval is needed.
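
The three notification preferences in step 4 amount to a simple filter. The sketch below shows the intended semantics. The preference labels and the `should_notify` function are illustrative names, not settings exposed by the Slack app.

```python
def should_notify(preference, severity, needs_approval):
    """Decide whether an investigation should be posted under a given preference."""
    if preference == "all":
        return True
    if preference == "critical_only":
        return severity == "Critical"
    if preference == "approval_only":
        return needs_approval
    raise ValueError(f"unknown notification preference: {preference}")


if __name__ == "__main__":
    # A High-severity investigation that is waiting on human approval.
    for preference in ("all", "critical_only", "approval_only"):
        print(preference, should_notify(preference, "High", needs_approval=True))
```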

ServiceNow Integration

  1. Select ServiceNow from the integrations list.
  2. Enter your ServiceNow instance URL and provide API credentials with incident creation permissions.
  3. Map investigation findings to ServiceNow incident fields. The agent can automatically create incidents and populate root cause analysis fields.

PagerDuty Integration

  1. Select PagerDuty and provide your integration key.
  2. Configure bidirectional sync: PagerDuty alerts trigger investigations, and investigation results update the PagerDuty incident with root cause details.

Verification: After connecting Slack, run another test investigation (or wait for the next real alert). Within 2-3 minutes of the investigation completing, you should receive a summary message in your configured Slack channel. The message should include the investigation title, root cause summary, severity, and a link to the full report in the Web App.

24/7
Autonomous investigation coverage. Incidents at 3 AM on a Saturday receive the same 5-phase investigation quality as those during business hours.
Source: AWS DevOps Agent documentation

Step 7: Running Your First On-Demand SRE Task

On-Demand SRE tasks are the conversational mode of the agent. Operators can ask natural language questions, request custom reports, or task the agent with ad hoc analysis. Each Agent Space supports up to 10 concurrent on-demand tasks (adjustable). This is where teams interact with the agent outside of incident response.

  1. Open the DevOps Agent Web App (this is separate from the AWS Management Console).
  2. Navigate to your Agent Space and click New Task or open the chat interface.
  3. Start with a simple query to confirm connectivity: "Show me the top 5 services by error rate over the past 24 hours."
  4. Try a comparison query: "Compare latency for the checkout service between the last two deployments."
  5. Request a custom report: "Generate a weekly reliability summary for all services in this Agent Space, including error rates, latency p99, and deployment frequency."

The agent draws on all connected observability tools, the topology graph, and historical investigation data to answer these queries. It produces charts, tables, and narrative summaries depending on the question format.

Verification: The agent should respond within 30-60 seconds for simple queries and 2-5 minutes for complex analysis. Responses should reference specific metrics from your connected tools. If the agent returns "No data available," verify that the observability integrations from Step 3 are connected and that the time range you requested has data.

Step 8: Setting Up Evaluations

Evaluations are the proactive mode. Instead of waiting for alerts, the agent analyzes historical incident patterns, operational telemetry, and configuration drift to identify problems before they cause outages. Think of evaluations as a continuous reliability review that runs without human prompting. You can run 1 concurrent evaluation per Agent Space (this limit is not adjustable).

  1. In the Agent Space settings, navigate to Evaluation Settings.
  2. Enable evaluations for the Agent Space. This starts the background analysis process.
  3. Set evaluation scope. You can limit evaluations to specific services, environments, or risk categories.
  4. Configure notification preferences for evaluation findings. Each finding includes a severity rating and supporting evidence.
  5. Review the first batch of findings. The agent typically produces initial recommendations within 24-48 hours as it analyzes your historical patterns and current configuration.

Evaluations surface issues like: configuration drift between environments, services approaching resource limits, repeated incident patterns that suggest an underlying systemic issue, and deployments that correlated with reliability regressions. Every recommendation comes with evidence and suggested remediation steps.

Verification: After enabling evaluations, check the Evaluations panel daily for the first week. You should see findings appear within 24-48 hours. Each finding should include a severity level (Critical, High, Medium, Low), a description of the issue, supporting telemetry evidence, and a recommended action. If no findings appear after 48 hours and your environment is actively generating telemetry, check that the evaluation scope is not too narrow.
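
During the first-week review, findings can be triaged by severity and checked for the four elements listed above. This sketch assumes findings have been copied into dictionaries by hand or by your own tooling. The field names are hypothetical and do not reflect an export format documented in this guide.

```python
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}
REQUIRED_FIELDS = ("severity", "description", "evidence", "recommended_action")


def triage(findings):
    """Split findings into (complete findings ranked by severity, incomplete findings)."""
    complete, incomplete = [], []
    for finding in findings:
        has_all_fields = all(finding.get(field) for field in REQUIRED_FIELDS)
        if has_all_fields and finding["severity"] in SEVERITY_ORDER:
            complete.append(finding)
        else:
            incomplete.append(finding)
    ranked = sorted(complete, key=lambda f: SEVERITY_ORDER[f["severity"]])
    return ranked, incomplete


if __name__ == "__main__":
    # Hypothetical findings for illustration.
    sample = [
        {"severity": "Low", "description": "d", "evidence": "e", "recommended_action": "a"},
        {"severity": "Critical", "description": "d", "evidence": "e", "recommended_action": "a"},
        {"severity": "High", "description": "missing evidence"},
    ]
    ranked, incomplete = triage(sample)
    print([f["severity"] for f in ranked], len(incomplete))
```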


Pricing Reference

All three operational modes (Investigations, Evaluations, On-Demand SRE tasks) are billed at the same rate: $0.0083 per agent-second ($0.498 per agent-minute). There are no separate charges per mode.

Free trial: 2 months with 10 Agent Spaces, 20 hours of investigations, 15 hours of evaluations, and 20 hours of on-demand SRE tasks per month. No credit card required to start. AWS Support credits reduce costs further: 100% for Unified Operations, 75% for Enterprise, 30% for Business+.

Pricing verified from AWS documentation as of May 2026. All figures are estimates. Additional AWS service charges (data transfer, CloudWatch, etc.) apply separately.
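
The rate above makes cost estimates straightforward. The sketch below uses `Decimal` to avoid floating-point drift and treats each support credit as a straight percentage off the agent charge. That treatment is an assumption about how the credits apply, so confirm the details on the AWS pricing page. Function and constant names are illustrative.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.0083")

# ASSUMPTION: credits modeled as a flat percentage reduction of the agent charge.
SUPPORT_CREDIT = {
    "Unified Operations": Decimal("1.00"),
    "Enterprise": Decimal("0.75"),
    "Business+": Decimal("0.30"),
}


def agent_cost(seconds, support_plan=None):
    """Estimated agent charge in USD for a given number of agent-seconds."""
    gross = RATE_PER_SECOND * seconds
    credit = SUPPORT_CREDIT.get(support_plan, Decimal("0"))
    return gross * (1 - credit)


if __name__ == "__main__":
    print("4-minute investigation:", agent_cost(4 * 60))
    print("same, Enterprise support:", agent_cost(4 * 60, "Enterprise"))
    # 20 + 15 + 20 free-trial hours per month, valued at the list rate.
    print("monthly trial allowance:", agent_cost(55 * 3600))
```

At list price, a four-minute investigation comes to about $1.99, and the 55 trial hours per month correspond to roughly $1,643 of agent time.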


Top 3 Failure Patterns
Insufficient IAM Permissions

The most common deployment blocker. If the agent's IAM role is missing the AWSDevOpsAgentServiceRolePolicy or lacks cross-account sts:AssumeRole permissions, investigations silently fail or return incomplete root cause analysis.

Topology Discovery Needs 24-48 Hours

Teams often expect the topology graph to be complete immediately. The agent needs 24-48 hours of background learning to observe traffic patterns, map dependencies, and build an accurate service graph. A sparse topology at hour 2 is normal.

Third-Party Integration Auth Failures

Expired or too narrowly scoped API keys for Datadog, Dynatrace, or Splunk cause silent data gaps. Always test the connection after setup, and re-verify it if the agent's investigation results suddenly exclude a data source.

Troubleshooting and FAQ

Common Questions
Why is my topology graph missing services?

The topology engine needs time to learn your environment. Give it 24-48 hours of background observation. If services are still missing after that period, check three things: (1) the AWS accounts containing those services are associated with the Agent Space, (2) resource tags are consistent (the agent uses tags like service, environment, and team), and (3) the IAM role has permissions to read CloudFormation stacks and resource metadata in those accounts.

Is AWS DevOps Agent the same as Amazon Q Developer?

No. They are separate products with different consoles and different purposes. Amazon Q Developer is an AI coding assistant (code generation, debugging, security scanning). AWS DevOps Agent is an autonomous operations agent (incident investigation, root cause analysis, proactive evaluations). They can work together: DevOps Agent generates remediation specs that Kiro (AWS's autonomous coding agent) can implement, with human approval required.

Can the agent apply fixes without human approval?

No. AWS DevOps Agent operates with a human-in-the-loop safety model. It investigates incidents, identifies root causes, and produces mitigation plans with rollback procedures, but a human operator must approve before any mitigation executes. For code-level fixes, the agent generates specifications that Kiro (AWS's coding agent) can implement, again with human approval required. This is a deliberate safety boundary, not a missing feature.

Does it work with multicloud, on-premises, and third-party tooling?

Yes. Multicloud support for Azure workloads is a GA feature. On-premises infrastructure connects via MCP servers routed through VPC private networking. Third-party observability tools (Datadog, Dynatrace, Splunk, New Relic, Grafana) are natively supported. The agent was designed for hybrid and multicloud environments from the start.

How much does AWS DevOps Agent cost?

$0.0083 per agent-second ($0.498 per agent-minute), uniform across all three operational modes. A 2-month free trial includes 10 Agent Spaces with 20 hours of investigations, 15 hours of evaluations, and 20 hours of on-demand tasks per month. AWS Support credits reduce costs further: 100% for Unified Operations, 75% for Enterprise, 30% for Business+.

Is my data used to train models?

No. AWS states that customer content processed by DevOps Agent is not used for training. The agent's "learned skills" are specific to your environment and stay within your account boundary. Data is encrypted with AES-256, customer-managed keys (CMK) are supported, and all connections can be routed through private VPC endpoints.

Next Step

Start with a single Agent Space scoped to your most incident-prone application. Run it for two weeks during the free trial, collecting data on investigation accuracy and MTTR improvement. Compare the agent's root cause findings against your team's manual investigations during the same period. That comparison gives you concrete data for a business case. When you are ready to scale, create additional Agent Spaces for other services and explore the skills system to extend the agent's capabilities with custom operational procedures. For a deeper understanding of the architecture and limitations, read our What Is AWS DevOps Agent breakdown.