What Is AWS DevOps Agent? Autonomous Incident Response Explained

Last verified: May 14, 2026  ·  Format: Breakdown

75%
MTTR reduction reported by early adopters; WGU's drop from 120 min to 28 min works out to about 77%
Source: AWS customer case study, Western Governors University
94%
Root cause accuracy reported in AWS internal testing across autonomous investigations
Source: AWS DevOps Agent launch blog, March 2026
$0.498
Per agent-minute ($0.0083/second), uniform across all operational modes
Source: AWS pricing page
6
AWS regions at general availability (US, EU, APAC)
Source: AWS What's New, March 2026
3
Operational modes: Investigations, Evaluations, and On-Demand SRE tasks
Source: AWS DevOps Agent documentation

Your on-call engineer's phone rings at 3 AM. By the time they open their laptop, log into the console, and start pulling CloudWatch metrics, the AWS DevOps Agent has already identified the root cause, correlated telemetry from Datadog and PagerDuty, built a topology graph of every affected service, and drafted a mitigation plan waiting for human approval. That investigation took it four minutes. Your engineer would have needed forty.

AWS DevOps Agent is Amazon's autonomous SRE agent, built on Bedrock AgentCore and generally available since March 31, 2026. It does not replace your operations team. It runs investigations, evaluations, and ad hoc queries in parallel with them, 24 hours a day, at $0.498 per agent-minute. Early adopters report MTTR improvements up to 75%, and AWS puts its root cause analysis accuracy at 94%. This is not another chatbot bolted onto your monitoring stack. It is an agent that builds its own understanding of your application topology, learns from your incident history, and produces immutable audit trails that it cannot modify after the fact.

What Is AWS DevOps Agent

AWS DevOps Agent is an autonomous AI agent purpose-built for operations and site reliability engineering. It falls under Amazon's "frontier agent" category alongside the AWS Security Agent and Kiro (the autonomous coding agent). It is a separate product from Amazon Q Developer and runs in its own dedicated console experience. Teams that also need custom model training and deployment should evaluate Amazon SageMaker alongside it.

The agent was previewed at re:Invent 2025 on December 2, 2025, and reached general availability on March 31, 2026 across six AWS regions: US East (N. Virginia), US West (Oregon), Europe (Frankfurt, Ireland), and Asia Pacific (Sydney, Tokyo). It is built on Amazon Bedrock AgentCore, which provides dedicated infrastructure for agent memory, policies, evaluations, and observability. For a deeper look at the foundation model layer underneath, see our Amazon Bedrock breakdown.

What DevOps Agent Is and What It Isn't
It Is Not

A chatbot bolted onto your monitoring stack. Not a replacement for your operations team. Not Amazon Q Developer (that is a separate product). Does not write or deploy code.

It Is

An autonomous SRE agent that runs investigations, evaluations, and ad hoc queries 24/7 in parallel with your team. Builds its own topology graph. Produces immutable audit trails it cannot edit after the fact.

The core premise is straightforward: instead of a human triaging alerts, pulling metrics, reading logs, and correlating across tools at 3 AM, the agent does that work autonomously. It generates a hypothesis, gathers evidence from your connected observability and ticketing systems, identifies the root cause, and presents a mitigation plan for human approval. It records every step in an immutable audit journal that the agent itself cannot edit after the fact.

For context on where AWS DevOps Agent fits in the broader AI tool landscape, visit the AI Tools Hub and the AWS sub-hub. Ready to set it up? Our step-by-step guide to using AWS DevOps Agent walks through Agent Space configuration, integration setup, and your first investigation.

How It Works

AWS DevOps Agent operates through a dual-console architecture. Administrators configure the agent in the AWS Management Console (IAM roles, integrations, Agent Spaces). Operators interact with it through a separate DevOps Agent Web App for day-to-day investigations, evaluations, and on-demand queries.

Foundation
Agent Spaces
  • Logical scope boundary per service or team
  • Up to 100 spaces per account per region
  • Isolated permissions and integrations
Intelligence
Topology Engine
  • Auto-discovers services and dependencies
  • Learns from CloudFormation, tags, and traffic
  • Continuously updated, not a static diagram
Execution
Investigation Flow
  • 5-phase cycle: detect, hypothesize, analyze, identify, mitigate
  • Immutable audit journal per investigation
  • Human approval required before execution

Agent Spaces

The foundational unit of organization is the Agent Space: a logical container that defines scope, associated AWS accounts, third-party integrations, operator permissions, and the agent's operating boundaries. You can run up to 100 Agent Spaces per account per region (adjustable via service quotas). Each space is an isolated scope boundary, so the agent working on your payment service cannot see or access the topology of your marketing platform unless you explicitly configure it.
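To make the scope boundary concrete, here is a minimal sketch of an Agent Space as a data structure. The class, field names, and account IDs are hypothetical illustrations, not the AWS API; only the 100-space default quota comes from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Default quota from the text above; AWS describes it as adjustable.
MAX_SPACES_PER_ACCOUNT_REGION = 100


@dataclass(frozen=True)
class AgentSpace:
    """Hypothetical model of an Agent Space: a named scope boundary."""
    name: str
    account_ids: frozenset[str]
    integrations: frozenset[str] = frozenset()

    def can_access(self, account_id: str) -> bool:
        # A space only sees accounts that were explicitly associated with it.
        return account_id in self.account_ids


def register_space(spaces: list[AgentSpace], new_space: AgentSpace,
                   quota: int = MAX_SPACES_PER_ACCOUNT_REGION) -> list[AgentSpace]:
    """Return a new list including new_space, enforcing the per-region quota."""
    if len(spaces) >= quota:
        raise ValueError("Agent Space quota reached for this account and region")
    return spaces + [new_space]


# Placeholder account IDs for illustration only.
payments = AgentSpace("payments", frozenset({"111111111111"}),
                      frozenset({"datadog", "pagerduty"}))
marketing = AgentSpace("marketing", frozenset({"222222222222"}))
spaces = register_space(register_space([], payments), marketing)
```

In this sketch, `payments.can_access("222222222222")` is false until the marketing account is explicitly added to the payments space, which mirrors the isolation described above.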

Topology Engine

When you connect an Agent Space to your AWS accounts and observability tools, the agent begins building an application topology graph. It discovers services, dependencies, and communication patterns from CloudFormation stacks, resource tags, CI/CD pipelines, and background learning from your actual traffic patterns. This topology is not a static diagram. The agent updates it continuously, and it forms the foundation of every investigation: when something breaks, the agent already knows what depends on what.
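The value of knowing "what depends on what" can be illustrated with a small dependency graph. This is a sketch under assumptions: the service names are invented, and the breadth-first traversal is a generic technique, not a description of how the topology engine is implemented.

```python
from collections import deque

# Hypothetical topology: each service maps to the services it depends on.
DEPENDS_ON = {
    "checkout": ["payments", "cart"],
    "payments": ["ledger-db"],
    "cart": ["session-cache"],
    "ledger-db": [],
    "session-cache": [],
}


def affected_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency up to its dependents.
    dependents: dict[str, list[str]] = {svc: [] for svc in depends_on}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(svc)

    seen: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for upstream in dependents.get(current, []):
            if upstream not in seen:
                seen.add(upstream)
                queue.append(upstream)
    return seen


impacted = affected_by("ledger-db", DEPENDS_ON)
```

Here a failure in `ledger-db` reaches `payments` and then `checkout`, while `cart` and `session-cache` are unaffected.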

Investigation Flow

Every autonomous investigation follows a five-phase cycle:

  • Detection: An alert or anomaly triggers an investigation (from CloudWatch, PagerDuty, Datadog, or any connected source)
  • Hypothesis: The agent generates one or more hypotheses about the root cause based on the topology and historical patterns
  • Telemetry analysis: It queries metrics, logs, and traces across all connected observability platforms to test each hypothesis
  • Root cause identification: The agent narrows to the most probable cause with supporting evidence
  • Mitigation: It produces a mitigation plan with rollback procedures, presented for human approval before execution
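The five phases above can be sketched as a simple state machine with an approval gate in front of execution. The class and method names are hypothetical; this illustrates the flow as described, not the agent's internals.

```python
from enum import Enum


class Phase(Enum):
    DETECTION = 1
    HYPOTHESIS = 2
    TELEMETRY_ANALYSIS = 3
    ROOT_CAUSE = 4
    MITIGATION = 5


class Investigation:
    """Hypothetical model of one investigation moving through the cycle."""

    def __init__(self, alert_source: str):
        self.alert_source = alert_source
        self.phase = Phase.DETECTION
        self.approved_by = None

    def advance(self) -> Phase:
        # Phases run strictly in order; there is nothing after MITIGATION.
        if self.phase is Phase.MITIGATION:
            raise RuntimeError("investigation is already in its final phase")
        self.phase = Phase(self.phase.value + 1)
        return self.phase

    def approve(self, operator: str) -> None:
        if self.phase is not Phase.MITIGATION:
            raise RuntimeError("no mitigation plan to approve yet")
        self.approved_by = operator

    def execute_mitigation(self) -> str:
        # The human-approval gate: nothing executes without an approver.
        if self.phase is not Phase.MITIGATION:
            raise RuntimeError("mitigation plan not ready")
        if self.approved_by is None:
            raise PermissionError("human approval required before execution")
        return f"mitigation executed, approved by {self.approved_by}"


inv = Investigation("cloudwatch")
while inv.phase is not Phase.MITIGATION:
    inv.advance()
inv.approve("oncall-engineer")
result = inv.execute_mitigation()
```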

Every phase is recorded in an immutable audit journal. The agent cannot modify journal entries after they are written, which means you get a tamper-resistant record of exactly what the agent did, what evidence it found, and why it reached its conclusion. These journals integrate with AWS CloudTrail for compliance and audit purposes.
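One common way to build a tamper-evident, append-only record is a hash chain, where each entry stores a digest of the entry before it. The sketch below uses that technique purely as an illustration; the source does not say how AWS implements journal immutability, and the class and field names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditJournal:
    """Hypothetical append-only journal; each entry commits to its predecessor."""

    def __init__(self):
        self._entries = []

    def append(self, phase: str, detail: str) -> None:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {"phase": phase, "detail": detail, "prev": prev}
        self._entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks every later link."""
        prev = GENESIS
        for entry in self._entries:
            body = {"phase": entry["phase"], "detail": entry["detail"], "prev": entry["prev"]}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


journal = AuditJournal()
journal.append("detection", "CloudWatch alarm on checkout latency")
journal.append("hypothesis", "recent deployment introduced a slow query")
intact = journal.verify()
```

If any stored entry is altered after the fact, `verify()` returns false, which is the property a compliance reviewer cares about.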

77%
MTTR improvement at Western Governors University (from 120 minutes down to 28 minutes per incident)
Source: AWS customer case study, WGU
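The percentage follows directly from the two MTTR values quoted above. A short check, with function and variable names chosen for illustration:

```python
def mttr_reduction(before_minutes: float, after_minutes: float) -> float:
    """Fractional reduction in mean time to resolution."""
    return (before_minutes - after_minutes) / before_minutes


reduction = mttr_reduction(120, 28)   # 92 / 120
speedup = 120 / 28                    # how many times faster resolution is

print(f"{reduction:.1%} reduction, {speedup:.2f}x faster")
# prints "76.7% reduction, 4.29x faster"
```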

Three Operational Modes

AWS DevOps Agent operates in three distinct modes, each addressing a different phase of the operational lifecycle. All three modes are billed at the same rate: $0.0083 per agent-second.

1. Investigations (Incident Response)

The primary mode. When an alert fires or an anomaly is detected, the agent launches an autonomous investigation. It runs 24/7, which means incidents that occur at 3 AM on a Saturday receive the same quality of investigation as those that happen during business hours. The agent can run up to 3 concurrent investigations per Agent Space (adjustable). Investigations follow the five-phase cycle described above and produce structured findings with root cause analysis, timeline reconstruction, and a human-reviewable mitigation plan.

2. Evaluations (Incident Prevention)

Evaluations are the proactive mode. The agent analyzes historical incident patterns, operational telemetry, and configuration drift to identify problems before they cause outages. It produces recommendations with severity ratings and supporting evidence. You can run 1 concurrent evaluation per Agent Space (not adjustable). Think of evaluations as a continuous reliability review that runs without human prompting.

3. On-Demand SRE Tasks (Chat)

The conversational mode. Operators can ask natural language questions, request custom charts and reports, or task the agent with ad hoc analysis. Each Agent Space supports up to 10 concurrent on-demand tasks (adjustable). This is where teams interact with the agent outside of incident response: "Show me latency trends for the checkout service over the past 7 days" or "Compare error rates between the last two deployments."

AWS DevOps Agent Timeline
  • Oct 2025: Bedrock AgentCore GA. The underlying infrastructure layer for frontier agents reaches general availability, providing memory, policies, evaluations, and observability.
  • Dec 2, 2025: Preview at re:Invent 2025. AWS DevOps Agent announced in preview. Design partners include T-Mobile and United Airlines.
  • Mar 31, 2026: General Availability. GA launch across 6 regions with 2-month free trial, multicloud Azure support, and MCP server integration.
  • 2026 & Beyond: Expanding Integrations. AWS Labs maintains 56+ open-source MCP servers. Skills system replaces legacy Runbooks. On-premises support via VPC-connected MCP.

Integrations

AWS DevOps Agent connects to a broad ecosystem of observability, CI/CD, ticketing, and identity providers. This is not limited to AWS-native tools. The agent was designed from the start to work in multicloud and hybrid environments.

Observability
Monitoring & Telemetry
  • Amazon CloudWatch (native)
  • Datadog
  • Dynatrace
  • New Relic
  • Splunk
  • Grafana & Prometheus
Code & CI/CD
Source & Deployment
  • GitHub
  • GitLab
  • Azure DevOps
Ticketing & Comms
Incident Management
  • ServiceNow
  • PagerDuty
  • Slack
Identity & Multicloud
Access & Cross-Platform
  • Okta
  • Microsoft Entra ID
  • IAM Identity Center
  • Azure workloads (GA)
  • On-premises via MCP

Any tool that does not have a native integration can be connected via MCP servers (Model Context Protocol). The agent supports Streamable HTTP transport with OAuth 2.0, API Key, and SigV4 authentication. MCP servers are registered at the AWS account level and shared across Agent Spaces. Private and on-premises MCP servers connect through VPC. AWS Labs maintains 56+ open-source MCP servers on GitHub for common operational tools.

Pricing

AWS DevOps Agent uses a single, uniform pricing model: $0.0083 per agent-second ($0.498 per agent-minute), regardless of which operational mode is running. There are no separate charges for investigations versus evaluations versus on-demand tasks.

Free trial: 2 months with 10 Agent Spaces, 20 hours of investigations, 15 hours of evaluations, and 20 hours of on-demand SRE tasks per month. No credit card required to start.
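The per-second rate and the trial allowance can be turned into dollar figures with a few lines of arithmetic. This sketch uses `Decimal` to avoid floating-point drift; the function and variable names are illustrative, and the trial value is derived here from the published rate, not quoted from AWS.

```python
from decimal import Decimal

RATE_PER_AGENT_SECOND = Decimal("0.0083")  # from the pricing paragraph above


def usage_cost(agent_seconds: int) -> Decimal:
    """List-price cost for a number of agent-seconds, in any mode."""
    return RATE_PER_AGENT_SECOND * agent_seconds


per_minute = usage_cost(60)

# Monthly free-trial allowance in hours, as listed above.
trial_hours = {"investigations": 20, "evaluations": 15, "on_demand": 20}
trial_value_per_month = usage_cost(sum(trial_hours.values()) * 3600)

print(per_minute, trial_value_per_month)
```

At list price, one agent-minute is $0.498, and the 55 trial hours per month correspond to $1,643.40 of usage.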

Monthly Cost Examples

Team Size | Usage Profile | Estimated Monthly Cost
Small Team | Light investigation usage, occasional on-demand queries | ~$39.84
Active Team | Regular investigations + evaluations + moderate SRE tasks | ~$343.62
Enterprise | High-frequency investigations, multiple spaces, continuous evaluations | ~$2,290.80
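The article does not state how much agent time each estimate assumes, but it can be backed out from the $0.498 per agent-minute rate. The minutes and hours below are derived figures, not numbers published by AWS, and the dictionary keys are illustrative.

```python
from decimal import Decimal

RATE_PER_AGENT_MINUTE = Decimal("0.498")

monthly_estimates = {
    "small_team": Decimal("39.84"),
    "active_team": Decimal("343.62"),
    "enterprise": Decimal("2290.80"),
}

# Divide each estimate by the per-minute rate to get implied agent-minutes.
implied_minutes = {
    profile: cost / RATE_PER_AGENT_MINUTE
    for profile, cost in monthly_estimates.items()
}

for profile, minutes in implied_minutes.items():
    print(f"{profile}: {minutes} agent-minutes (~{float(minutes) / 60:.1f} hours)")
```

The three rows work out to 80, 690, and 4,600 agent-minutes per month, or roughly 1.3, 11.5, and 76.7 hours of agent activity.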

AWS Support Credits

Existing AWS Support plans include DevOps Agent credits that reduce your effective cost:

  • Unified Operations: 100% credit (effectively free with this plan)
  • Enterprise Support: 75% credit
  • Business+ Support: 30% credit

Pricing verified from AWS documentation as of May 2026. All figures are estimates based on typical usage patterns and do not include data transfer or other AWS service charges.

Skills and MCP Servers

AWS DevOps Agent uses a modular skills system to extend its capabilities beyond the built-in investigation and evaluation logic. Skills replace the previous "Runbooks" system and are organized into three categories.

Skill Categories

  • AWS-provided skills: Pre-built instruction sets maintained by AWS for common operational tasks (CloudFormation drift detection, ECS container health checks, Lambda cold start analysis)
  • Custom skills: User-authored instruction sets packaged as a SKILL.md file plus references and assets (max 6MB zip). Scripts are not supported and will be rejected during upload
  • Learned skills: Skills the agent develops through background learning from your operational environment. The agent observes patterns across your investigations and creates reusable procedures without manual authoring

The skills system follows the Agent Skills specification (agentskills.io), an emerging standard for portable agent instruction sets.
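Before uploading a custom skill, a local pre-flight check can catch the constraints described above. This is a sketch, not AWS's validation logic: the list of script extensions, the 6 MiB reading of "6MB", and the requirement that SKILL.md sit at the archive root are all assumptions.

```python
import io
import zipfile

MAX_ZIP_BYTES = 6 * 1024 * 1024  # assumes "6MB" means 6 MiB
# Assumed set of extensions treated as scripts; the real rule may differ.
SCRIPT_SUFFIXES = (".sh", ".py", ".ps1", ".bat", ".js")


def validate_skill_zip(data: bytes) -> list[str]:
    """Return a list of problems found in a skill archive (empty means OK)."""
    problems = []
    if len(data) > MAX_ZIP_BYTES:
        problems.append("archive exceeds 6 MB")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
    if "SKILL.md" not in names:
        problems.append("missing SKILL.md at archive root")
    for name in names:
        if name.lower().endswith(SCRIPT_SUFFIXES):
            problems.append(f"script not allowed: {name}")
    return problems


def build_zip(files: dict[str, str]) -> bytes:
    """Helper for the example: build an in-memory zip from name -> text."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


good = build_zip({"SKILL.md": "# Restart policy", "references/notes.md": "context"})
bad = build_zip({"SKILL.md": "# Restart policy", "fix.sh": "echo restart"})
print(validate_skill_zip(good), validate_skill_zip(bad))
```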

MCP Server Architecture

MCP (Model Context Protocol) servers give the agent access to tools, APIs, and data sources that do not have a native integration. Key details:

  • Transport: Streamable HTTP only (no WebSocket or stdio)
  • Authentication: OAuth 2.0, API Key, or SigV4
  • Scope: Registered at the AWS account level, shared among all Agent Spaces in that account
  • Private access: On-premises and VPC-hosted MCP servers connect through private networking
  • Open source: AWS Labs maintains 56+ MCP servers covering common operational tools
56+
Open-source MCP servers maintained by AWS Labs on GitHub, covering observability, ticketing, CI/CD, and identity tools
Source: AWS Labs GitHub, 2026
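For a sense of what travels over the Streamable HTTP transport, MCP messages are JSON-RPC 2.0 payloads sent by HTTP POST, and a tool invocation uses the `tools/call` method. The sketch below only builds such a request without sending it. The endpoint URL, tool name, arguments, and the use of a bearer-style `Authorization` header for the API key are hypothetical, and a real session would first complete the protocol's `initialize` handshake.

```python
import json
import urllib.request


def build_tool_call(endpoint: str, tool_name: str, arguments: dict,
                    api_key: str, request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) an MCP tools/call request over HTTP POST."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept either a JSON body or an SSE stream.
            "Accept": "application/json, text/event-stream",
            # Assumption: the server expects the API key as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Hypothetical private MCP server and tool, for illustration only.
request = build_tool_call(
    "https://mcp.example.internal/mcp",
    "query_logs",
    {"service": "checkout", "minutes": 15},
    api_key="example-key",
)
body = json.loads(request.data)
```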

Who Should Use AWS DevOps Agent

Who Gets the Most Value
🚨
SRE & On-Call Teams

The primary audience. Teams drowning in alert fatigue get autonomous triage and investigation running 24/7. The agent does the 3 AM correlation work so your engineers can sleep through incidents that do not need them.

Best fit: Investigations mode
⚙️
DevOps Engineers

Engineers managing CI/CD pipelines and infrastructure benefit from the topology engine and proactive evaluations. The agent surfaces deployment-correlated issues and configuration drift before they become incidents.

Best fit: Evaluations + On-Demand
🏢
Enterprise Operations Centers

Large organizations running hundreds of AWS accounts need consistent investigation quality across teams and time zones. United Airlines uses the agent as a single view across 500 AWS accounts. Agent Spaces provide scoped isolation with centralized management.

Best fit: Multi-space enterprise deployment
☁️
Multicloud Teams

Teams operating across AWS and Azure get native multicloud support at GA. On-premises workloads connect via MCP servers through VPC. T-Mobile uses the agent with Splunk integration across their multicloud footprint.

Best fit: Azure + MCP integrations
3-5x
Faster incident resolution reported by early adopters compared to manual investigation workflows
Source: AWS DevOps Agent documentation

Limitations

AWS DevOps Agent is a capable tool with clear boundaries. Understanding what it cannot do is as important as knowing what it can.

Key Limitations
No Autonomous Code Changes

The agent investigates and recommends but does not execute code changes. For remediation that requires code modifications, it generates specifications for Kiro (AWS's autonomous coding agent). Human approval is required before any mitigation plan executes. This is a deliberate safety boundary, not a missing feature.

Limited Regional Availability

The agent is generally available in only six regions as of March 2026: US East (N. Virginia), US West (Oregon), Europe (Frankfurt, Ireland), and Asia Pacific (Sydney, Tokyo). Teams operating in South America, the Middle East, Africa, or other APAC regions must route agent traffic to a supported region or wait for expansion.
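A deployment script can check workload regions against the supported list before creating an Agent Space. The region codes below are the standard AWS identifiers for the six named regions; they do not appear in the source, and the function names are illustrative.

```python
# Standard AWS region codes for the six regions named above.
SUPPORTED_REGIONS = {
    "us-east-1": "US East (N. Virginia)",
    "us-west-2": "US West (Oregon)",
    "eu-central-1": "Europe (Frankfurt)",
    "eu-west-1": "Europe (Ireland)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "ap-northeast-1": "Asia Pacific (Tokyo)",
}


def is_supported(region: str) -> bool:
    return region in SUPPORTED_REGIONS


def unsupported(workload_regions: list[str]) -> list[str]:
    """Regions that would need to route agent traffic elsewhere."""
    return [region for region in workload_regions if not is_supported(region)]


needs_routing = unsupported(["us-east-1", "sa-east-1", "ap-south-1"])
print(needs_routing)  # prints ['sa-east-1', 'ap-south-1']
```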

Concurrency Limits

Each Agent Space allows 3 concurrent investigations (adjustable) but only 1 concurrent evaluation (not adjustable). High-incident environments may need multiple Agent Spaces to avoid queuing, which adds organizational complexity and cost.
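A quick capacity estimate shows how these limits translate into a number of Agent Spaces. The sketch assumes the default limits quoted in this article and that work spreads evenly across spaces; the function name and inputs are illustrative.

```python
import math

# Default per-space concurrency limits quoted in this article.
INVESTIGATIONS_PER_SPACE = 3   # adjustable via quota
EVALUATIONS_PER_SPACE = 1      # not adjustable


def spaces_needed(peak_investigations: int, peak_evaluations: int = 0) -> int:
    """Minimum Agent Spaces to run the given peak load without queuing."""
    for_investigations = math.ceil(peak_investigations / INVESTIGATIONS_PER_SPACE)
    for_evaluations = math.ceil(peak_evaluations / EVALUATIONS_PER_SPACE)
    return max(for_investigations, for_evaluations, 1)


print(spaces_needed(7))      # prints 3
print(spaces_needed(3, 2))   # prints 2
```

Because the investigation limit is adjustable, requesting a quota increase may be simpler than adding spaces; the evaluation limit is the one that forces additional spaces.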

No Script Execution in Skills

Custom skills cannot include executable scripts. They are limited to SKILL.md instruction files plus reference documents and assets (6MB max). If your remediation runbooks depend on script execution, you need to refactor them into declarative instructions or connect via MCP servers.

If you need to understand the safety controls that govern AI agent outputs on AWS, including content filtering and PII redaction, read our Amazon Bedrock Guardrails breakdown.

Additional considerations:

  • The agent is billed per second of active computation, which means cost scales directly with investigation complexity.
  • The topology engine requires time to learn your environment; expect reduced accuracy during the first weeks of deployment.
  • The agent generates specifications for Kiro but does not have a built-in feedback loop to verify that Kiro's code changes resolved the issue.

Frequently Asked Questions
Is AWS DevOps Agent the same as Amazon Q Developer?
No. They are separate products with different consoles and different purposes. Amazon Q Developer is an AI coding assistant (code generation, debugging, security scanning). AWS DevOps Agent is an autonomous operations agent (incident investigation, root cause analysis, proactive evaluations). When a fix requires code changes, DevOps Agent hands off to Kiro, not Q Developer: it generates remediation specs that Kiro can implement.

Does AWS DevOps Agent work with Azure and on-premises environments?
Yes. Multicloud support for Azure workloads is a GA feature. On-premises infrastructure connects via MCP servers routed through VPC private networking. Third-party observability tools (Datadog, Dynatrace, Splunk, New Relic, Grafana) are natively supported. The agent was designed for hybrid and multicloud environments from the start.

How much does AWS DevOps Agent cost?
$0.0083 per agent-second ($0.498 per agent-minute), uniform across all three operational modes. A 2-month free trial includes 10 Agent Spaces with 20 hours of investigations, 15 hours of evaluations, and 20 hours of on-demand tasks per month. AWS Support credits reduce costs further: 100% for Unified Operations, 75% for Enterprise, 30% for Business+.

Can AWS DevOps Agent execute mitigations without human approval?
No. AWS DevOps Agent operates with a human-in-the-loop safety model. It investigates incidents, identifies root causes, and produces mitigation plans with rollback procedures, but a human operator must approve before any mitigation executes. For code-level fixes, the agent generates specifications that Kiro (AWS's coding agent) can implement, again with human approval required.

Does AWS use my data to train its models?
No. AWS states that customer content processed by DevOps Agent is not used for training. The agent's "learned skills" are specific to your environment and stay within your account boundary. Data is encrypted with AES-256, customer-managed keys (CMK) are supported, and all connections can be routed through private VPC endpoints.