What Is AWS DevOps Agent? Autonomous Incident Response Explained
Last verified: May 14, 2026 · Format: Breakdown
Your on-call engineer's phone rings at 3 AM. By the time they open their laptop, log into the console, and start pulling CloudWatch metrics, the AWS DevOps Agent has already identified the root cause, correlated telemetry from Datadog and PagerDuty, built a topology graph of every affected service, and drafted a mitigation plan waiting for human approval. That investigation took it four minutes. Your engineer would have needed forty.
AWS DevOps Agent is Amazon's autonomous SRE agent, built on Bedrock AgentCore and generally available since March 31, 2026. It does not replace your operations team. It runs investigations, evaluations, and ad hoc queries in parallel with them, 24 hours a day, at $0.498 per agent-minute. Early adopters report MTTR improvements of up to 75%, and AWS puts its root cause analysis accuracy at 94%. This is not another chatbot bolted onto your monitoring stack. It is an agent that builds its own understanding of your application topology, learns from your incident history, and writes audit trails that it cannot modify after the fact.
What Is AWS DevOps Agent
AWS DevOps Agent is an autonomous AI agent purpose-built for operations and site reliability engineering. It falls under Amazon's "frontier agent" category alongside the AWS Security Agent and Kiro (the autonomous coding agent). It is a separate product from Amazon Q Developer and runs in its own dedicated console experience. Teams that also need custom model training and deployment should evaluate Amazon SageMaker alongside it.
The agent was previewed at re:Invent 2025 on December 2, 2025, and reached general availability on March 31, 2026 across six AWS regions: US East (N. Virginia), US West (Oregon), Europe (Frankfurt, Ireland), and Asia Pacific (Sydney, Tokyo). It is built on Amazon Bedrock AgentCore, which provides dedicated infrastructure for agent memory, policies, evaluations, and observability. For a deeper look at the foundation model layer underneath, see our Amazon Bedrock breakdown.
What it is not: a chatbot bolted onto your monitoring stack, a replacement for your operations team, or Amazon Q Developer (that is a separate product). It does not write or deploy code.
What it is: an autonomous SRE agent that runs investigations, evaluations, and ad hoc queries 24/7 in parallel with your team. It builds its own topology graph and produces audit trails it cannot edit after the fact.
The core premise is straightforward: instead of a human triaging alerts, pulling metrics, reading logs, and correlating across tools at 3 AM, the agent does that work autonomously. It generates a hypothesis, gathers evidence from your connected observability and ticketing systems, identifies the root cause, and presents a mitigation plan for human approval. It records every step in an immutable audit journal that the agent itself cannot edit after the fact.
For context on where AWS DevOps Agent fits in the broader AI tool landscape, visit the AI Tools Hub and the AWS sub-hub. Ready to set it up? Our step-by-step guide to using AWS DevOps Agent walks through Agent Space configuration, integration setup, and your first investigation.
How It Works
AWS DevOps Agent operates through a dual-console architecture. Administrators configure the agent in the AWS Management Console (IAM roles, integrations, Agent Spaces). Operators interact with it through a separate DevOps Agent Web App for day-to-day investigations, evaluations, and on-demand queries.
Agent Spaces:
- Logical scope boundary per service or team
- Up to 100 spaces per account per region
- Isolated permissions and integrations
Topology Engine:
- Auto-discovers services and dependencies
- Learns from CloudFormation, tags, and traffic
- Continuously updated, not a static diagram
Investigation Flow:
- 5-phase cycle: detect, hypothesize, analyze, identify, mitigate
- Immutable audit journal per investigation
- Human approval required before execution
Agent Spaces
The foundational unit of organization is the Agent Space: a logical container that defines scope, associated AWS accounts, third-party integrations, operator permissions, and the agent's operating boundaries. You can run up to 100 Agent Spaces per account per region (adjustable via service quotas). Each space is an isolated scope boundary, so the agent working on your payment service cannot see or access the topology of your marketing platform unless you explicitly configure it.
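The scoping rule can be pictured with a small in-memory model. This is only a sketch: `AgentSpace`, `SpaceRegistry`, and the account IDs are hypothetical names for illustration, not AWS API types, and the quota constant simply mirrors the default of 100 mentioned above.

```python
from dataclasses import dataclass, field

# Default quota stated above (adjustable via service quotas).
MAX_SPACES_PER_ACCOUNT_REGION = 100


@dataclass(frozen=True)
class AgentSpace:
    """Hypothetical model of an Agent Space; not an AWS API type."""
    name: str
    account_ids: frozenset[str]
    integrations: frozenset[str] = frozenset()


@dataclass
class SpaceRegistry:
    """Tracks the spaces of one account in one region and enforces the quota."""
    quota: int = MAX_SPACES_PER_ACCOUNT_REGION
    spaces: dict[str, AgentSpace] = field(default_factory=dict)

    def create(self, space: AgentSpace) -> None:
        if space.name in self.spaces:
            raise ValueError(f"space {space.name!r} already exists")
        if len(self.spaces) >= self.quota:
            raise RuntimeError("Agent Space quota reached; request an increase")
        self.spaces[space.name] = space

    def can_access(self, space_name: str, account_id: str) -> bool:
        # A space only sees accounts that were explicitly associated with it.
        return account_id in self.spaces[space_name].account_ids


registry = SpaceRegistry()
registry.create(AgentSpace("payments", frozenset({"111111111111"})))
registry.create(AgentSpace("marketing", frozenset({"222222222222"})))
print(registry.can_access("payments", "222222222222"))  # False
```

The point of the sketch is the default-deny check in `can_access`: nothing is visible to a space unless it was configured in.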
Topology Engine
When you connect an Agent Space to your AWS accounts and observability tools, the agent begins building an application topology graph. It discovers services, dependencies, and communication patterns from CloudFormation stacks, resource tags, CI/CD pipelines, and background learning from your actual traffic patterns. This topology is not a static diagram. The agent updates it continuously, and it forms the foundation of every investigation: when something breaks, the agent already knows what depends on what.
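To see why a topology graph speeds up triage, consider a minimal sketch of a dependency lookup. The service names and the `impacted_by` helper are made up for illustration; the agent's actual graph format is not documented here.

```python
from collections import deque

# Hypothetical dependency edges: service -> services it calls.
DEPENDS_ON = {
    "checkout": ["payments", "inventory"],
    "payments": ["ledger-db"],
    "inventory": ["inventory-db"],
    "marketing-site": ["cms"],
}


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency up to its callers.
    dependents: dict[str, set[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    seen: set[str] = set()
    queue = deque([failed])
    while queue:  # breadth-first walk over the inverted graph
        node = queue.popleft()
        for caller in dependents.get(node, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


print(sorted(impacted_by("ledger-db", DEPENDS_ON)))  # ['checkout', 'payments']
```

With the graph already built, "what depends on what" becomes a lookup rather than a 3 AM reconstruction from dashboards.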
Investigation Flow
Every autonomous investigation follows a five-phase cycle:
- Detection: An alert or anomaly triggers an investigation (from CloudWatch, PagerDuty, Datadog, or any connected source)
- Hypothesis: The agent generates one or more hypotheses about the root cause based on the topology and historical patterns
- Telemetry analysis: It queries metrics, logs, and traces across all connected observability platforms to test each hypothesis
- Root cause identification: The agent narrows to the most probable cause with supporting evidence
- Mitigation: It produces a mitigation plan with rollback procedures, presented for human approval before execution
Every phase is recorded in an immutable audit journal. The agent cannot modify journal entries after they are written, which means you get a tamper-resistant record of exactly what the agent did, what evidence it found, and why it reached its conclusion. These journals integrate with AWS CloudTrail for compliance and audit purposes.
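The five phases, the append-only journal, and the approval gate can be sketched as a small state machine. Everything here (`Phase`, `JournalEntry`, `Investigation`) is a hypothetical model for illustration. Python cannot enforce true immutability, so the sketch only approximates it with frozen entries and a read-only view.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    DETECTION = 1
    HYPOTHESIS = 2
    TELEMETRY_ANALYSIS = 3
    ROOT_CAUSE = 4
    MITIGATION = 5


@dataclass(frozen=True)  # entries cannot be reassigned once written
class JournalEntry:
    phase: Phase
    note: str


class Investigation:
    """Hypothetical model: append-only journal plus a human approval gate."""

    def __init__(self) -> None:
        self._journal: list[JournalEntry] = []
        self.approved = False

    @property
    def journal(self) -> tuple[JournalEntry, ...]:
        return tuple(self._journal)  # read-only copy for callers

    def record(self, phase: Phase, note: str) -> None:
        last = self._journal[-1].phase.value if self._journal else 0
        # Stay in the current phase or advance by exactly one; never skip or go back.
        if phase.value not in (last, last + 1):
            raise ValueError(f"cannot record {phase.name} after phase {last}")
        self._journal.append(JournalEntry(phase, note))

    def approve(self) -> None:
        if not self._journal or self._journal[-1].phase is not Phase.MITIGATION:
            raise ValueError("no mitigation plan to approve yet")
        self.approved = True

    def execute(self) -> str:
        if not self.approved:
            raise PermissionError("mitigation requires human approval")
        return "mitigation executed"
```

A caller would `record` each phase in order, and `execute` raises until a human has called `approve`.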
Three Operational Modes
AWS DevOps Agent operates in three distinct modes, each addressing a different phase of the operational lifecycle. All three modes are billed at the same rate: $0.0083 per agent-second.
1. Investigations (Incident Response)
The primary mode. When an alert fires or an anomaly is detected, the agent launches an autonomous investigation. It runs 24/7, which means incidents that occur at 3 AM on a Saturday receive the same quality of investigation as those that happen during business hours. The agent can run up to 3 concurrent investigations per Agent Space (adjustable). Investigations follow the five-phase cycle described above and produce structured findings with root cause analysis, timeline reconstruction, and a human-reviewable mitigation plan.
2. Evaluations (Incident Prevention)
Evaluations are the proactive mode. The agent analyzes historical incident patterns, operational telemetry, and configuration drift to identify problems before they cause outages. It produces recommendations with severity ratings and supporting evidence. You can run 1 concurrent evaluation per Agent Space (not adjustable). Think of evaluations as a continuous reliability review that runs without human prompting.
3. On-Demand SRE Tasks (Chat)
The conversational mode. Operators can ask natural language questions, request custom charts and reports, or task the agent with ad hoc analysis. Up to 10 concurrent on-demand tasks per Agent Space (adjustable). This is where teams interact with the agent outside of incident response: "Show me latency trends for the checkout service over the past 7 days" or "Compare error rates between the last two deployments."
Integrations
AWS DevOps Agent connects to a broad ecosystem of observability, CI/CD, ticketing, and identity providers. This is not limited to AWS-native tools. The agent was designed from the start to work in multicloud and hybrid environments.
- Amazon CloudWatch (native)
- Datadog
- Dynatrace
- New Relic
- Splunk
- Grafana & Prometheus
- GitHub
- GitLab
- Azure DevOps
- ServiceNow
- PagerDuty
- Slack
- Okta
- Microsoft Entra ID
- IAM Identity Center
- Azure workloads (GA)
- On-premises via MCP
Any tool that does not have a native integration can be connected via MCP servers (Model Context Protocol). The agent supports Streamable HTTP transport with OAuth 2.0, API Key, and SigV4 authentication. MCP servers are registered at the AWS account level and shared across Agent Spaces. Private and on-premises MCP servers connect through VPC. AWS Labs maintains 56+ open-source MCP servers on GitHub for common operational tools.
Pricing
AWS DevOps Agent uses a single, uniform pricing model: $0.0083 per agent-second ($0.498 per agent-minute), regardless of which operational mode is running. There are no separate charges for investigations versus evaluations versus on-demand tasks.
Free trial: 2 months with 10 Agent Spaces, 20 hours of investigations, 15 hours of evaluations, and 20 hours of on-demand SRE tasks per month. No credit card required to start.
Monthly Cost Examples
| Team Size | Usage Profile | Estimated Monthly |
|---|---|---|
| Small Team | Light investigation usage, occasional on-demand queries | ~$39.84 |
| Active Team | Regular investigations + evaluations + moderate SRE tasks | ~$343.62 |
| Enterprise | High-frequency investigations, multiple spaces, continuous evaluations | ~$2,290.80 |
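The table's totals can be reproduced from the per-second rate. The agent-minute figures below (80, 690, and 4,600 per month) are not stated in the table; they are inferred by dividing each total by $0.498, so treat them as assumptions about the usage behind each row.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.0083")  # published rate; 60 s x 0.0083 = $0.498/min


def monthly_cost(agent_minutes: int) -> Decimal:
    """Cost in USD for a month with the given number of agent-minutes."""
    return (RATE_PER_SECOND * 60 * agent_minutes).quantize(Decimal("0.01"))


# Inferred usage per row (assumption, back-computed from the table totals).
for label, minutes in [("Small Team", 80), ("Active Team", 690), ("Enterprise", 4600)]:
    print(f"{label}: {minutes} agent-minutes -> ${monthly_cost(minutes)}")
# Small Team: 80 agent-minutes -> $39.84
# Active Team: 690 agent-minutes -> $343.62
# Enterprise: 4600 agent-minutes -> $2290.80
```

`Decimal` is used instead of floats so the cents come out exact.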
AWS Support Credits
Existing AWS Support plans include DevOps Agent credits that reduce your effective cost:
- Unified Operations: 100% credit (effectively free with this plan)
- Enterprise Support: 75% credit
- Business+ Support: 30% credit
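As a rough worked example, the credits can be applied to a monthly bill as a flat percentage. That flat application is an assumption made for illustration; the actual credit mechanics (caps, eligibility, timing) are not described here, and `effective_cost` is a hypothetical helper.

```python
from decimal import Decimal
from typing import Optional

# Credit percentages from the list above.
CREDIT = {
    "Unified Operations": Decimal("1.00"),
    "Enterprise Support": Decimal("0.75"),
    "Business+ Support": Decimal("0.30"),
}


def effective_cost(list_cost: Decimal, plan: Optional[str]) -> Decimal:
    """Apply the plan's credit as a flat percentage (assumption) to a monthly cost."""
    credit = CREDIT.get(plan, Decimal("0"))
    return (list_cost * (1 - credit)).quantize(Decimal("0.01"))


enterprise_bill = Decimal("2290.80")  # the Enterprise row of the cost table
print(effective_cost(enterprise_bill, "Enterprise Support"))  # 572.70
print(effective_cost(enterprise_bill, "Business+ Support"))   # 1603.56
print(effective_cost(enterprise_bill, "Unified Operations"))  # 0.00
```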
Pricing verified from AWS documentation as of May 2026. All figures are estimates based on typical usage patterns and do not include data transfer or other AWS service charges.
Skills and MCP Servers
AWS DevOps Agent uses a modular skills system to extend its capabilities beyond the built-in investigation and evaluation logic. Skills replace the previous "Runbooks" system and are organized into three categories.
Skill Categories
- AWS-provided skills: Pre-built instruction sets maintained by AWS for common operational tasks (CloudFormation drift detection, ECS container health checks, Lambda cold start analysis)
- Custom skills: User-authored instruction sets packaged as a SKILL.md file plus references and assets (max 6MB zip). Scripts are not supported and will be rejected during upload
- Learned skills: Skills the agent develops through background learning from your operational environment. The agent observes patterns across your investigations and creates reusable procedures without manual authoring
The skills system follows the Agent Skills specification (agentskills.io), an emerging standard for portable agent instruction sets.
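Before uploading a custom skill, a local pre-flight check can catch the constraints listed above. This is a sketch, not the service's validation logic: the list of script extensions, the reading of "6MB" as 6 MiB, and the requirement that SKILL.md sit at the package root are all assumptions.

```python
import io
import zipfile
from pathlib import PurePosixPath

MAX_PACKAGE_BYTES = 6 * 1024 * 1024  # assumes "6MB" means 6 MiB
# Assumption: what counts as a "script" is judged by file extension.
SCRIPT_SUFFIXES = {".sh", ".py", ".js", ".rb", ".ps1", ".bat"}


def validate_skill_package(data: bytes) -> list[str]:
    """Return a list of problems found in a zipped skill; empty means it looks OK."""
    problems: list[str] = []
    if len(data) > MAX_PACKAGE_BYTES:
        problems.append("package exceeds 6 MB")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
    if "SKILL.md" not in names:
        problems.append("missing SKILL.md at package root")
    for name in names:
        if PurePosixPath(name).suffix.lower() in SCRIPT_SUFFIXES:
            problems.append(f"script not allowed: {name}")
    return problems


def build_zip(files: dict[str, str]) -> bytes:
    """Helper for the example: build an in-memory zip from name -> text."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


package = build_zip({"SKILL.md": "# Restart playbook", "assets/fix.sh": "echo hi"})
print(validate_skill_package(package))  # ['script not allowed: assets/fix.sh']
```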
MCP Server Architecture
MCP (Model Context Protocol) servers give the agent access to tools, APIs, and data sources that do not have a native integration. Key details:
- Transport: Streamable HTTP only (no WebSocket or stdio)
- Authentication: OAuth 2.0, API Key, or SigV4
- Scope: Registered at the AWS account level, shared among all Agent Spaces in that account
- Private access: On-premises and VPC-hosted MCP servers connect through private networking
- Open source: AWS Labs maintains 56+ MCP servers covering common operational tools
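The transport and authentication constraints above can be expressed as a simple configuration check. The `McpServerConfig` shape, the string identifiers, and the example URL are hypothetical; they are not the registration format AWS uses.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Constraints taken from the list above; the identifiers themselves are made up.
ALLOWED_TRANSPORTS = {"streamable-http"}
ALLOWED_AUTH = {"oauth2", "api-key", "sigv4"}


@dataclass(frozen=True)
class McpServerConfig:
    """Hypothetical description of an MCP server registration."""
    name: str
    url: str
    transport: str
    auth: str


def validate_mcp_config(config: McpServerConfig) -> list[str]:
    """Return the reasons a config would not fit the stated constraints."""
    problems: list[str] = []
    parsed = urlparse(config.url)
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        problems.append("url must be an HTTP(S) endpoint")
    if config.transport not in ALLOWED_TRANSPORTS:
        problems.append(f"unsupported transport: {config.transport}")
    if config.auth not in ALLOWED_AUTH:
        problems.append(f"unsupported auth: {config.auth}")
    return problems


legacy = McpServerConfig("legacy-tools", "https://mcp.example.internal/mcp", "stdio", "basic")
print(validate_mcp_config(legacy))
# ['unsupported transport: stdio', 'unsupported auth: basic']
```

A server that only speaks stdio or WebSocket would need an HTTP front end before it could be registered.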
Who Should Use AWS DevOps Agent
On-call and SRE teams are the primary audience. Teams drowning in alert fatigue get autonomous triage and investigation running 24/7. The agent does the 3 AM correlation work so your engineers can sleep through incidents that do not need them.
Best fit: Investigations mode
Engineers managing CI/CD pipelines and infrastructure benefit from the topology engine and proactive evaluations. The agent surfaces deployment-correlated issues and configuration drift before they become incidents.
Best fit: Evaluations + On-Demand
Large organizations running hundreds of AWS accounts need consistent investigation quality across teams and time zones. United Airlines runs a single pane across 500 AWS accounts. Agent Spaces provide scoped isolation with centralized management.
Best fit: Multi-space enterprise deployment
Teams operating across AWS and Azure get native multicloud support at GA. On-premises workloads connect via MCP servers through VPC. T-Mobile uses the agent with Splunk integration across their multicloud footprint.
Best fit: Azure + MCP integrations
Limitations
AWS DevOps Agent is a capable tool with clear boundaries. Understanding what it cannot do is as important as knowing what it can.
The agent investigates and recommends but does not execute code changes. For remediation that requires code modifications, it generates specifications for Kiro (AWS's autonomous coding agent). Human approval is required before any mitigation plan executes. This is a deliberate safety boundary, not a missing feature.
The agent is generally available in only 6 regions as of March 2026: US East (N. Virginia), US West (Oregon), Europe (Frankfurt, Ireland), and Asia Pacific (Sydney, Tokyo). Teams operating in South America, Middle East, Africa, or other APAC regions must route agent traffic to a supported region or wait for expansion.
Each Agent Space allows 3 concurrent investigations (adjustable) but only 1 concurrent evaluation (not adjustable). High-incident environments may need multiple Agent Spaces to avoid queuing, which adds organizational complexity and cost.
Custom skills cannot include executable scripts. They are limited to SKILL.md instruction files plus reference documents and assets (6MB max). If your remediation runbooks depend on script execution, you need to refactor them into declarative instructions or connect via MCP servers.
If you need to understand the safety controls that govern AI agent outputs on AWS, including content filtering and PII redaction, read our Amazon Bedrock Guardrails breakdown.
Additional considerations:
- The agent is billed per second of active computation, which means cost scales directly with investigation complexity.
- The topology engine requires time to learn your environment; expect reduced accuracy during the first weeks of deployment.
- The agent generates specifications for Kiro but does not have a built-in feedback loop to verify that Kiro's code changes resolved the issue.