How to Secure AWS DevOps Agent: IAM, MCP, and Incident Permissions
AWS DevOps Agent operates as an autonomous SRE (Site Reliability Engineering) teammate inside your infrastructure — an automated on-call engineer that monitors, investigates, and triages incidents for you. It reads your logs, correlates your telemetry, and files support cases on your behalf. That level of access demands an equally deliberate approach to permissions.
Who this is for: AWS administrators, DevOps engineers, and security teams setting up or evaluating DevOps Agent. You should be comfortable creating IAM roles and navigating the AWS console. No prior experience with AI agents is required.
This guide walks through every IAM role, trust policy condition, and MCP (Model Context Protocol) integration control required to run DevOps Agent at the tightest permissions your operations allow.
What Is AWS DevOps Agent
AWS DevOps Agent is a fully autonomous AI agent that monitors your infrastructure continuously, investigates incidents as they occur, and proposes mitigation plans. It is not a chatbot wrapped around CloudWatch. It builds an interactive dependency graph of your resources (a map of how your EC2 instances, RDS databases, Lambda functions, and other services connect to each other) so it can trace a failure in one resource to its impact on others. It correlates telemetry across logs, metrics, traces, deployment history, and source code diffs, and reasons about root cause at a level that a single on-call engineer typically cannot match in the first 20 minutes of an incident.
Dual-console architecture: The AWS Management Console handles IAM roles, Agent Space creation, and account association (admin tasks). The Operator Web App is where day-to-day operations happen: monitoring, investigations, on-demand tasks, and AWS Support escalation.
Each Agent Space is a logical security boundary — like a separate VPC for the agent's world. It defines which AWS accounts, external tools, and users the agent can access. Investigation data, incident history, recommendations, and chat conversations never bleed between spaces. One Agent Space per workload environment (production, staging) or per team is the recommended pattern. Scope too narrowly and you miss cross-service root causes; scope too broadly and investigation performance degrades.
The agent operates using a three-tier skill system: AWS-provided skills (built-in capabilities), user-defined skills (custom Markdown-based playbooks you write), and learned skills (produced by a background learning sub-agent that analyzes your incident patterns over time). This skill hierarchy matters for security because all skills inherit the permissions of the Agent Space they run in. For example, if you write a user-defined skill that triages database connection failures, that skill can read any resource the Agent Space role has access to. If the role has overly broad read permissions, the skill can reach resources unrelated to databases.
Why Agent Security Matters
AWS DevOps Agent only recently became generally available (GA in May 2026). The feature set is still evolving: expect new regions, higher concurrency limits, and additional integrations in the first 90 days. Verify configuration guidance against current AWS documentation.
Traditional monitoring tools observe but do not act. DevOps Agent reasons about your environment and takes limited actions (filing tickets, creating support cases). That distinction creates a different threat surface than a dashboard.
Three risk categories require specific attention:
- Prompt injection via operational data: The agent reads your logs, error messages, and resource tags. An attacker who can write to those data sources could embed instructions the agent interprets as operational context. The agent's real-time AI safety classifiers (rated ASL-3, a safety evaluation tier indicating the model has been tested against advanced adversarial scenarios) are designed to catch this, but an injection attempt is more likely to succeed when it comes from an authorized user who already has write access to logs or tags.
- MCP tool exposure: External integrations (GitHub, ServiceNow, Datadog, Splunk, PagerDuty, Slack) connect via MCP servers. Each connection expands the agent's reach. A misconfigured MCP server that exposes write tools, or a custom server with insufficient input validation, gives the agent capabilities you did not intend.
- Over-permissioned roles ("Agent God Mode"): This risk pattern, identified in third-party security research on the AWS AgentCore toolkit, applies to any agent with overly permissive IAM roles — including DevOps Agent. A compromised or misconfigured agent with wildcard permissions could access other agents' data, poison learned skills, or read investigation images from other Agent Spaces. The fix is strict least-privilege scoping on every trust policy, which is what this guide covers step by step.
Prerequisites
Before you begin, confirm you have the following in place. Every item not marked optional is required; missing any one of them will block a step later in this guide.
- AWS account with administrative access (IAM role and policy creation permissions)
- AWS CLI v2 installed and configured with appropriate credentials
- Amazon CloudWatch and AWS CloudTrail active (included by default in all accounts)
- Optional for Steps 1-5: Business Support or higher (required only for AWS Support escalation in the Incident Response section — the core security configuration works on any support plan)
- Authentication method selected: IAM Identity Center is recommended — it gives operators 12-hour sessions and integrates with corporate SSO. Plain IAM works but sessions expire every 30 minutes, which is disruptive during incident investigations. External IdP (Okta, Entra ID) is for organizations already using those providers.
- Optional: Node.js 18+ and AWS CDK CLI, or Terraform 1.0+ (this guide covers the CLI path only — see AWS documentation for IaC deployment)
Step 1: Create the Agent Space IAM Role
Create the IAM role that the agent will assume when accessing your AWS resources:
- Open the IAM console, choose Roles, then Create Role
- Select Custom trust policy as the trusted entity type
- Paste the trust policy below
- On the permissions page, search for and attach the AWS-managed policy AIDevOpsAgentAccessPolicy
- Add the inline policy shown below (for Resource Explorer topology discovery)
- Name the role (e.g., DevOpsAgentSpaceRole) and create it
The trust policy is the critical security control. It must include confused deputy prevention conditions, which stop a scenario where a service is tricked into using its access to your role on behalf of someone else. The principal aidevops.amazonaws.com is the same for every customer, so without these conditions the service could be induced to assume your role while working for a different account or Agent Space. The condition keys tie the trust to your specific account and Agent Space, so the role can only be assumed for requests that originate from them.
Trust Policy (Confused Deputy Prevention)
Without these condition keys, the trust policy would accept any request arriving through aidevops.amazonaws.com, including one made on behalf of another customer's account. The aws:SourceAccount condition restricts assumption to requests originating from your account. The aws:SourceArn condition restricts it further to your Agent Space. Both are mandatory.
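A minimal Python sketch of a trust policy carrying both conditions follows. The service principal and condition keys come from the text above, and the ARN layout follows the Agent Space format quoted in the Troubleshooting section. The account ID, region, and space ID are placeholders, and the choice of the StringEquals and ArnLike operators (with a wildcard default for a space that does not exist yet) is an assumption; compare the result against current AWS documentation before using it.

```python
import json


def agent_space_trust_policy(account_id: str, region: str, space_id: str = "*") -> dict:
    """Build an illustrative trust policy for the Agent Space role."""
    # ARN layout assumed from the Agent Space ARN format quoted in this guide.
    space_arn = f"arn:aws:aidevops:{region}:{account_id}:agent-space/{space_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "aidevops.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    # Only requests originating from this account...
                    "StringEquals": {"aws:SourceAccount": account_id},
                    # ...and from this Agent Space (or any space, if "*").
                    "ArnLike": {"aws:SourceArn": space_arn},
                },
            }
        ],
    }


# Placeholder values; substitute your own account, region, and space ID.
print(json.dumps(agent_space_trust_policy("111122223333", "us-east-1"), indent=2))
```

Once the Agent Space exists, passing its real ID instead of the wildcard narrows the trust to that one space.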
Inline Policy: Service-Linked Role Creation
This grants the agent permission to create the AWSServiceRoleForResourceExplorer service-linked role (SLR) — a special IAM role that an AWS service creates and manages on your behalf. The SLR powers topology discovery and dependency graphing. The condition key prevents this permission from being used to create SLRs for any other AWS service.
About the ARN format: The double colon in arn:aws:iam::*:role/... is intentional, not a typo. In IAM ARNs the region field is empty because IAM is a global service, so the ARN begins with arn:aws:iam:: and the account ID follows directly (here, the wildcard *).
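As a rough illustration, the inline policy described above might be assembled like this in Python. The action, the role name, and the iam:AWSServiceName value come from this guide; the full resource path under aws-service-role/ is an assumption based on the usual layout of service-linked role ARNs and should be checked against AWS documentation.

```python
import json

RESOURCE_EXPLORER_SERVICE = "resource-explorer-2.amazonaws.com"


def resource_explorer_slr_policy() -> dict:
    """Build an illustrative inline policy allowing creation of one specific SLR."""
    # Assumed path layout: role/aws-service-role/<service>/<role name>.
    slr_arn = (
        "arn:aws:iam::*:role/aws-service-role/"
        f"{RESOURCE_EXPLORER_SERVICE}/AWSServiceRoleForResourceExplorer"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:CreateServiceLinkedRole",
                "Resource": slr_arn,
                # The condition stops this permission from creating SLRs
                # for any service other than Resource Explorer.
                "Condition": {
                    "StringEquals": {"iam:AWSServiceName": RESOURCE_EXPLORER_SERVICE}
                },
            }
        ],
    }


print(json.dumps(resource_explorer_slr_policy(), indent=2))
```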
Verify
Run aws iam get-role --role-name DevOpsAgentSpaceRole and confirm the output includes the trust policy with both aws:SourceAccount and aws:SourceArn conditions. The get-role output does not list attached policies, so run aws iam list-attached-role-policies --role-name DevOpsAgentSpaceRole and verify that AIDevOpsAgentAccessPolicy appears in the result.
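The manual comparison can also be scripted. The sketch below assumes you have saved the JSON printed by aws iam get-role and parsed it into a dictionary; the helper name missing_condition_keys is hypothetical. It reports which Allow statements in a trust policy lack the two required condition keys.

```python
REQUIRED_KEYS = frozenset({"aws:SourceAccount", "aws:SourceArn"})


def missing_condition_keys(trust_policy: dict, required=REQUIRED_KEYS) -> dict:
    """Return {statement_index: set_of_missing_keys} for Allow statements."""
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]
    problems = {}
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        present = set()
        # Condition is {operator: {key: value}}; collect keys across operators.
        for operator_block in statement.get("Condition", {}).values():
            present.update(key.lower() for key in operator_block)
        # Condition key names are case-insensitive, so compare in lower case.
        missing = {key for key in required if key.lower() not in present}
        if missing:
            problems[index] = missing
    return problems
```

With the parsed get-role output in a variable named role, the call would be missing_condition_keys(role["Role"]["AssumeRolePolicyDocument"]); an empty dictionary means both keys are present on every Allow statement. The check confirms only that the keys exist, not that their values are correct.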
Step 2: Create the Operator App Role
Create a second IAM role for operator access — this controls how human users authenticate to the DevOps Agent web application. Attach the AIDevOpsOperatorAppAccessPolicy managed policy.
If you are using IAM Identity Center (recommended), add an inline policy granting:
- sso:ListInstances
- sso:DescribeInstance
- identitystore:DescribeUser
These permissions allow the Operator App to resolve user identities through IAM Identity Center. Without them, the web app cannot map SSO tokens to DevOps Agent user sessions.
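A minimal sketch of that inline policy, built in Python, is shown here. The three actions are the ones listed above; using "*" as the resource is an assumption made for illustration, and a narrower resource scope may be possible in your environment.

```python
import json

# Actions taken from the list in this step.
OPERATOR_SSO_ACTIONS = [
    "sso:ListInstances",
    "sso:DescribeInstance",
    "identitystore:DescribeUser",
]


def operator_sso_inline_policy() -> dict:
    """Build an illustrative inline policy for Identity Center lookups."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": OPERATOR_SSO_ACTIONS,
                "Resource": "*",  # assumption; tighten if your setup allows
            }
        ],
    }


print(json.dumps(operator_sso_inline_policy(), indent=2))
```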
Trust Policy
The sts:TagSession action allows the service to tag assumed sessions, which the agent uses for audit attribution. The confused deputy condition prevents role assumption from unrelated principals.
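One way such a trust policy could look is sketched below. The two actions and the aws:SourceAccount condition are taken from the description above; the service principal aidevops.amazonaws.com is assumed to be the same one used for the Agent Space role, and the account ID is a placeholder.

```python
import json


def operator_app_trust_policy(account_id: str) -> dict:
    """Build an illustrative trust policy for the Operator App role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed principal; confirm against AWS documentation.
                "Principal": {"Service": "aidevops.amazonaws.com"},
                # TagSession lets the service tag sessions for audit attribution.
                "Action": ["sts:AssumeRole", "sts:TagSession"],
                "Condition": {"StringEquals": {"aws:SourceAccount": account_id}},
            }
        ],
    }


print(json.dumps(operator_app_trust_policy("111122223333"), indent=2))
```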
Verify
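Run aws iam get-role --role-name DevOpsAgentOperatorAppRole and confirm the trust policy includes the aws:SourceAccount condition. Run aws iam list-attached-role-policies --role-name DevOpsAgentOperatorAppRole and verify that AIDevOpsOperatorAppAccessPolicy is attached. If using IAM Identity Center, run aws iam list-role-policies --role-name DevOpsAgentOperatorAppRole and confirm the inline SSO policy is also present.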
Step 3: Create and Configure the Agent Space
You have three paths to create an Agent Space: the AWS Management Console wizard, the AWS CLI, or Infrastructure as Code (CDK or Terraform).
CLI Path
After creating the space, associate your primary AWS account for monitoring. This triggers topology discovery: the agent indexes your CloudFormation stacks and Resource Explorer data to build its dependency graph.
Scope deliberately: Each Agent Space has isolation trade-offs. Too narrow (one space per microservice) means the agent cannot correlate cross-service incidents. Too broad (one space for your entire organization) degrades performance and increases blast radius. Start with one Agent Space per production environment and split only when correlation needs diverge.
Enable the Operator App with your chosen authentication flow. IAM Identity Center with 12-hour sessions is recommended for daily operations. IAM with 30-minute sessions works for automated tooling. External IdPs (Okta, Entra ID) are supported for organizations with existing identity infrastructure.
Verify
Run aws devops-agent describe-agent-space --agent-space-name production-sre and confirm the status shows ACTIVE with your account listed under associated accounts. Topology discovery typically takes 5-10 minutes. If the status remains CREATING after 15 minutes, verify the Agent Space role's SLR inline policy is correct and that Resource Explorer is not blocked by an SCP.
Step 4: Secure Cross-Account Monitoring
When you need the agent to monitor workloads across multiple AWS accounts (common in Organizations-based setups), create a role in each secondary account with the same AIDevOpsAgentAccessPolicy plus the inline SLR policy from Step 1.
The critical difference for cross-account roles: the trust policy must include an sts:ExternalId condition whose value is the ARN of your primary Agent Space. This prevents confused deputy attacks in which an agent working for another account assumes the role in your secondary account.
Cross-Account Trust Policy
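The following Python sketch shows one plausible shape for the secondary-account trust policy. The sts:ExternalId condition and the Agent Space ARN format come from this guide; the service principal and the additional aws:SourceAccount condition are assumptions, and all identifiers are placeholders.

```python
import json


def cross_account_trust_policy(primary_account_id: str, region: str, space_id: str) -> dict:
    """Build an illustrative trust policy for a role in a secondary account."""
    # The external ID is the full ARN of the primary Agent Space, not its name.
    external_id = f"arn:aws:aidevops:{region}:{primary_account_id}:agent-space/{space_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed principal; confirm against AWS documentation.
                "Principal": {"Service": "aidevops.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {
                        "sts:ExternalId": external_id,
                        "aws:SourceAccount": primary_account_id,
                    }
                },
            }
        ],
    }


print(json.dumps(cross_account_trust_policy("111122223333", "us-east-1", "EXAMPLE_SPACE_ID"), indent=2))
```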
Verify
Test the cross-account trust by running aws sts assume-role --role-arn arn:aws:iam::SECONDARY_ACCOUNT_ID:role/DevOpsAgentSpaceRole --role-session-name test --external-id arn:aws:aidevops:REGION:PRIMARY_ACCOUNT_ID:agent-space/SPACE_ID from the primary account. A successful response confirms the trust policy is correctly configured. An AccessDenied error means the external ID or source account condition does not match, or that the identity running the command is not a principal the trust policy allows.
Step 5: Lock Down MCP Integrations
MCP servers connect the agent to external tools: GitHub for source code, ServiceNow for incident tickets, Datadog or Splunk for telemetry, PagerDuty and Slack for notifications. Think of MCP as a standardized plug-in system — instead of the agent needing custom code for every tool, each tool runs a small server that speaks the MCP protocol, and the agent connects through a common interface. Each connection introduces risk proportional to the permissions it grants.
The security model works in two layers: servers are registered at the AWS account level, but individual tools are allowlisted per Agent Space. This means registering a GitHub MCP server does not automatically expose all GitHub API operations to the agent. You explicitly choose which tools (read repo contents, list pull requests) each Agent Space can invoke.
Access Controls
- Tool allowlisting, not full server exposure: Register the MCP server once. Then allowlist only the specific tools each Agent Space needs. A monitoring Agent Space gets read-repo and list-PRs. It does not get merge-PR or delete-branch.
- Read-only mandate: Custom MCP servers and the credentials they use must only have read-only permissions. The agent is designed to be read-only; extending write capabilities through MCP breaks this design constraint.
- Authentication: OAuth 2.0 (Client Credentials or 3-legged), API key/token-based, or AWS SigV4 for API Gateway-hosted servers.
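As a rough pre-review aid, a requested tool list can be screened by name before anything is allowlisted. The sketch below is a naming heuristic only: the verb list and the function name split_allowlist are hypothetical, and a tool that passes still needs its credentials checked for read-only scope.

```python
# Leading verbs treated as read-only for this heuristic (an assumption).
READ_ONLY_VERBS = ("get", "list", "read", "describe", "search")


def split_allowlist(requested_tools, read_only_verbs=READ_ONLY_VERBS):
    """Partition tool names into (allowed, rejected) by their leading verb."""
    allowed, rejected = [], []
    for tool in requested_tools:
        # Normalize separators, then take the first word as the verb.
        verb = tool.lower().replace("_", "-").split("-", 1)[0]
        if verb in read_only_verbs:
            allowed.append(tool)
        else:
            rejected.append(tool)
    return allowed, rejected


allowed, rejected = split_allowlist(["read-repo", "list-PRs", "merge-PR", "delete-branch"])
print("allow:", allowed)    # allow: ['read-repo', 'list-PRs']
print("reject:", rejected)  # reject: ['merge-PR', 'delete-branch']
```

Rejecting by default anything that does not start with a known read verb keeps the failure mode conservative: an unfamiliar tool name is held back for manual review rather than exposed.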
Network and Transport
For tools that must stay on private networks, use VPC Lattice (a managed networking service). It creates private connections by placing ENIs (Elastic Network Interfaces — virtual network cards) in your VPC subnets, so traffic between the agent and your tools stays on the AWS private network and never touches the public internet. Security groups on these ENIs are controlled by you.
Transport requirement: Streamable HTTP only. The agent does not support standalone SSE (Server-Sent Events) or WebSocket transports, so a custom MCP server must expose its tools over the MCP Streamable HTTP transport. Tool names are limited to 64 characters.
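Both constraints are easy to check before registering a custom server. In this sketch the 64-character limit comes from the text above, while the transport label "streamable-http" and the function name check_mcp_tool are hypothetical identifiers chosen for illustration.

```python
MAX_TOOL_NAME_LENGTH = 64
SUPPORTED_TRANSPORT = "streamable-http"  # hypothetical label for Streamable HTTP


def check_mcp_tool(name: str, transport: str) -> list[str]:
    """Return a list of human-readable problems; empty means the tool passes."""
    problems = []
    if len(name) > MAX_TOOL_NAME_LENGTH:
        problems.append(
            f"tool name is {len(name)} characters; limit is {MAX_TOOL_NAME_LENGTH}"
        )
    if transport.lower() != SUPPORTED_TRANSPORT:
        problems.append(f"transport '{transport}' is not Streamable HTTP")
    return problems


print(check_mcp_tool("read-repo", "streamable-http"))  # []
print(check_mcp_tool("x" * 65, "sse"))
```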
Custom MCP risk: AWS-provided MCP integrations (CloudWatch, GitHub, ServiceNow) include native security controls. Custom MCP servers you build or import do not. If a custom tool has a command injection vulnerability, the agent could be tricked into exploiting it. Tool-level security is your responsibility under the shared responsibility model.
Verify
In the Agent Space settings, review the MCP tool allowlist and confirm that only the tools you explicitly authorized are listed. For each allowed tool, verify the associated credentials are read-only. If using VPC Lattice, confirm the resource gateway's ENIs appear in your VPC console with the correct security group assignments.
Understanding Prompt Injection Defenses
Because DevOps Agent ingests operational data (logs, metrics, error messages, resource tags, ticket descriptions) as context for its reasoning, prompt injection is the primary threat vector. The agent's defense is a 4-layer stack:
- Limited write capabilities: Even if an injection succeeds at the reasoning level, the agent cannot mutate your resources. The only write actions available are opening tickets and creating AWS Support cases. This caps the blast radius of any successful injection.
- Account boundary enforcement: The agent operates strictly within the IAM roles you configure. It cannot access resources outside the accounts associated with its Agent Space, regardless of what instructions appear in log data.
- Real-time AI safety classifiers: Classifiers analyze both the input context and the agent's proposed actions to detect and block injection attempts before the agent acts on them.
- Immutable audit trail: Every reasoning step, API call, and tool invocation is recorded in an immutable audit trail. The agent itself cannot modify these records, so a successful injection is detectable after the fact even if the classifiers miss it.
Where the Defenses Are Weakest
Authorized users who can modify logs, resource tags, or ticket fields have a higher probability of successful prompt injection. This is not a bug in the classifiers; it is an inherent property of systems that trust their operational data. The mitigation is access control hygiene: restrict who can write to the data sources the agent reads.
Custom MCP tools may lack the native security controls that AWS-provided integrations include. A malicious or compromised custom tool could return data designed to manipulate the agent's reasoning. Treat custom MCP server security as part of your application security program, not as something the agent handles for you.
What You Can Do
The 4-layer defense stack is AWS-managed — the classifiers and write restrictions are on by default. Your role is to minimize the attack surface they have to protect:
- Restrict IAM write access to CloudWatch log groups, resource tags, and ticket fields that the agent reads. The fewer people who can write to these data sources, the smaller the injection surface.
- Enable vended logs and set up CloudWatch metric filters or alarms for unusual agent behavior (unexpected tool invocations, repeated investigation re-attempts on the same resource).
- For custom MCP servers: implement input validation, sanitize all return values, and run dependency scanning on the server code. The agent trusts what tools return.
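For the last item, a very small sanitizer for text that a custom MCP tool returns might look like the sketch below. It only strips control and invisible formatting characters and caps the length; it does not detect injected instructions, and the 2000-character cap and the function name are arbitrary assumptions.

```python
import unicodedata

MAX_FIELD_CHARS = 2000  # arbitrary cap chosen for illustration


def sanitize_tool_output(text: str, max_chars: int = MAX_FIELD_CHARS) -> str:
    """Remove control/format characters (keeping newline and tab), then truncate."""
    cleaned = "".join(
        ch
        for ch in text
        # Cc = control characters, Cf = invisible format characters.
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    return cleaned[:max_chars]


print(sanitize_tool_output("status: ok\x00\u200b\nnext line"))
```

Stripping invisible characters removes one cheap way to hide content from a human reviewer, and the length cap bounds how much text any single tool result can place in the agent's context.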
Incident Response Permissions
Your runbook needs to know where the agent stops and the engineer starts.
What the Agent Can Do
- Detect incidents triggered by CloudWatch alarms, webhooks (PagerDuty, Grafana), ServiceNow tickets, or manual requests from the web app
- Correlate telemetry across logs, metrics, and traces with a 20-minute look-back window
- Cross-reference deployment history, CI/CD pipeline runs, and source code diffs to identify root cause
- Deduplicate and link related incidents during triage
- Generate detailed step-by-step mitigation plans with agent-ready code specs for coding agents
- Create AWS Support cases directly from the web app, auto-packaging the entire investigation timeline, logs, and context
- Chat with AWS Support inside the DevOps Agent web app (requires Business Support or higher)
What the Agent Cannot Do
- Execute remediation steps automatically. All mitigation plans require human review and approval.
- Modify, delete, or reconfigure any AWS resource.
- Access resources outside the accounts associated with its Agent Space.
- Override IAM permissions. If the Agent Space role does not have access to a resource, the agent cannot read it.
Support Escalation Requirements
To use the in-app Support escalation feature, the IAM role needs permissions to create and describe AWS Support cases (consult the current AWS DevOps Agent documentation for the exact actions required, as the managed policy may already include them). The account must have Business Support, Enterprise Support, or Unified Operations. Without an eligible support plan, the agent can still investigate and generate mitigation plans, but it cannot file cases or start in-app support chats.
Limitations and Gaps
Three of these are engineering constraints you can work around. The rest are gaps that require compensating controls or patience.
Additional gaps to watch: Amazon Managed Grafana webhook contact points are not supported (use self-hosted Grafana or alternative alert routing). If you use ClickHouse as a Grafana data source, that specific integration path is also unsupported. Compliance certifications (SOC 2, ISO 27001) are under third-party audit review but not yet finalized. The service is HIPAA eligible and in scope for PCI, C5, BIO, CISPE, CPSTIC, ENS High, FINMA, and additional frameworks. Optional customer-managed KMS key encryption is available for data at rest.
Pricing note: At the time of writing, AWS has not published a rate card or TCO calculator for DevOps Agent. What we know: billing is per-second with no upfront commitments or seat licenses. AWS Support customers receive monthly usage credits against their DevOps Agent consumption (a percentage of their Support charge, varying by plan). The audit trail (vended logs) has no separate enablement cost, but you pay standard CloudWatch, S3, or Data Firehose rates for log storage and delivery. Until a pricing calculator is available, use the AWS Cost Explorer after enabling the service to monitor actual charges.
Deployment Checklist
Use this tracker to confirm every security step is complete before your Agent Space goes live. Each item maps to a step in this guide.
- Agent Space IAM role created with confused deputy conditions (aws:SourceAccount + aws:SourceArn)
- Operator App IAM role created with appropriate auth flow (IDC recommended)
- Agent Space created and primary account associated for topology discovery
- Cross-account roles deployed with sts:ExternalId conditions
- MCP integrations registered with per-space tool allowlisting (read-only only)
- VPC Lattice configured for private MCP connections (if applicable)
- Vended logs enabled for immutable audit trail (CloudWatch, S3, or Data Firehose)
Troubleshooting
Agent Space creation fails because the service cannot assume the role: The most common cause is a mismatch between the aws:SourceArn condition in the trust policy and the actual Agent Space ARN. Verify the region and account ID in your trust policy match the region and account where you are creating the Agent Space. Also check that the principal is exactly aidevops.amazonaws.com (not a different service namespace). Run aws iam get-role --role-name DevOpsAgentSpaceRole and compare the trust policy JSON against the template in Step 1.
Topology discovery does not complete: The agent needs the Resource Explorer service-linked role to index your resources. Verify the inline SLR policy from Step 1 is attached and that the iam:AWSServiceName condition key is set to resource-explorer-2.amazonaws.com. Also check whether an SCP (Service Control Policy) at the organization level is blocking iam:CreateServiceLinkedRole. If Resource Explorer was never enabled in the region, the agent normally creates the SLR automatically, but an SCP block prevents this.
An MCP tool is not available in an Agent Space: MCP servers are registered at the account level, but tools must be explicitly allowlisted per Agent Space, so registration alone does not expose tools. Navigate to the Agent Space MCP configuration and verify the tool appears in the available tools list. If it does not, confirm the MCP server is using Streamable HTTP transport (SSE and WebSocket are not supported) and that the tool name is 64 characters or fewer. For private MCP servers behind VPC Lattice, verify the resource gateway's ENIs have security group rules allowing inbound traffic from the agent.
Cross-account role assumption is denied: The sts:ExternalId in the secondary account's trust policy must exactly match the primary Agent Space ARN, including region and Agent Space ID. Copy the ARN from the Agent Space console page (not from memory). A common mistake is using the Agent Space name instead of the full ARN. The format is arn:aws:aidevops:REGION:PRIMARY_ACCOUNT_ID:agent-space/SPACE_ID.
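The name-versus-ARN mistake can be caught with a quick format check before the value goes into a trust policy. The pattern below is derived from the format quoted above and assumes the standard aws partition and a 12-digit account ID; the function name is hypothetical.

```python
import re

# Derived from arn:aws:aidevops:REGION:PRIMARY_ACCOUNT_ID:agent-space/SPACE_ID.
# Assumes the standard "aws" partition and a 12-digit account ID.
AGENT_SPACE_ARN = re.compile(r"arn:aws:aidevops:[a-z0-9-]+:\d{12}:agent-space/[^/\s]+")


def is_full_agent_space_arn(value: str) -> bool:
    """Return True if value has the shape of a full Agent Space ARN."""
    return AGENT_SPACE_ARN.fullmatch(value) is not None


print(is_full_agent_space_arn("production-sre"))  # False
print(is_full_agent_space_arn("arn:aws:aidevops:us-east-1:111122223333:agent-space/abc123"))  # True
```

A passing check shows only that the string is well formed; it still has to match the ARN shown in the console character for character.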
If you consistently hit the investigation cap, split high-traffic workloads into separate Agent Spaces rather than requesting a quota increase alone. The default limits are 3 concurrent investigations and 10 on-demand tasks per Agent Space (both adjustable). Evaluations are capped at 1 per Agent Space and cannot be increased. You can run up to 100 Agent Spaces per account per region.