The Problem Isn’t Coming. It’s Here.
Stop treating agentic AI security as a future concern.
AI agents are executing real business operations today. They’re booking travel, executing API calls that move money, reading and writing files, managing customer service interactions, and in some deployments, spinning up other agents to handle subtasks. The systems making these decisions operate across surfaces that were never designed to be trusted by a machine acting autonomously: open web pages, third-party APIs, shared memory contexts, multi-agent orchestration layers.
Every one of those surfaces is an attack vector. Two major research and engineering efforts confirmed this week what security researchers have been documenting for over a year: the attack surface for AI agents is real, active, and not solved by the same approaches that secured earlier AI systems.
Google DeepMind’s Attack Taxonomy: Six Categories, Four Vectors
Google DeepMind researchers published a paper this week titled “AI Agent Traps,” providing the most systematic classification yet of adversarial techniques targeting autonomous agents. The organizing framework is functional: six attack categories, each exploiting one of four core agent capacities, perception, reasoning, memory, or action.
This is a more useful structure than prior taxonomies that grouped attacks by delivery mechanism. It tells you not just how an attack is delivered, but what it does to the agent’s cognition.
Two attack types are named explicitly in available reporting. Invisible text embedded in web pages: content structured so an AI agent reads and acts on it, while a human reviewer sees nothing. The agent loads a web page as part of a task, encounters instructions hidden in the page structure, and may execute those instructions as if they came from its legitimate operator. The second: viral memory poisoning, where malicious content injected into one agent’s memory context propagates across agent networks. In a multi-agent system, that’s not a single-agent compromise. It’s a vector for cascading failures across an entire orchestrated pipeline.
The paper also surfaces a legal gap with significant practical implications: no existing legal framework definitively assigns liability when a trapped AI agent commits a financial crime. This isn’t an academic observation. Agents with broad financial access are already deployed in production environments. If an agent executes a fraudulent transaction because an attacker poisoned its memory context, the question of accountability, operator, framework developer, platform, end user, has no current legal answer. That gap will eventually be forced into focus by a specific incident. The DeepMind taxonomy is the research record that will frame that conversation.
Microsoft’s Response: A Deployable Toolkit, Available Now
Microsoft’s Agent Governance Toolkit landed this week as an open-source security framework targeting 10 categories of adversarial attack against AI agents. The named threats include goal hijacking, where an attacker redirects an agent’s objective, memory poisoning, and rogue agent behavior. The toolkit integrates with existing agent frameworks without requiring replacement, and it operates at under 0.1 milliseconds latency.
That latency figure deserves a moment. Blocking a dangerous agent action requires intercepting it before execution. If the security layer adds meaningful processing time, it becomes impractical for production systems where agents are making sequential decisions at speed. Sub-millisecond operation means the toolkit can sit in the execution path without becoming the bottleneck.
The open-source release is a distinct strategic choice. A proprietary security product would create a vendor dependency in a space where organizations are already managing complex dependencies. Open-source means developers can audit the toolkit’s logic, adapt it to their specific agent architecture, and contribute improvements. For teams subject to security review cycles, an open-source tool is also substantially easier to clear than a black-box vendor product.
Survey data cited in recent reporting suggests a large majority of enterprises anticipate a major AI agent security incident in the coming year, though the methodology behind that specific figure couldn’t be independently confirmed at publication. What’s independently clear from the pattern of tooling and research released in recent months: the industry has moved past the stage of debating whether agentic systems represent a distinct threat surface. They do. The question is response.
What Remains Unsolved
The Governance Toolkit and the DeepMind taxonomy together don’t close the problem. They define it with enough precision to begin addressing it.
Prompt injection remains the foundational hard case. It’s the technique underlying many of the attack categories both organizations document, the ability to smuggle instructions into an agent’s context that override or subvert the legitimate operator’s intent. According to reporting citing an OpenAI statement from December 2025, this class of vulnerability may not be fully resolvable. That framing, if accurate, resets the security objective. The goal isn’t elimination. It’s continuous reduction of attack surface, detection of compromise when it occurs, and rapid containment.
The gap between the six DeepMind attack categories and the 10 Microsoft toolkit coverage types is worth examining carefully once both documents are fully accessible. The overlap and the gaps are where the next round of security tooling investment will go.
The multi-agent case is also largely unaddressed by available defensive tooling. Viral memory poisoning, where a compromise propagates between agents, requires not just per-agent defense but network-level detection of anomalous state propagation. That’s a substantially harder problem than blocking a single agent from executing a dangerous action.
What Practitioners Should Do Now
The practical answer depends on where you are in your agentic deployment lifecycle.
If you haven’t yet deployed agents in production: the DeepMind taxonomy and the Microsoft toolkit together give you the starting vocabulary for a security review before you do. Map your planned agent architecture against the six attack categories. Document which surfaces your agents will access, web pages, APIs, file systems, other agents, and treat each as a potential injection point.
If you have agents in production today: the Agent Governance Toolkit is available, open-source, and designed for integration with existing frameworks. The integration investment is justified by the documented attack surface. The alternative is to wait for an incident before implementing defenses, a position that gets harder to defend as the research record becomes more explicit.
For security and compliance teams: the liability gap documented in the DeepMind paper is a near-term regulatory exposure. Legal frameworks will eventually catch up to the deployment reality. Organizations that have documented their security posture against known attack categories, and can demonstrate active defensive tooling, will be better positioned in that conversation than those who treated agentic security as a future problem.
The Convergence Signal
Google DeepMind publishing a systematic attack taxonomy and Microsoft shipping a deployable defensive toolkit in the same week is not a coincidence of timing. It’s the agentic AI security field reaching a threshold. The attack surface is now well enough understood to classify. The defensive response is now mature enough to ship.
That threshold matters for practitioners. It means the information exists to act on. The taxonomy names what to defend against. The toolkit provides one way to defend against it. What remains, and this is the harder part, is the organizational will to treat agent security as infrastructure rather than future work.
The attacks aren’t waiting for that decision to be made.