AgentWall Preprint Proposes OS-Level Runtime Safety Layer to Intercept AI Agent Actions Before Execution

May 19, 2026 | 2 min read | arXiv preprint 2605.16265

Tech Jacks Solutions AI News Coverage

A research preprint submitted to arXiv as paper 2605.16265 proposes AgentWall, a runtime safety architecture that intercepts AI agent actions (shell commands, API calls, file modifications) at the operating system level before they execute. The paper is unreviewed and the architecture is proposed, not deployed.

ai-safety agentic-ai runtime-security context-poisoning open-source-ai agent-security

Key Takeaways

Research preprint arXiv:2605.16265 proposes AgentWall, OS-level runtime interception of agent actions before execution (unreviewed; architecture proposed, not deployed)
Target threat: context poisoning attacks, where adversarially crafted external data causes an agent to execute malicious actions
Addresses a documented gap in model-level alignment: OS-layer interception operates where the agent acts, not where it decides
No GitHub repository confirmed at publication time; independent evaluation pending

Preprint. Not peer-reviewed. Keep that framing visible for everything that follows.

With that said: the architectural question AgentWall addresses is real and increasingly urgent. Most agentic AI safety work focuses on model alignment, shaping what the model decides to do. AgentWall’s proposed approach moves the safety boundary to a different layer entirely: where the agent actually executes actions in the operating system, not where it decides them.

According to the preprint (arXiv:2605.16265), AgentWall intercepts agent actions at the OS level, filtering shell commands, API calls, and file modifications before they execute. The system is specifically designed to address context poisoning, the threat vector in which adversarially crafted external data in the agent’s context window causes it to take malicious actions. That’s not a theoretical edge case. It’s a documented attack surface for local agents with tool access.
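
To make the idea of pre-execution filtering concrete, here is a minimal user-space sketch in Python. It is not AgentWall's design (this brief does not describe the preprint's implementation) and it does not run at the OS level. The `Action` and `Policy` types, the allowlists, and the `gate` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from urllib.parse import urlparse
import shlex

# Characters that could chain or redirect commands if a shell ever sees them.
SHELL_METACHARS = set(";&|`$<>\n")


@dataclass(frozen=True)
class Action:
    """A proposed agent action (hypothetical shape, for illustration only)."""
    kind: str    # "shell", "file_write", or "api_call"
    target: str  # command line, file path, or URL


@dataclass
class Policy:
    """Illustrative allowlists; a real policy would be far richer."""
    allowed_commands: set = field(default_factory=lambda: {"ls", "cat", "grep"})
    writable_root: str = "/workspace"
    allowed_hosts: set = field(default_factory=lambda: {"api.example.com"})


def is_allowed(action: Action, policy: Policy) -> bool:
    """Judge the action itself, ignoring why the model proposed it."""
    if action.kind == "shell":
        if SHELL_METACHARS & set(action.target):
            return False
        try:
            argv = shlex.split(action.target)
        except ValueError:
            return False  # unbalanced quotes: refuse rather than guess
        return bool(argv) and argv[0] in policy.allowed_commands
    if action.kind == "file_write":
        path = PurePosixPath(action.target)
        root = PurePosixPath(policy.writable_root)
        if not path.is_absolute() or ".." in path.parts:
            return False
        return path.parts[:len(root.parts)] == root.parts
    if action.kind == "api_call":
        return urlparse(action.target).hostname in policy.allowed_hosts
    return False  # unknown action kinds are denied by default


def gate(action: Action, policy: Policy, execute):
    """Run `execute(action)` only if the policy allows the action."""
    if not is_allowed(action, policy):
        raise PermissionError(f"blocked {action.kind}: {action.target!r}")
    return execute(action)
```

The check in this sketch never reads the prompt or the model's reasoning; it looks only at the proposed action, which is the property the brief attributes to interception at the execution layer. A genuine OS-level implementation would need enforcement the agent process cannot bypass, which a Python wrapper like this one does not provide.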

Why it matters

Model-level alignment has a gap that this paper puts a name to. If an adversary can inject malicious instructions into the data an agent retrieves (a poisoned document, a compromised API response, a manipulated tool output), the model’s own safety training may not catch it. The model sees legitimate-looking instructions and acts on them. OS-level interception doesn’t try to out-reason the attacker. It puts a filter between the agent’s decision and the system it’s operating on.

Teams building local agents with file system access, shell execution, or external API calls should treat the OS/model safety boundary question as a live architectural decision right now. The specific AgentWall design is unproven, but the problem it addresses is well documented. CISA’s agentic AI guidance points to exactly this attack surface as a priority concern.

Context

This connects to a pattern across recent cycles. DeepMind’s hidden instruction vulnerability research established that context poisoning is reproducible, not speculative. NIST, CISA, and the EU AI Act now collectively address agentic security requirements. AgentWall is a proposed technical response to regulatory and security requirements that are already in force.

What to watch

The paper is fresh. No GitHub repository was publicly listed at the time of this brief; if one appears, the design’s tractability becomes much easier to evaluate. Watch for follow-on responses from CISA or NIST referencing OS-level interception approaches, and for independent replication attempts in the security research community.

TJS synthesis

Don’t deploy AgentWall; it’s an unreviewed preprint. Do use it as a forcing function for a question your team probably hasn’t answered yet: at what layer does your local agent deployment enforce action constraints? If the honest answer is “the model handles it,” your threat model has a gap. Map that gap against your actual tool permissions before the next production deployment.
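
One way to start that mapping is a small inventory script. The sketch below is illustrative only: the `ToolGrant` type, the layer names ("model", "app", "os"), and the example tools are assumptions, not anything taken from the preprint. It flags every tool whose only constraint is the model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolGrant:
    """One tool an agent can call (hypothetical inventory record)."""
    name: str
    capability: str         # e.g. "shell", "file_write", "api_call"
    enforced_by: frozenset  # layers that constrain it: "model", "app", "os"


def unenforced_tools(grants):
    """Names of tools constrained by the model alone, or by nothing at all."""
    return [g.name for g in grants if not (g.enforced_by - {"model"})]


# Example inventory; replace with your deployment's real tool list.
grants = [
    ToolGrant("run_shell", "shell", frozenset({"model"})),
    ToolGrant("write_file", "file_write", frozenset({"model", "os"})),
    ToolGrant("http_get", "api_call", frozenset({"app"})),
]

print(unenforced_tools(grants))  # prints ['run_shell']
```

Any tool this list returns is one where a successful context-poisoning attack would meet no second check before the action runs.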