The GPU era defined AI infrastructure for the past decade. Agentic AI may be starting to rewrite that equation.
AMD published a developer blog post on March 13, 2026, making the case that agentic AI workloads place a qualitatively different demand on data center hardware than traditional inference. The argument: where a single-shot LLM response requires a GPU to run flat out for a brief window, an agentic pipeline requires sustained orchestration: managing GPU task queues, branching on intermediate results, handling tool calls, and coordinating across multiple model invocations. That orchestration burden falls on the CPU.
AMD states its EPYC server processors are designed to support this model, working in conjunction with AMD Instinct GPUs to build what the company describes as “balanced, open AI infrastructure.” Those are vendor claims and should be read as such.
The architectural argument, however, isn’t only AMD’s to make. CNBC’s coverage of NVIDIA’s GTC conference noted that “agentic AI requires a lot of general compute,” with CPUs handling the sequential reasoning and logic that GPUs aren’t optimized for. The ARM Newsroom described the CPU’s emerging role as a “data orchestration engine” in rack-scale AI systems, language that echoes AMD’s framing but comes from a different vendor with different interests.
For practitioners building agentic systems today, the practical question isn’t which chip vendor wins the narrative. It’s whether the hardware stack chosen now accounts for orchestration costs that weren’t a factor in earlier inference-only architectures. AMD is raising that question for commercial reasons. That doesn’t make the question wrong.
What is agentic AI orchestration? In agentic systems, an AI model doesn’t just answer a question once. It plans, executes tool calls, evaluates intermediate outputs, and loops back, sometimes hundreds of times per user request. The CPU manages that loop. The GPU handles the heavy compute inside each step.
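A minimal sketch makes the division of labor concrete. The code below is illustrative only, not any vendor's API: the function and class names (call_model, run_tool, Step, run_agent) are hypothetical stand-ins, with the GPU-bound model call stubbed out so the control flow itself is runnable. The point is where the work sits: the branching, state-keeping, and looping all execute on the CPU, while each call_model invocation is the GPU-heavy step inside the loop.

```python
# Hypothetical agentic loop, sketched to show CPU-side orchestration
# around GPU-bound model calls. Names are illustrative, not a real API.

from dataclasses import dataclass


@dataclass
class Step:
    thought: str          # model's reasoning for this step
    tool: str | None      # tool to invoke next, or None if the task is done
    tool_input: str = ""


def call_model(prompt: str) -> Step:
    """GPU-bound: a single model invocation. Stubbed here."""
    # In a real system this would dispatch to an inference server.
    return Step(thought="done", tool=None)


def run_tool(name: str, arg: str) -> str:
    """CPU-bound: tool execution (search, code, database lookup). Stubbed here."""
    return f"result of {name}({arg})"


def run_agent(task: str, max_steps: int = 20) -> str:
    """CPU-bound orchestration: plan, branch on results, maintain state, loop."""
    context = task
    for _ in range(max_steps):
        step = call_model(context)          # GPU does the heavy compute
        if step.tool is None:               # CPU branches on the intermediate result
            return step.thought
        observation = run_tool(step.tool, step.tool_input)
        context += f"\n{step.thought}\n{observation}"  # CPU accumulates state
    return "step budget exhausted"


if __name__ == "__main__":
    print(run_agent("summarize quarterly sales"))
```

In a production agent the loop body repeats dozens or hundreds of times per request, and the queueing, branching, and context management around each model call is exactly the general-purpose work the vendors quoted above are arguing over.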