Technology Daily Brief: Vendor Claim

GPT-5.5 Inside Codex: The NVIDIA GB200 Infrastructure Stack Powering OpenAI's Agentic Workflows

2 min read · Source: OpenAI GPT-5.5 System Card (partial)
OpenAI's GPT-5.5 serves as the model backbone for its Codex agentic coding application, running on NVIDIA GB200 NVL72 infrastructure. This follow-up to the launch announcement examines what the infrastructure and integration specifics mean for developers building on or evaluating the agentic stack.

The GPT-5.5 launch brief covered the model. This one covers what runs beneath it.

OpenAI’s GPT-5.5 System Card describes the model as “designed for complex, real-world work, including writing code, researching online, analyzing information.” That framing isn’t incidental. GPT-5.5 is the engine powering Codex, OpenAI’s agentic coding application, and the infrastructure stack OpenAI chose for that deployment tells practitioners something specific about how the company is building the agentic layer.

The hardware choice: NVIDIA GB200 NVL72. OpenAI and NVIDIA announced day-zero support, with GPT-5.5 reported as optimized for the TensorRT-LLM and vLLM inference frameworks, per the companies’ announcements. That claim still requires human verification of a direct, GPT-5.5-specific NVIDIA announcement; the cross-reference at verification stage found only general NVIDIA developer documentation rather than a GPT-5.5-specific source. Treat the framework-optimization claim as company-announced, not independently confirmed.

What the stack signals: GB200 NVL72 is a GPU-dense, high-bandwidth configuration built for frontier-model inference at scale. Choosing it for Codex, an application that needs fast, multi-step code reasoning, is consistent with OpenAI’s closed-stack philosophy: high-performance hardware, a proprietary model, and a subscription deployment tier. The developer proposition is capability-first, with cost and openness as secondary concerns.

OpenAI states GPT-5.5 matches GPT-5.4 per-token latency while delivering improved reasoning performance, per the company’s release. Secondary reporting from Decrypt describes it as matching GPT-5.4’s speed “while outperforming it on nearly every benchmark,” and DataCamp notes “efficiency gains, stronger long-context reasoning.” Neither source constitutes an independent technical benchmark; both are corroborating accounts of OpenAI’s own claims. OpenAI reports internal SWE-bench improvement; independent evaluation is pending.

The Pro tier is reported to offer a 1M-token context window, per release coverage. That figure hasn’t been confirmed via the System Card or an independently verifiable technical source; use it as context, not specification.

Analyst firm Constellation Research characterized the model as capable of navigating “messy, multi-part tasks” autonomously. That’s an analyst interpretation, not a verified capability statement.

For developers evaluating the agentic stack, the practical question isn’t the benchmark headline. It’s the deployment model. Codex runs on subscription access to a closed, hardware-optimized frontier model. That’s a meaningful architectural commitment: high capability ceiling, limited infrastructure flexibility, cost structure tied to OpenAI’s pricing decisions. Compare that to the open-weight alternative examined in the DeepSeek-V4 brief, or to Meta’s CPU-first approach for persistent agents, and the tradeoff space becomes clearer.

What to watch: NVIDIA’s direct GPT-5.5 announcement, if it materializes, will confirm or clarify the TensorRT-LLM and vLLM optimization claims. Epoch AI’s independent evaluation, currently pending, will be the first real signal on whether the internal SWE-bench claims hold under external scrutiny. Enterprise adoption of the Codex application over the next 60 days will also surface latency and cost data that no benchmark currently provides.

The infrastructure choice isn’t just a hardware decision. It’s a statement about who OpenAI is building the agentic layer for.

April 24, 2026

Stay ahead on Technology

Get verified AI intelligence delivered daily. No hype, no speculation, just what matters.

Explore the AI News Hub