Four frontier models. Three points separating them. That’s the picture from Epoch AI’s Capabilities Index as of April 24, 2026.
According to Epoch AI’s published benchmark documentation, the Epoch Capabilities Index combines multiple evaluations onto a single scale to enable direct model comparisons. The latest rankings place GPT-5.4 Pro at 158, Gemini 3.1 Pro at 157, Claude Opus 4.7 at 156, and Qwen3.5-Omni at 155. Claude Opus 4.7’s score of 156 has been confirmed via an Epoch AI post; the remaining scores carry qualified status pending resolution of the primary index URL.
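To make "combines multiple evaluations onto a single scale" concrete, here is a minimal sketch of one common approach: z-score-normalize each benchmark across models, average, and rescale. This is an illustrative stand-in, not Epoch's actual methodology, and every benchmark name and number below is invented.

```python
import statistics

# Hypothetical per-benchmark accuracy scores (invented for illustration;
# these are NOT real ECI inputs or real model results).
scores = {
    "model_a": {"bench_1": 0.91, "bench_2": 0.74, "bench_3": 0.62},
    "model_b": {"bench_1": 0.89, "bench_2": 0.77, "bench_3": 0.60},
}

def combined_index(scores: dict, scale: float = 15.0, center: float = 150.0) -> dict:
    """Z-normalize each benchmark across models, average per model,
    then map onto an arbitrary index scale (a toy stand-in for a
    combined-capabilities score)."""
    benchmarks = next(iter(scores.values())).keys()
    # Per-benchmark mean and standard deviation across all models.
    stats = {}
    for b in benchmarks:
        vals = [m[b] for m in scores.values()]
        stats[b] = (statistics.mean(vals), statistics.stdev(vals))
    index = {}
    for name, m in scores.items():
        zs = [(m[b] - stats[b][0]) / stats[b][1] for b in benchmarks]
        index[name] = center + scale * statistics.mean(zs)
    return index

print(combined_index(scores))
```

The point of any such construction is the one the index makes visible: once scores land on a shared scale, small gaps between models become directly comparable.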
A three-point spread across four flagship models isn’t noise. It’s a structural shift.
Stanford’s 2026 AI Index puts it plainly: the US-China AI model performance gap has “effectively closed,” with models from both countries trading the top position on performance benchmarks multiple times. Crucially, Stanford frames this in terms of benchmark percentages, not a time lag. An earlier, unverified “four months” figure circulated in trade coverage this week, but it lacks primary-source support and does not appear in this brief.
Why this matters for enterprise buyers. When four models sit within a rounding error of each other on a combined benchmark, the selection decision stops being about raw capability. It becomes about pricing, latency, context window reliability, compliance posture, and support infrastructure. Enterprise teams that locked in model strategy based on assumed US dominance need to revisit those assumptions, not because the rankings have reversed, but because parity is real enough that secondary factors now drive the outcome.
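One way to operationalize that shift is a simple weighted scoring matrix over the secondary factors. A minimal sketch follows; the weights, vendor names, and 1-5 ratings are all invented placeholders, and a real evaluation would substitute measured data and a team's own priorities.

```python
# Toy weighted decision matrix for model selection once benchmark
# capability is roughly tied. All weights and ratings are placeholders.
weights = {
    "pricing": 0.25,
    "latency": 0.20,
    "context_reliability": 0.20,
    "compliance": 0.20,
    "support": 0.15,
}

vendors = {
    "vendor_a": {"pricing": 3, "latency": 4, "context_reliability": 4,
                 "compliance": 5, "support": 4},
    "vendor_b": {"pricing": 5, "latency": 3, "context_reliability": 3,
                 "compliance": 3, "support": 3},
}

def weighted_score(ratings: dict) -> float:
    """Sum of factor ratings multiplied by their weights."""
    return sum(weights[f] * ratings[f] for f in weights)

# Rank vendors by weighted score, highest first.
for name, ratings in sorted(vendors.items(),
                            key=lambda kv: -weighted_score(kv[1])):
    print(f"{name}: {weighted_score(ratings):.2f}")
```

The exercise is mundane, which is the point: when capability is a near-tie, the ranking is decided entirely by how a team weights the unglamorous factors.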
Why this matters for the investment thesis. US AI development has attracted investment at a scale that dwarfs Chinese AI spending. The specific funding ratio cited in some reports this week is not confirmed at primary source level, but the directional imbalance is not in dispute. The convergence data raises a pointed question for investors: if benchmark parity has arrived at a fraction of the US spend, what exactly does continued capital concentration buy?
The benchmark caveat enterprise teams can’t skip. The ECI measures performance on a curated benchmark set. It doesn’t measure enterprise deployment reliability, safety record, regulatory compliance, or the speed of vendor response when something breaks. Benchmark parity at the frontier is meaningful. It’s not the whole picture. A Chinese model scoring 155 and a US model scoring 158 are close on benchmarks. They’re not identical products.
GPT-5.4 Pro is available via OpenAI’s Responses API, confirmed by OpenAI’s developer documentation. Gemini 3.1 Pro, Claude Opus 4.7, and Qwen3.5-Omni are available through their respective developer platforms.
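For teams testing availability directly, a minimal call through OpenAI’s Python SDK and the Responses API would look like the sketch below. The model identifier string is an assumption based on the naming in this brief, not a confirmed value; the other providers’ SDKs follow analogous patterns.

```python
# Minimal Responses API call using the official OpenAI Python SDK.
# The model identifier "gpt-5.4-pro" is hypothetical here; check
# OpenAI's model list for the exact string before use.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.4-pro",  # assumed identifier; verify against the docs
    input="Summarize the tradeoffs between benchmark parity and deployment reliability.",
)

print(response.output_text)
```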
What to watch. The Epoch AI Capabilities Index URL for the primary rankings page was inaccessible at publication time. When it resolves, either the scores for GPT-5.4 Pro, Gemini 3.1 Pro, and Qwen3.5-Omni move from qualified to confirmed status, or the numbers shift. Watch for arXiv technical papers from OpenAI and Google on these models; neither has published one yet. Stanford’s next AI Index update will be the most authoritative source for tracking whether convergence holds or one camp pulls ahead.
The era of a clear US frontier lead may already be over. Enterprise teams that treat model selection as a solved problem are working from outdated assumptions.