Four months. That’s the current measured cost of choosing open-weight AI over the closed-weight frontier.
It’s a specific number drawn from a rigorous methodology. It’s also a floor, not a ceiling, and understanding why matters more for enterprise deployment decisions than the number itself.
What ECI Measures and Why It’s the Right Tool
Benchmark comparisons in AI have a structural problem. A lab can optimize a model for specific evaluation tasks without improving its general capability. MMLU scores can be inflated through benchmark-adjacent training. HumanEval results can be inflated when test cases leak into pretraining data, whether by accident or by design. Individual claims of benchmark leadership are frequently contested and sometimes actively misleading.
Epoch AI’s Effective Capability Index, developed by Jack Edwards and Luke Emberson and published May 29, 2026, takes a different approach. ECI aggregates performance across several capability dimensions at once (reasoning, coding, knowledge retrieval, instruction following) to produce a composite index that’s harder to game than any single benchmark. A model that leads on MMLU but underperforms on multi-step reasoning ends up with a lower ECI score than its MMLU headline suggests.
The methodology is why the 8-point gap finding (90% CI: 7-11 points) deserves more weight than any single lab’s benchmark release. Epoch, an independent research organization with a track record of quantitative AI benchmarking, is a source under this hub’s source authority framework. Its ECI gap finding isn’t a vendor claim. It’s a measurement.
The Measured Gap: What the Numbers Actually Say
The 8-ECI-point gap between open-weight and closed-weight SOTA translates, in Epoch’s framing, to roughly 4 months of closed-lab capability development. That temporal translation needs some unpacking.
It doesn’t mean open-weight models are 4 months “behind” in a linear race. Capability development in AI isn’t uniform: some quarters see large ECI jumps, others plateau. The 4-month equivalence is an average, derived from the historical relationship between ECI gains and calendar time along the closed-lab development trajectory. It’s a useful heuristic, not a precise countdown.
The 90% confidence interval (7-11 ECI points) is the honest part of the finding. A 7-point gap and an 11-point gap have different practical implications for enterprise deployment. At 7 points, the gap may be within acceptable tolerance for many tasks: coding assistance, document summarization, structured data extraction. At 11 points, it starts to matter for reasoning-intensive applications: complex legal analysis, multi-step scientific reasoning, high-stakes decision support. The interval is wide enough that enterprise teams should anchor deployment decisions to their specific task requirements, not to a single gap figure.
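To make the points-to-months translation concrete, here is a minimal sketch that converts an ECI gap into an equivalent lag. It assumes a constant rate implied by the article’s own figures (8 points ≈ 4 months, so 2 points per month); `gap_to_months` and `POINTS_PER_MONTH` are hypothetical names, and Epoch’s actual conversion comes from historical trajectory data that need not be linear.

```python
# Hypothetical sketch: convert an ECI gap into "months of closed-lab progress".
# Assumption: a constant rate implied by the article's figures
# (8 ECI points ~ 4 months, i.e. 2 points per month). This is NOT an
# Epoch-published constant; real progress is uneven quarter to quarter.
POINTS_PER_MONTH = 8 / 4


def gap_to_months(eci_gap: float, points_per_month: float = POINTS_PER_MONTH) -> float:
    """Translate an ECI-point gap into an equivalent lag in months."""
    if points_per_month <= 0:
        raise ValueError("points_per_month must be positive")
    return eci_gap / points_per_month


point_estimate = gap_to_months(8)
low, high = gap_to_months(7), gap_to_months(11)
print(f"point estimate: {point_estimate:.1f} months")
print(f"90% CI: {low:.1f} to {high:.1f} months")
```

Under this assumption the 7-11 point interval maps to roughly 3.5 to 5.5 months. That range treats the conversion rate as exact, so it understates the true uncertainty: the rate itself is an estimate.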
The Publication Opacity Problem: The Finding That Changes the Calculus
The 4-month lag is the measured gap against the published frontier. Epoch’s analysis includes an inferential observation, explicitly labeled as such, that the published frontier isn’t the actual frontier.
Closed labs develop capabilities well before releasing them. The most advanced model in Anthropic’s internal development isn’t the Claude available through its API. The most advanced model in OpenAI’s pipeline isn’t GPT-4o or any of its released successors. The published frontier reflects what labs have chosen to release; the development frontier is ahead of it. Open-weight models, by contrast, are publicly released at or near their training state, so any public comparison measures them against the published frontier, not the development frontier.
What this means for the 4-month figure: it’s likely understated. The true capability lag, measured against closed-lab development rather than closed-lab publication, may be 6 months, 8 months, or more. Epoch, appropriately, doesn’t put a number on this, because it’s an inference, not a measurement. But the directional claim is structurally sound: the measured lag is a floor, not a ceiling.
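The “floor, not a ceiling” argument reduces to simple arithmetic. The sketch below assumes an additive model: the lag against the development frontier equals the measured lag plus however far a closed lab’s internal models run ahead of its releases. Epoch does not estimate that internal lead; the values used here are hypothetical and were chosen only to reproduce the article’s 6- and 8-month scenarios. `true_lag` is an illustrative name.

```python
# Hypothetical sketch of the "floor, not a ceiling" argument.
# measured_lag: lag against the *published* closed frontier (4 months here).
# internal_lead: how far a closed lab's internal frontier runs ahead of its
# releases. Epoch does not estimate this; the values below are illustrative.


def true_lag(measured_lag_months: float, internal_lead_months: float) -> float:
    """Lag measured against the development frontier (additive model, assumed)."""
    if internal_lead_months < 0:
        raise ValueError("a lab cannot publish capabilities it has not developed")
    return measured_lag_months + internal_lead_months


MEASURED_LAG = 4.0
for lead in (0.0, 2.0, 4.0):  # hypothetical internal leads, in months
    lag = true_lag(MEASURED_LAG, lead)
    # Because the internal lead can't be negative, the measured lag is a lower bound.
    assert lag >= MEASURED_LAG
    print(f"internal lead {lead:.0f} mo -> true lag {lag:.0f} mo")
```

The only structural input is that the internal lead is non-negative, which is exactly why the measured figure can understate the lag but not overstate it.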
What This Means for Enterprise Deployment Decisions
The open vs. closed deployment question has three dimensions that the ECI gap bears on differently.
Capability sufficiency. For task categories where today’s open-weight SOTA is already sufficient (code completion, document summarization, simple Q&A), the 4-month lag doesn’t change the deployment recommendation. The gap exists, but open-weight capability already clears the task threshold. For tasks that require frontier reasoning (complex multi-step analysis, high-stakes decision support, novel problem-solving in specialized domains), the gap matters. The question isn’t “is open-weight good enough?” in the abstract. It’s “is open-weight good enough for this specific task at this specific quality threshold?”
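The per-task question can be written as a small decision function. This is a minimal sketch, not a validated procedure: every ECI value below is a placeholder rather than one of Epoch’s published scores, and the idea that each task has a single `required_eci` threshold is a simplifying assumption. `Task` and `recommend` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical sketch of the "capability sufficiency" test.
# All ECI values are placeholders, not Epoch's published scores.
OPEN_WEIGHT_ECI = 140.0                     # placeholder: today's open-weight SOTA
CLOSED_WEIGHT_ECI = OPEN_WEIGHT_ECI + 8.0   # the article's measured 8-point gap


@dataclass(frozen=True)
class Task:
    name: str
    required_eci: float  # assumed minimum capability for this task's quality bar


def recommend(task: Task,
              open_eci: float = OPEN_WEIGHT_ECI,
              closed_eci: float = CLOSED_WEIGHT_ECI) -> str:
    """Return a coarse deployment recommendation for a single task."""
    if open_eci >= task.required_eci:
        return "open-weight sufficient; the gap does not change the decision"
    if closed_eci >= task.required_eci:
        return "gap matters; only closed-weight clears the threshold today"
    return "neither tier clears the threshold"


tasks = [
    Task("document summarization", required_eci=120.0),
    Task("multi-step legal analysis", required_eci=145.0),
    Task("novel specialized research", required_eci=155.0),
]
for t in tasks:
    print(f"{t.name}: {recommend(t)}")
```

The point of the sketch is the shape of the decision: the gap only matters for tasks whose threshold falls between the two tiers. Where that band sits for a given team depends on its own task evaluations.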
Governance and control. Open-weight models offer properties that closed-weight APIs structurally can’t: complete data privacy (no API calls to vendor infrastructure), fine-tuning control over model behavior, and deployment independence from vendor pricing and availability decisions. EU AI Act compliance for agentic systems is substantially easier when the model is self-hosted, because data flows, audit trails, and system boundaries all sit within the organization’s control. The governance premium for open-weight deployment is real. The question is whether it’s worth a 4-to-8-month capability haircut on tasks beyond what current open-weight models can handle.
Investment signal. For investors in open-weight model companies and infrastructure (Mistral, Together AI, and their ecosystem), the ECI gap is a structural challenge that capital alone can’t close. Open-weight labs don’t have the compute concentration of Anthropic, which just closed $65B at a $965B valuation, or of OpenAI or Google DeepMind. Hyperscaler capital is concentrating in closed-lab infrastructure in ways that compound the ECI gap across development cycles. An open-weight model company competing on capability against a closed lab that spends 10x as much on compute is running a different race. The 4-month lag is the current scorecard; the trajectory matters more.
The Agentic Security Connection
The ECI gap has a direct implication for the enterprise agentic security market. Enterprises deploying agents on open-weight models face a different governance risk profile than those using closed-weight APIs: not necessarily worse, but structurally different. The governance stack for open-weight agent deployments must be entirely internal, with no vendor-side safety filtering, no model-level content moderation, and no API-level monitoring. The 4-month capability lag may even help some governance outcomes, since older model architectures are better understood and have more mature red-teaming histories. But the governance burden is higher.
The Geordie AI round, covered in a companion brief, reflects this complexity. An agentic governance tool that supports open-weight deployments must operate at the infrastructure layer rather than the model API layer: a harder engineering problem, and a different product category from tools that wrap closed-weight APIs.
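What “operating at the infrastructure layer” means can be sketched as a gate that sits between a self-hosted agent runtime and its tools. This is a hypothetical illustration, not a description of Geordie AI’s product or any real library: `ToolCallGate`, the allowlist, and the JSON audit record are all assumptions. The design point is that the gate never touches the model itself, so it works the same regardless of which open-weight model emitted the tool call.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolCallGate:
    """Hypothetical infrastructure-layer policy gate for a self-hosted agent.

    There is no vendor API to hook into, so policy and audit live here,
    between the agent runtime and the tools it is allowed to invoke.
    """
    allowed_tools: set[str]
    audit_log: list[str] = field(default_factory=list)

    def call(self, tool_name: str, tool_fn: Callable[..., Any], /, **kwargs: Any) -> Any:
        allowed = tool_name in self.allowed_tools
        # Record every attempt, allowed or denied, so the audit trail stays in-house.
        self.audit_log.append(json.dumps({
            "ts": time.time(),
            "tool": tool_name,
            "args": kwargs,
            "allowed": allowed,
        }, default=str))
        if not allowed:
            raise PermissionError(f"tool {tool_name!r} is not on the allowlist")
        return tool_fn(**kwargs)


# Usage: one permitted call and one denied call, both audited.
gate = ToolCallGate(allowed_tools={"search_docs"})
print(gate.call("search_docs", lambda query: f"results for {query}", query="retention policy"))
try:
    gate.call("send_email", lambda to, body: None, to="x@example.com", body="hi")
except PermissionError as err:
    print(err)
print(len(gate.audit_log))
```

A production version would also need authentication, tamper-evident log storage, and policies richer than an allowlist; the sketch only shows where such controls have to sit when no vendor API is in the path.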
The Forward View
Epoch’s ECI methodology is designed to be updated. The next measurement period will reveal whether the gap is widening, stable, or narrowing. Each data point in that sequence is an investment signal.
Narrowing gap: open-weight development is absorbing compute investment efficiently, and open-source ecosystem investment theses hold.
Stable gap: the equilibrium favors today’s task-specific deployment decisions, made on capability sufficiency rather than gap trajectory.
Widening gap: closed-lab compute concentration is compounding faster than open-weight development can keep pace. The enterprise deployment calculus shifts toward closed-weight APIs for capability-sensitive tasks, and the capability cost of choosing open-weight for its governance benefits rises.
Watch Epoch’s next ECI update for that directional signal. If the gap widens by 2 or more ECI points in the following period, that’s the data point that changes the open vs. closed investment thesis at scale. Not a pundit’s take, not a vendor’s benchmark claim: a measurement.
That’s the bet enterprise teams and open-source investors are actually managing right now. Epoch gave them the baseline. The next update gives them the trend.
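The forward-looking decision rule can be sketched as a three-way classifier over two successive gap measurements. The 2-point widening trigger comes from the text above; applying the same threshold to narrowing is an assumption made here for symmetry, and the example gap values are hypothetical, not forecasts. `classify_trend` is an illustrative name.

```python
# Hypothetical sketch: classify the change between two ECI gap measurements.
# The 2-point widening trigger is from the article; using the same threshold
# for narrowing is an assumption made for symmetry.
THESIS_CHANGE_THRESHOLD = 2.0


def classify_trend(previous_gap: float, next_gap: float,
                   threshold: float = THESIS_CHANGE_THRESHOLD) -> str:
    """Return 'widening', 'narrowing', or 'stable' for a change in the ECI gap."""
    delta = next_gap - previous_gap
    if delta >= threshold:
        return "widening"
    if delta <= -threshold:
        return "narrowing"
    return "stable"


# Hypothetical next-period gaps, compared against today's 8-point measurement.
for next_gap in (10.0, 9.0, 5.5):
    print(f"8.0 -> {next_gap}: {classify_trend(8.0, next_gap)}")
```

This compares point estimates only. With a 90% interval as wide as 7-11 points on the current gap, a 2-point move in the headline number may still sit inside the noise, so the intervals of successive updates are worth checking alongside the trend label.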