
Epoch AI Confirms GPT-5.5 Pro at ECI 159: What the Score Resolves and What It Doesn't

Epoch AI has independently evaluated GPT-5.5 Pro and confirmed an ECI score of 159, a new record on Epoch's capabilities index, resolving the benchmark uncertainty flagged in coverage from April 27. FrontierMath scores published alongside the ECI result carry an important qualification: it's not yet confirmed whether Epoch ran those evaluations independently or reported scores submitted by OpenAI.
Key Takeaways
  • Epoch AI independently evaluated GPT-5.5 Pro and confirmed an ECI score of 159, a new record on Epoch's capabilities index
  • FrontierMath Tier 1-3 (52%) and Tier 4 (40%) scores are reported but carry a qualification: evaluation methodology (independent vs. vendor-submitted) has not been confirmed
  • Epoch's database tracks 3,200+ models as of April 27; ECI 159 is a record against that full independently tracked dataset
  • Verify FrontierMath methodology at the Epoch AI leaderboard before using Tier 4 figures in model selection decisions
Model Release
GPT-5.5 Pro
Organization: OpenAI
Type: LLM (flagship)
Parameters: Not disclosed
Benchmarks: ECI 159 (Epoch AI, independent) | FrontierMath T1-3: 52%, T4: 40% (evaluation provenance unconfirmed)
Availability: API + $30/month tier
Analysis

ECI 159 is independently verified. The FrontierMath Tier 4 figure (40%) is plausible, but its evaluation provenance (an independent Epoch run versus a vendor-submitted score) has not been confirmed from available sources. Use the Tier 4 number as directional, not definitive, until Epoch's methodology note is reviewed.

GPT-5.5 Pro FrontierMath Scores (qualified; evaluation methodology unconfirmed)
  • Tiers 1-3: 52%, up from 50%
  • Tier 4: 40%, up from 38%

Two days ago, the GPT-5.5 Pro story had a gap.

The super app architecture and $30 pricing tier were reported; the benchmark numbers were pending. Per Epoch AI's independent evaluation, that gap is now closed for one key metric: GPT-5.5 Pro scores 159 on the ECI, the Epoch Capabilities Index, setting a new record on that measure.

The ECI matters because Epoch AI runs its evaluations independently. It doesn't take vendor submissions and publish them; it runs models against its own evaluation suite. That's the distinction that makes an ECI score more reliable than a benchmark table in a model release blog post. ECI 159 is a confirmed, independently generated finding.

The FrontierMath results need to be read differently. According to the source data, GPT-5.5 Pro scored 52% on FrontierMath Tiers 1-3 and 40% on Tier 4, up from 50% and 38% respectively. FrontierMath is an Epoch AI benchmark, but the key question is whether Epoch ran GPT-5.5 Pro through it independently or whether these numbers reflect OpenAI's own evaluation submitted for publication. That distinction hasn't been confirmed in the available source material. The results are presented here with attribution to the source ("according to the evaluation data"), not as independently verified findings on par with the ECI score.

Why does it matter for practical decision-making? Enterprise buyers use benchmark tables to compare models across vendors. A Tier 4 FrontierMath score of 40% is a notable capability claim: it describes performance on advanced mathematics problems at difficulty levels that have challenged frontier models. But the value of that number in a vendor comparison depends on whether it was produced under controlled third-party conditions. Treat the FrontierMath figures as directionally informative until the evaluation methodology is confirmed.
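One way to operationalize that caution in a model-comparison workflow is to carry each score's evaluation provenance alongside the number itself. The sketch below is illustrative only (the `BenchmarkResult` class and its labels are hypothetical, not an Epoch AI or OpenAI API); the scores are those reported in this brief.

```python
# Illustrative sketch: tag each benchmark figure with its evaluation
# provenance so a comparison step can treat unconfirmed scores as
# directional rather than definitive. Names here are hypothetical.
from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    benchmark: str
    score: float
    provenance: str  # "independent", "vendor-submitted", or "unconfirmed"

    def is_decision_grade(self) -> bool:
        # Only independently run evaluations are treated as definitive.
        return self.provenance == "independent"

# Scores reported in this brief for GPT-5.5 Pro
results = [
    BenchmarkResult("ECI", 159.0, "independent"),
    BenchmarkResult("FrontierMath T1-3", 52.0, "unconfirmed"),
    BenchmarkResult("FrontierMath T4", 40.0, "unconfirmed"),
]

for r in results:
    label = "definitive" if r.is_decision_grade() else "directional only"
    print(f"{r.benchmark}: {r.score} ({label})")
```

Under this framing, the ECI result would feed directly into a vendor comparison, while the FrontierMath tiers would be flagged for the verification step described below.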

Epoch AI's database now tracks more than 3,200 models, per data updated April 27. That context matters for what ECI 159 actually represents: it's a record against the largest independently tracked frontier model dataset currently available, not a record against a curated vendor selection.

What to watch: Epoch AI typically publishes full evaluation methodology alongside its model assessments. Checking the Epoch AI leaderboard directly will confirm whether FrontierMath scores were independently run, and that verification step is worth doing before using the Tier 4 figure in any model selection decision. The benchmark ceiling debate in AI evaluation isn’t resolved by this result; it’s a data point in an ongoing conversation about whether standard evals can still meaningfully differentiate frontier models.

The ECI record is real and independently verified. The FrontierMath numbers are plausible and consistent with the model’s positioning. The distinction between those two sentences is the difference between a confirmed capability and a vendor narrative that deserves scrutiny.
