Cost compression has been the real story in the coding agent market for months. Composer 2.5 is the sharpest data point yet.
Per Artificial Analysis benchmarks published May 21, Cursor Composer 2.5 reportedly scores 62 on the Coding Agent Index, up from approximately 48 in the prior version. That reportedly places it third on the index, behind Claude Opus 4.7 max at 66 and GPT-5.5 xhigh reasoning at 65. The spread between third and first is four points. The cost spread is a different story.
Artificial Analysis reports task costs at $0.07 in Standard mode and $0.44 in Fast mode. Compare that to $4.10 for Opus 4.7 and $4.82 for GPT-5.5. At Fast mode pricing, Composer 2.5 runs at roughly a 10x discount versus Opus 4.7. At Standard mode, it’s closer to 60x cheaper. For a development team running thousands of tasks per week, that math changes a procurement decision.
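The ratios above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the per-task costs are the figures reported in this brief, while the 5,000-tasks-per-week workload and the function names are assumptions chosen for the example, not numbers from Artificial Analysis.

```python
# Per-task costs in USD, as reported in this brief (attributed to Artificial Analysis).
COST_PER_TASK = {
    "Composer 2.5 (Standard)": 0.07,
    "Composer 2.5 (Fast)": 0.44,
    "Claude Opus 4.7 max": 4.10,
    "GPT-5.5 xhigh": 4.82,
}

# Hypothetical workload: the brief says only "thousands of tasks per week".
TASKS_PER_WEEK = 5_000


def cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more the first model costs per task than the second."""
    return COST_PER_TASK[expensive] / COST_PER_TASK[cheap]


def weekly_cost(model: str, tasks: int = TASKS_PER_WEEK) -> float:
    """Total weekly spend in USD for a given number of tasks."""
    return COST_PER_TASK[model] * tasks


if __name__ == "__main__":
    for model in COST_PER_TASK:
        print(f"{model:<26} ${weekly_cost(model):>10,.2f} per week")

    fast = cost_ratio("Claude Opus 4.7 max", "Composer 2.5 (Fast)")
    standard = cost_ratio("Claude Opus 4.7 max", "Composer 2.5 (Standard)")
    print(f"Opus 4.7 vs Fast mode:     {fast:.1f}x")
    print(f"Opus 4.7 vs Standard mode: {standard:.1f}x")
```

Under that assumed workload, the gap between Fast mode and Opus 4.7 works out to five figures per week, which is the scale at which the ratio becomes a budget line rather than a rounding error.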
The benchmark picture has a second dimension. Per the same Artificial Analysis evaluation, Composer 2.5 reportedly gained 35 points on SWE-Bench-Pro-Hard-AA, moving from 12% to 47%. The SWE-Bench variants test real software engineering tasks: bug fixing, feature implementation, code understanding. If it holds up, a 35-point gain on the harder variant is a meaningful improvement, and it reportedly puts Composer 2.5 level with Claude Opus 4.7 max on that specific benchmark.
The catch is the verification layer. All numerical claims here come from Artificial Analysis, an independent benchmark provider. That makes them independent evaluation rather than vendor self-reporting, which matters for how much weight you give them. Even so, they remain reported figures that have not been independently confirmed at this production stage.
One number that’s not in this brief: average execution wall time. The Wire referenced a 6.7-minute figure for Fast mode, but without a source URL, it doesn’t clear the bar for publication. Artificial Analysis’s full report should address it.
Context: this release lands inside a competitive moment for agentic coding tools. We covered the broader picture on May 21: what Google, Anthropic, and OpenAI actually built in their agentic coding platforms. Composer 2.5 reshapes that picture by inserting a challenger within four benchmark points of the top two, at a cost structure the major labs haven’t matched.
The longer pricing pattern: a brief from May 13 argued that when frontier model benchmarks converge, pricing becomes the story. Composer 2.5 is the clearest evidence for that thesis to date. The Coding Agent Index scores for the top three tools sit within a four-point band. The cost spread is an order of magnitude.
What to watch
The Artificial Analysis source is the primary verification step. SWE-Bench-Pro-Hard-AA is a recognized variant in the SWE-Bench family, but the methodology details matter for evaluating the 47% claim. Watch whether Anthropic or OpenAI respond with pricing adjustments; a four-point benchmark gap doesn’t justify a 10x premium indefinitely.
TJS synthesis
The price-performance narrative is compelling, and the benchmark source has independent credibility. The decision framework is straightforward: if your team’s coding agent use cases skew toward the task types SWE-Bench-Pro-Hard-AA covers (bug fixing, feature implementation, code understanding), Composer 2.5 deserves an evaluation. At a reported $0.44 per task in Fast mode, the evaluation cost is trivial.