Nine vs. One: How Google DeepMind's and OpenAI's Math AI Achievements Actually Differ, and What It Means for...

May 27, 2026 6 min read The Rundown AI Qualified Moderate

Tech Jacks Solutions AI News Coverage

The "9-to-1" framing of Google DeepMind's and OpenAI's Erdős achievements is a useful headline ratio and a misleading comparison. The two programs use different architectures, target different problem types, and operate at different verification standards. Understanding the distinction matters more than the score, particularly for research teams evaluating which mathematical AI approach applies to their work.

Key Takeaways

DeepMind's and OpenAI's Erdős achievements use different architectures and target different problem types, so a raw count comparison obscures more than it reveals
DeepMind's formal-verification-plus-RL approach produces mechanically checkable proofs in Lean; OpenAI's reasoning-model approach is more flexible but requires external verification
The nine-problem claim remains single-source (T3); the arXiv paper will be the appropriate trigger for evaluating architecture and problem selection
Research teams should choose between these approaches based on formalization requirements, not headline problem counts; today, neither approach is accessible to most teams without significant formal methods expertise

DeepMind AlphaProof vs. OpenAI Reasoning, Key Dimensions

| Dimension | DeepMind | OpenAI |
| --- | --- | --- |
| Reported problems solved | 9 (T3, unconfirmed) | 1 (confirmed) |
| Verification method | Formal (Lean) | Independent expert review |
| Architecture | RL + formal verification loop | Reasoning model (exploratory) |
| Problem type (May 2026) | Erdős conjectures (difficulty unconfirmed) | Unit-Distance (frontier) |
| arXiv paper | Pending | Documented in prior coverage |

Nine versus one is the kind of number that travels. The Rundown AI’s reporting framed Google DeepMind’s AlphaProof achievement against OpenAI’s as a competition score, and the ratio is memorable. The problem is that it compares two genuinely different things (different problem types, different verification environments, different architectural approaches) as if they were competing on the same track. They’re not.

This deep-dive doesn’t resolve the primary-source gap on the DeepMind claim. “AlphaProof Nexus” as a system name and nine as the specific problem count remain single-source claims from a T3 newsletter publication, pending a primary DeepMind announcement. What this deep-dive does is place both programs in a framework that lets research teams evaluate what each has actually demonstrated, and what those demonstrations do or don’t mean for practical research applications.

What Each Lab Has Actually Demonstrated

OpenAI’s achievement is the better-documented one. Its reasoning model autonomously disproved Erdős’s 1946 Unit-Distance conjecture, a specific, named, independently verifiable result. The verified Erdős proof analysis from May 25 established what this means in practice: the model generated a disproof, and that disproof was independently checked by experts. That’s the gold standard for mathematical AI claims. One problem. Confirmed architecture. Verifiable output.

Google DeepMind’s AlphaProof program has been public for longer. The AlphaProof architecture, reinforcement learning combined with formal verification in a loop, has been described in published research. The approach works by searching for proofs in a formal language (Lean, primarily), verifying each step mechanically, and using RL to guide the search more efficiently than brute force. That methodology isn’t in question. What The Rundown AI is reporting is that a system within this program, reportedly called AlphaProof Nexus, has solved nine open Erdős problems. Both the system name and the problem count still need a primary DeepMind source to confirm them.
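The loop described above (propose a step, check it mechanically, let a scoring signal steer the search) can be illustrated with a toy sketch. This is not AlphaProof's implementation: the "proof" here is just a sequence of arithmetic moves, the rule names `add3` and `double` are hypothetical, and a fixed distance heuristic stands in for the learned RL policy. It only shows the shape of a search whose output is accepted solely because a checker replays it.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical "inference rules": each maps a state (an integer) to a new state.
RULES: Dict[str, Callable[[int], int]] = {
    "add3": lambda x: x + 3,
    "double": lambda x: x * 2,
}


def check(start: int, goal: int, steps: List[str]) -> bool:
    """Mechanical checker: replay every step and confirm the goal is reached."""
    state = start
    for name in steps:
        if name not in RULES:
            return False  # unknown rule, reject the whole derivation
        state = RULES[name](state)
    return state == goal


def guided_search(start: int, goal: int, max_expansions: int = 10_000) -> Optional[List[str]]:
    """Best-first search over rule sequences.

    The priority (distance to the goal) is a fixed heuristic standing in for
    a learned policy. Assumes start >= 0, so states never shrink and any
    state above the goal can be pruned.
    """
    counter = 0  # unique tie-breaker so the heap never compares step lists
    frontier: List[Tuple[int, int, int, List[str]]] = [(abs(goal - start), counter, start, [])]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            return None  # search space exhausted
        _, _, state, steps = heapq.heappop(frontier)
        if state == goal:
            # Accept the candidate only if the independent checker confirms it.
            return steps if check(start, goal, steps) else None
        for name, rule in RULES.items():
            nxt = rule(state)
            if nxt > goal or nxt in seen:
                continue
            seen.add(nxt)
            counter += 1
            heapq.heappush(frontier, (abs(goal - nxt), counter, nxt, steps + [name]))
    return None


if __name__ == "__main__":
    proof = guided_search(1, 11)
    print(proof, check(1, 11, proof or []))
```

The design point is the separation of roles: the search can be as heuristic as it likes, because nothing is reported as solved unless `check` replays the steps successfully.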

Assuming the reporting is directionally accurate, the count isn’t what matters most. The problem selection is.

Problem Type Is the Key Variable

Erdős posed thousands of conjectures across his career, ranging from elementary combinatorics accessible to undergraduates to deep number theory problems that have resisted decades of expert effort. Nine solved problems could mean nine at the accessible end of that spectrum, or nine at the frontier. The difference in what that proves about the system’s capability is enormous.

OpenAI’s Unit-Distance disproof is clearly at the frontier end: Erdős’s 1946 conjecture was a longstanding open problem in combinatorial geometry. That one result tells us more about the model’s capability than nine results of varying difficulty would tell us without knowing the difficulty distribution.

This isn’t a knock on the DeepMind claim. It’s the question the arXiv paper, when published, will answer. Research teams should treat the nine-problem figure as provisional: potentially significant, but not interpretable until the problem selection is documented.

The Architectural Comparison

The two programs are structurally different in ways that affect where each is useful.

Unanswered Questions

Which specific nine Erdős problems were solved, and at what difficulty tier within the Erdős conjecture set?
What compute resources does AlphaProof Nexus require per problem, and how does that compare to reasoning-model-based systems at similar capability?
Does DeepMind's approach generalize beyond Lean-expressible problems, or is the formalization bottleneck a hard constraint?

DeepMind’s AlphaProof approach generates and verifies proofs in formal languages. The formal verification step is what makes results mechanically trustworthy: a proof that Lean accepts is, by construction, correct within the axiom system. The limitation is that formal language environments are constrained. Problems have to be expressible in Lean (or equivalent) before the system can work on them. Most research problems, even mathematical ones, aren’t already posed in that form.
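To make "expressible in Lean" concrete, here is a minimal Lean 4 sketch. These are toy statements, not Erdős problems, and the theorem names are hypothetical; the sketch assumes only Lean's core library. The statement has to be written as a formal proposition first, and Lean's kernel then either accepts the proof term or rejects it.

```lean
-- A toy statement: adding zero to a natural number leaves it unchanged.
-- `rfl` works because `n + 0` reduces to `n` by definition.
theorem add_zero_toy (n : Nat) : n + 0 = n := rfl

-- A second toy statement, proved by citing an existing core lemma.
theorem add_comm_toy (a b : Nat) : a + b = b + a := Nat.add_comm a b
```

If either proof were wrong, the file would fail to compile. That is the sense in which an accepted proof needs no further expert review of its steps, although someone still has to confirm that the formal statement says what the original problem meant.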

OpenAI’s approach with its reasoning models is less formalized but more flexible. The Unit-Distance disproof was generated and then independently verified, but the generation step didn’t require a formal language environment. That makes it potentially applicable to a wider class of problems, at the cost of requiring external verification of the output.

Neither approach is better in the abstract. They serve different use cases. DeepMind’s is better suited to problems where formal verification is the goal and the problem is already expressible in a proof assistant. OpenAI’s is better suited to exploratory research where generating candidate solutions for human expert review is the bottleneck.

The Four-Month Pattern

Four months of registry coverage tells a consistent story. The May 25 Erdős proof analysis, the FrontierMath Tier 4 coverage, and the “Four Reasoning Breakthroughs in 30 Days” brief from May 21 all point to the same trajectory: mathematical AI performance is advancing faster than most research organizations expected at the start of 2026.

The pattern across both labs is consistent: each milestone builds on a formal or semi-formal verification methodology, each uses RL or equivalent optimization over a search space, and each produces results that are independently checkable rather than just reported. That’s meaningful. The mathematical AI results that are holding up to scrutiny are the ones with mechanically verifiable outputs, not the ones that rely solely on benchmark performance.

Research teams should weight that pattern more heavily than any individual score. The question isn’t whether DeepMind solved nine or eleven or four Erdős problems this month. The question is whether the methodology underlying these results (formal verification plus RL optimization) is converging on something general enough to address the problems your research team actually works on.

Practical Implications for Research Teams

The practical constraint nobody is addressing in these announcements: tool accessibility. Both AlphaProof and OpenAI’s reasoning models operate at the frontier of what current AI can do mathematically. Getting meaningful value from them requires problems that are already formally stated or can be formalized, compute resources to run the models at sufficient scale, and expertise to interpret and verify the outputs. Most research teams don’t have all three.

The near-term usefulness of these results is in trajectory, not direct application. What DeepMind and OpenAI are demonstrating is that machine-generated proofs with checkable outputs, whether formally verified or independently reviewed, work at a scale that was previously unachievable. The tools that will bring that capability to broader research teams, with lower formalization requirements and lower compute costs, are 12 to 18 months away if the current trajectory holds.

What to Watch

AlphaProof Nexus arXiv paper published (check the math.CO and cs.AI categories)

Primary DeepMind blog post or announcement confirming system name and problem count

Independent evaluation of either system (Epoch AI or equivalent) on mathematical reasoning benchmarks

Two specific triggers matter for research teams tracking this space.

The arXiv paper for AlphaProof Nexus, reportedly pending publication, will document problem selection, architecture specifics, and compute requirements. That’s the moment to evaluate whether the nine-problem claim tells you something useful about the system’s frontier capability or reflects a selection of more tractable problems. Watch the arXiv math.CO and cs.AI categories.

The second trigger is any primary DeepMind announcement confirming the “AlphaProof Nexus” system name and the nine-problem count. If the T3 reporting is accurate, DeepMind will publish something. If they don’t, the single-source claim needs to be downgraded further. The absence of a primary announcement within a reasonable window is itself informative.

TJS Synthesis

The 9-to-1 comparison is real in the limited sense that one T3 publication is reporting it. It’s not real in the sense of being a verified competitive score. What is real is that both Google DeepMind and OpenAI have now produced mathematical AI results that meet a meaningful bar: formal or independently verifiable outputs on Erdős-class problems. That’s the story.

Research teams deciding which mathematical AI approach to invest in should focus on the architectural question, not the score: does your research problem require formal verification (DeepMind’s strength) or exploratory candidate generation for expert review (OpenAI’s more flexible approach)? The answer to that question is more durable than any problem count. Wait for the arXiv paper before updating your research stack. Read it for problem selection and compute requirements, not just for the headline number.