The model didn’t solve a benchmark problem. It reportedly disproved a conjecture that’s been open since 1946.
OpenAI announced on May 20, 2026, that an internal general-purpose reasoning model autonomously produced what the company characterizes as a novel proof disproving Paul Erdős’s planar unit-distance conjecture. The underlying question: among n points in the plane, how many pairs can be exactly one unit apart? Erdős conjectured in 1946 that this maximum grows no faster than n^(1+ε), up to a constant factor, for every ε > 0. According to OpenAI’s research announcement, an associated technical paper under arXiv ID 2605.20695 details the proof. The paper is dated May 20, 2026.
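To make the counted quantity concrete, here is a minimal brute-force Python sketch that counts pairs of points at one fixed distance. The helper names (`count_pairs_at_sq_distance`, `grid`) are hypothetical illustrations and have nothing to do with OpenAI’s paper or its method. Integer coordinates and squared distances keep every comparison exact. Counting pairs at squared distance 5 shows why “unit distance” loses no generality: shrinking the whole point set by a factor of √5 turns those pairs into unit-distance pairs.

```python
from itertools import combinations


def count_pairs_at_sq_distance(points, sq_dist):
    """Count unordered pairs of points whose squared Euclidean distance equals sq_dist.

    Integer coordinates make the equality test exact, so no floating-point
    tolerance is needed.
    """
    count = 0
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == sq_dist:
            count += 1
    return count


def grid(k):
    """Return all integer points of a k-by-k square grid."""
    return [(x, y) for x in range(k) for y in range(k)]


if __name__ == "__main__":
    pts = grid(3)  # 9 points
    print(count_pairs_at_sq_distance(pts, 1))  # 12: horizontal and vertical neighbors
    print(count_pairs_at_sq_distance(pts, 5))  # 8: pairs offset by (1, 2) or (2, 1)
```

Brute force only scores a given configuration in O(n²) time. The conjecture concerns the best possible configuration as n grows, which no enumeration of small cases can settle.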
The characterizations of “rigorous” and “novel” are OpenAI’s. The Filter could not access any of the three source URLs for this item at the time of processing, so the technical details here come from OpenAI’s announcement and community-level corroboration rather than from independent review of the paper itself.
What makes this different from benchmark headlines
The model involved is not a domain-specific mathematical solver. It’s an internal general-purpose reasoning model, the same category of system used for writing, coding, and analysis tasks. OpenAI hasn’t disclosed its name or parameter count. According to the associated arXiv submission, the approach reportedly bridges algebraic number theory, including infinite class field towers, to elementary geometry. Golod-Shafarevich theory is cited as part of the methodology. These are real mathematical constructs. Whether the proof applies them correctly is what the external review is meant to establish.
That external review is the story’s other significant detail. OpenAI cites Fields medalist Timothy Gowers and mathematician Will Sawin as having vetted or co-authored the result. Both are real, prominent researchers: Gowers won the Fields Medal in 1998, and Sawin has an established record in combinatorics. Their specific roles (co-author, reviewer, or contributor of independent components) aren’t confirmed by accessible sources; the attribution comes from OpenAI’s announcement. According to the arXiv paper, the proof reportedly establishes a lower bound with δ = 0.014 in a simplified argument by Sawin. That figure couldn’t be independently confirmed.
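Why a fixed δ would matter: the short derivation below assumes, as the phrasing suggests but accessible sources do not confirm, that the reported bound has the form u(n) ≥ c·n^(1+δ) for infinitely many n. Under that assumption, any fixed δ > 0 contradicts Erdős’s conjectured upper bound once ε is chosen smaller than δ. This only shows why such a bound would disprove the conjecture; it says nothing about whether the proof is correct.

```latex
% u(n): the maximum number of unit-distance pairs among n points in the plane
u(n) = \max_{P \subset \mathbb{R}^2,\ |P| = n}
       \#\bigl\{ \{p, q\} \subseteq P : \lVert p - q \rVert = 1 \bigr\}

% Erdős's conjecture (standard formulation):
\forall \varepsilon > 0 \ \exists C_\varepsilon : \quad
  u(n) \le C_\varepsilon \, n^{1+\varepsilon} \quad \text{for all } n

% ASSUMED form of the reported bound (not confirmed by accessible sources):
u(n) \ge c \, n^{1+\delta} \quad \text{for infinitely many } n,
  \qquad c > 0,\ \delta = 0.014

% Choose \varepsilon = \delta / 2 = 0.007. For those n, the two bounds together give
c \, n^{1+\delta} \le C_\varepsilon \, n^{1+\delta/2}
  \;\Longrightarrow\; n^{\delta/2} \le C_\varepsilon / c
% which fails once n is large enough, so the conjectured bound cannot hold.
```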
The part nobody mentions
Proofs need to be checked, and checking an AI-generated proof at this level of complexity isn’t fast. The external review structure OpenAI describes matters precisely because self-reported AI math results have a history of looking correct until they don’t. Independent verification by named mathematicians, if that’s what happened here, is a meaningful signal. It’s also what the math community will scrutinize first.
This result lands amid a 30-day run of AI mathematical-reasoning milestones: Google DeepMind reported a WorldReasonBench result on May 17, and the hub covered GPT-5.5’s reasoning capability profile earlier this month. The Erdős result is a different category of claim: not a benchmark score but a first-of-its-kind mathematical output that requires independent expert validation.
What to watch
The math community’s response to arXiv:2605.20695 will determine how this story ages. If Gowers and Sawin publicly confirm their roles and endorse the proof, that’s the independent signal practitioners should wait for. If the paper is accepted after formal peer review at a top venue, that raises its significance further. Watch for OpenAI’s specific account of how the external review was structured: co-authorship and post-hoc review are meaningfully different claims.
Don’t expect this to translate into production capability anytime soon. The model that produced this result is internal, unnamed, and not available via API. The capability demonstrated is also highly specific: sustained mathematical reasoning on an open conjecture is not the same as general research automation. The gap between this result and “use AI to run your R&D pipeline” is substantial.
TJS synthesis
This result, if it holds, isn’t primarily a story about mathematical AI. It’s a data point in a capability argument: that general-purpose reasoning models can produce outputs requiring original insight, not just pattern completion. That would change the frame on what enterprise research automation might eventually look like. Wait for independent confirmation (public statements from Gowers or Sawin, or completed peer review) before treating this as established fact. The announcement is credible. The proof still needs to be checked.