
2 Million AI Agents, No Collective Intelligence: What the Superminds Test Found

A new research framework called the Superminds Test put a society of reportedly 2 million AI agents through complex reasoning tasks and found they failed to outperform a single frontier model. The finding arrives in a week of heavy infrastructure investment in multi-agent systems, and challenges a core assumption behind that investment.
Reportedly 2M agents, zero collective intelligence gain
Key Takeaways
  • A new framework (the Superminds Test) evaluated collective intelligence in a reportedly 2-million-agent society (MoltBook) and, per reported findings from an unconfirmed arXiv preprint, found a "stark absence" of collective intelligence
  • Agent societies failed specifically on joint reasoning and information synthesis tasks despite high individual model capability
  • The finding challenges a core assumption behind multi-agent infrastructure investment: that collective intelligence emerges from sufficient agent scale
  • The paper is a pre-peer-review arXiv preprint; all findings are preliminary pending independent review

A research paper submitted to arXiv introduces the Superminds Test, a framework for evaluating collective intelligence in large-scale autonomous agent societies. The paper’s central finding: a society of agents described as “MoltBook,” reportedly comprising 2 million agents, showed what the authors characterize as a “stark absence” of collective intelligence on complex reasoning and information synthesis tasks.

Note on sourcing: The wire provided arXiv ID 2604. The correct arXiv ID has not been confirmed at publication time. The paper is described as a preprint submitted to arXiv. All specifics from the paper should be treated as reported findings from an unconfirmed preprint pending that confirmation.

With that caveat in place: the finding is counterintuitive relative to current investment patterns. The agentic AI infrastructure stack has seen five framework or standard releases in ten days. Cloudflare, OpenAI, and others have committed significant engineering resources to making large-scale agent societies operationally possible. The implicit assumption behind much of that investment is that more agents produce better outcomes: that collective intelligence emerges from sufficient individual capability and coordination infrastructure.

The Superminds Test result challenges that assumption directly. According to the paper, individual model power does not automatically translate to collective reasoning capability. The MoltBook society reportedly failed specifically on joint reasoning and information synthesis, the tasks where collective intelligence would be most expected to outperform individual baselines.
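To make the comparison concrete, here is a minimal Python sketch of what an evaluation in this spirit could look like. The paper's actual protocol, task suite, and scoring are unconfirmed; the `evaluate` harness, the stand-in scorers, and the "CI gain" metric below are illustrative assumptions, not the Superminds Test itself.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a collective-intelligence evaluation in the spirit
# of the reported Superminds Test. The protocol, names, and metric here are
# illustrative stand-ins, not the paper's actual methodology.

@dataclass
class EvalResult:
    task: str
    single_model_score: float
    society_score: float

    @property
    def ci_gain(self) -> float:
        # Collective-intelligence gain: how much the agent society beats the
        # single frontier-model baseline. <= 0 means no emergent benefit.
        return self.society_score - self.single_model_score


def evaluate(task: str,
             single_model: Callable[[str], float],
             agent_society: Callable[[str], float]) -> EvalResult:
    """Score the same task under both conditions and report the gap."""
    return EvalResult(task, single_model(task), agent_society(task))


if __name__ == "__main__":
    # Stand-in scorers; a real harness would grade model outputs against
    # task-specific rubrics for joint reasoning and information synthesis.
    frontier_baseline = lambda task: 0.82
    agent_society = lambda task: 0.79  # reported pattern: no gain at scale

    result = evaluate("information-synthesis-01", frontier_baseline, agent_society)
    print(f"{result.task}: CI gain = {result.ci_gain:+.2f}")
```

In these terms, the reported pattern corresponds to a CI gain at or below zero on exactly the joint reasoning and synthesis tasks where a positive gain would be expected.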

The “stark absence” framing is worth taking seriously. This isn’t a finding that collective intelligence was present but underwhelming. It’s a finding that the expected emergent property didn’t emerge at the scale tested.

For practitioners building multi-agent systems, this matters at the architecture level. If collective intelligence doesn’t reliably emerge from large agent societies on complex reasoning tasks, the design question isn’t just “how do I coordinate this many agents?” It’s “what task decomposition actually benefits from multiple agents, and what tasks are better handled by a single frontier model?” Those are different architecture decisions with different infrastructure requirements and cost profiles.
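As a rough illustration of that architecture decision, the following sketch routes work between a single frontier model and a multi-agent fan-out based on whether a task decomposes into independent subtasks. The heuristic and all names are assumptions for illustration only; nothing here is drawn from the paper.

```python
# Hypothetical routing sketch for the architecture decision above. The idea:
# default to a single frontier model, and only fan out to multiple agents
# when a task decomposes into genuinely independent subtasks.

from dataclasses import dataclass, field

@dataclass
class Task:
    description: str
    # Independent subtasks, if the task cleanly decomposes; empty otherwise.
    subtasks: list[str] = field(default_factory=list)

def route(task: Task) -> str:
    # Joint reasoning and synthesis over shared context (where the reported
    # failures concentrated) stays with one model; parallelizable work with
    # few cross-agent dependencies is what fan-out plausibly buys you.
    if len(task.subtasks) >= 2:
        return f"fan out {len(task.subtasks)} subtasks to worker agents, then merge"
    return "send whole task to a single frontier model"

print(route(Task("synthesize these 40 reports into one analysis")))
print(route(Task("scrape three sites", subtasks=["site-a", "site-b", "site-c"])))
```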

The paper is an arXiv preprint and has not yet undergone peer review. That’s standard for this kind of research and doesn’t make the findings less relevant – but it means independent replication and expert critique haven’t yet run their course. The Superminds Test framework itself may prove influential regardless of whether the specific MoltBook findings hold under scrutiny, because the field lacks standardized evaluation methods for collective intelligence in agent systems.

Prior coverage of AI evaluation frameworks on this hub has tracked the broader pattern: standard evaluation approaches are struggling to keep up with the capabilities they’re meant to assess. The Superminds Test adds a collective intelligence dimension to that gap.
