Inference-time compute just got a price tag.
On June 15, 2026, Sakana AI launched Marlin, a B2B autonomous research agent that doesn’t just generate answers. It reasons through a problem across hundreds to thousands of iterative queries, running for up to eight hours before delivering a final output. According to the company’s product description, the system produces reports of 60 to 100 pages citing 60 to 80 sources. No independent benchmark data exists yet.
The backbone is AB-MCTS (Adaptive Branching Monte Carlo Tree Search), a methodology Sakana AI developed and published in peer-reviewed research. According to the underlying arXiv paper, AB-MCTS lets a model evaluate competing reasoning branches simultaneously rather than committing to a single chain of thought. The research foundation is independently verifiable. The product implementation is vendor-asserted. Those are different things, and the distinction matters for anyone evaluating Marlin seriously.
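To make the branching idea concrete, here is a toy sketch of a tree search that keeps several candidate branches alive and decides at each step whether to branch wider (generate a new candidate) or go deeper (refine an existing one). It is not Sakana AI's implementation and not a reproduction of the paper's algorithm. The selection rule (Thompson sampling over Beta posteriors) is a simplified, assumed stand-in, the "model" is a random number generator, and every name here (`Node`, `generate`, `evaluate`, `search`, `TARGET`) is hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

TARGET = 0.73  # hidden "correct answer" known only to the toy evaluator


def evaluate(answer: float) -> float:
    """Toy evaluator: a score in [0, 1], higher is better."""
    return max(0.0, 1.0 - abs(answer - TARGET))


@dataclass
class Node:
    answer: Optional[float]  # None marks the root (no answer yet)
    depth: int = 0
    score: float = 0.0
    children: list = field(default_factory=list)
    # Beta pseudo-counts [alpha, beta] for "branch wider at this node"
    gen_ab: list = field(default_factory=lambda: [1.0, 1.0])
    # Beta pseudo-counts [alpha, beta] for "go deeper through this node"
    sub_ab: list = field(default_factory=lambda: [1.0, 1.0])


def generate(parent: Node, rng: random.Random) -> float:
    """Stand-in for a model call: a fresh guess at the root, a refinement elsewhere."""
    if parent.answer is None:
        return rng.random()
    step = 0.2 / (parent.depth + 1)  # refinements get smaller with depth
    return min(1.0, max(0.0, parent.answer + rng.gauss(0.0, step)))


def search(budget: int, seed: int = 0):
    """Run `budget` generate-and-score steps; return (root, best node found)."""
    rng = random.Random(seed)
    root = Node(answer=None)
    best = None
    for _ in range(budget):
        node, path = root, []
        while True:
            # Option 0 is "generate here"; the rest are "descend into child".
            options = [(None, node.gen_ab)] + [(c, c.sub_ab) for c in node.children]
            draws = [rng.betavariate(a, b) for _, (a, b) in options]
            child, posterior = options[draws.index(max(draws))]
            path.append(posterior)
            if child is None:
                break
            node = child
        answer = generate(node, rng)
        score = evaluate(answer)
        new = Node(answer, node.depth + 1, score)
        new.sub_ab = [1.0 + score, 2.0 - score]  # seed with the node's own score
        node.children.append(new)
        for posterior in path:  # credit every decision that led here
            posterior[0] += score
            posterior[1] += 1.0 - score
        if best is None or score > best.score:
            best = new
    return root, best


if __name__ == "__main__":
    _, best = search(budget=200)
    print(f"best answer {best.answer:.3f} at depth {best.depth}, score {best.score:.3f}")
```

The point of the sketch is the budget allocation: branches that score well attract more refinement, while the "generate here" option keeps a chance of opening a fresh branch, so no single chain of thought monopolizes the run.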
According to MarkTechPost’s coverage, Marlin orchestrates sub-tasks across multiple frontier models including OpenAI’s o4-mini, Google’s Gemini 2.5 Pro, and DeepSeek’s R1-0528, though this specific model configuration couldn’t be independently confirmed. Sakana AI has positioned the product for corporate planning functions, financial institutions, and think tanks, per VentureBeat’s launch coverage.
Disputed Claim
The part nobody mentions: if the reported configuration is accurate, multi-LLM orchestration at this scale means your organization’s strategic inputs (competitive analysis requests, M&A scenario planning, policy research briefs) are being processed across at least three separate model providers. Data governance teams need to know that before procurement signs anything.
Sakana AI has published a pay-per-use pricing model starting at approximately ¥9,800 (roughly US$62 at current exchange rates) per study, with subscription tiers at approximately ¥150,000/month (about US$950) and ¥400,000/month (about US$2,550). According to Sakana AI, approximately 300 industry professionals participated in a closed beta in April 2026.
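The pricing arithmetic is easy to check. The sketch below assumes an exchange rate of ¥158 per dollar, back-solved from the ¥9,800 ≈ $62 figure rather than taken from any quoted rate, and uses hypothetical tier labels. The break-even count further assumes that every study costs the ¥9,800 floor and that a subscription covers however many studies you run; the published pricing as reported here confirms neither.

```python
import math

JPY_PER_USD = 158.0  # assumption: back-solved from ¥9,800 ≈ $62
PAY_PER_STUDY_JPY = 9_800
TIERS_JPY = {"tier_1": 150_000, "tier_2": 400_000}  # hypothetical labels


def to_usd(jpy: float, rate: float = JPY_PER_USD) -> float:
    """Convert yen to dollars at the assumed rate."""
    return jpy / rate


def break_even_studies(monthly_jpy: int, per_study_jpy: int = PAY_PER_STUDY_JPY) -> int:
    """Smallest whole number of studies whose pay-per-use cost reaches the monthly fee."""
    return math.ceil(monthly_jpy / per_study_jpy)


if __name__ == "__main__":
    print(f"per study: ${to_usd(PAY_PER_STUDY_JPY):,.0f}")
    for name, jpy in TIERS_JPY.items():
        print(f"{name}: ${to_usd(jpy):,.0f}/month, "
              f"break-even at {break_even_studies(jpy)} studies/month")
```

Under those assumptions the lower tier pays for itself at 16 studies a month and the higher tier at 41. At ¥158 per dollar the higher tier converts to about $2,532, slightly under the quoted ~$2,550, which is consistent with the figures being approximate.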
Don’t expect this to replace your research function. Eight hours of compute time producing a 100-page document creates a workflow that most enterprise teams aren’t equipped to review, integrate, or act on quickly. The output format assumes a reader who wants comprehensive depth, not the executive who needs a decision in 20 minutes.
What Marlin actually represents is more important than the product itself. This is the fifth long-horizon agentic product to reach market in two weeks, joining xAI’s Grok Build Dashboard, Databricks’ Omnigent, and others. The inference-time compute research that frontier labs were publishing as academic work twelve months ago is now a B2B pricing model. That’s a faster diffusion curve than most practitioners anticipated.
Unanswered Questions
- Which data governance frameworks cover multi-hour autonomous reasoning sessions across multiple model providers?
- What output review protocols are required before an 8-hour AI-generated strategy report reaches a decision-maker?
- How does vendor liability work when the orchestrating system (Marlin) delegates sub-tasks to third-party frontier models?
Governance hasn’t caught up. Frameworks built for real-time, human-supervised AI interactions weren’t designed for eight-hour autonomous loops where the model independently decides which sources to query, which reasoning branches to pursue, and what to include in a deliverable. Enterprise buyers need to assess that gap before deployment, not after the first report lands on a board desk.
TJS synthesis:
Don’t wait for Marlin to prove itself. The correct posture right now is to determine whether your organization has governance frameworks that cover extended autonomous AI reasoning sessions: data input boundaries, output review protocols, and human-in-the-loop checkpoints for multi-hour runs. If those frameworks don’t exist, build them before the next generation of these products arrives, because the next one will be faster, cheaper, and more capable of passing a cursory review.
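One way to make those three controls concrete is a pre-flight check that runs before any long autonomous job is submitted. The sketch below is illustrative only: `RunPolicy`, `RunRequest`, `preflight`, and all field names, provider names, and thresholds are hypothetical, and nothing here reflects an actual Marlin API or any published governance standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunPolicy:
    allowed_providers: frozenset         # data input boundary: who may see inputs
    blocked_data_classes: frozenset      # data input boundary: what may not be sent
    max_hours_between_checkpoints: float # human-in-the-loop cadence
    requires_human_review: bool = True   # output review protocol


@dataclass(frozen=True)
class RunRequest:
    providers: frozenset
    data_classes: frozenset
    planned_hours: float
    checkpoint_hours: tuple              # hours into the run when a human checks in
    reviewer: Optional[str] = None       # named person who signs off on the report


def preflight(policy: RunPolicy, request: RunRequest) -> list:
    """Return a list of policy violations; an empty list means the run may start."""
    problems = []
    extra = request.providers - policy.allowed_providers
    if extra:
        problems.append(f"unapproved providers: {sorted(extra)}")
    blocked = request.data_classes & policy.blocked_data_classes
    if blocked:
        problems.append(f"blocked data classes: {sorted(blocked)}")
    # Longest stretch of the run with no human checkpoint.
    marks = [0.0, *sorted(request.checkpoint_hours), request.planned_hours]
    longest_gap = max(b - a for a, b in zip(marks, marks[1:]))
    if longest_gap > policy.max_hours_between_checkpoints:
        problems.append(f"{longest_gap:g}h without a checkpoint")
    if policy.requires_human_review and not request.reviewer:
        problems.append("no named reviewer for the final report")
    return problems


if __name__ == "__main__":
    policy = RunPolicy(frozenset({"provider_a", "provider_b"}), frozenset({"mna"}), 3.0)
    request = RunRequest(frozenset({"provider_a", "provider_c"}),
                         frozenset({"mna", "public"}), 8.0, (4.0,))
    for problem in preflight(policy, request):
        print("BLOCKED:", problem)
```

Returning a list of violations rather than a single pass/fail flag is a deliberate choice: a governance team reviewing a blocked run usually needs to see every gap at once, not just the first one.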