Agentic AI News: Sakana AI's Marlin Runs for Eight Hours and Delivers 100-Page Strategy Reports

June 16, 2026 · 3 min read · Sakana AI · Partial · Weak

Tech Jacks Solutions AI News Coverage

Sakana AI has launched Marlin, a B2B autonomous research agent that, according to the company, runs reasoning loops for up to eight hours and produces strategy reports of 60 to 100 pages. It is the first commercial product built on the company's published AB-MCTS inference-time scaling research. The launch marks the first time a non-frontier-lab organization has taken inference-time compute methodology from peer-reviewed research into a sold enterprise product.


Autonomous reasoning window: up to 8 hours

Key Takeaways

Sakana AI launched Marlin on June 15, the first commercial B2B product built on its published AB-MCTS inference-time scaling methodology; the methodology itself is independently corroborated via arXiv.
Marlin runs autonomous reasoning loops for up to eight hours and produces 60–100 page reports, according to Sakana AI; no independent benchmark evaluation exists yet.
Pricing starts at approximately ¥9,800 (~$62 USD) per study; subscription tiers reach ¥400,000/month (~$2,550 USD), per Sakana AI's published pricing.
This is the fifth long-horizon agentic product to reach market in two weeks; the research-to-product cycle for inference-time compute is accelerating faster than most practitioners anticipated.

Model Release

Sakana Marlin

Organization: Sakana AI

Type: Agentic AI / Autonomous research agent

Parameters: Not disclosed

Benchmarks: Not disclosed

Availability: B2B; pay-per-study and monthly subscription tiers

Verification

Partial: vendor announcement + arXiv (AB-MCTS research methodology only). Product capabilities and pricing are vendor-stated, model orchestration details come from secondary coverage, and the primary sources were broken at the time of review. The AB-MCTS research foundation is independently corroborated. No independent benchmark evaluation is available.

Inference-time compute just got a price tag.

On June 15, 2026, Sakana AI launched Marlin, a B2B autonomous research agent that doesn’t just generate answers. It reasons through a problem across hundreds to thousands of iterative queries, running for up to eight hours before delivering a final output. According to the company’s product description, the system produces reports of 60 to 100 pages citing 60 to 80 sources. No independent benchmark data exists yet.

The backbone is AB-MCTS (Adaptive Branching Monte Carlo Tree Search), a methodology Sakana AI developed and published in peer-reviewed research. According to the underlying arXiv paper, AB-MCTS lets a model explore competing reasoning branches, deciding adaptively whether to generate a fresh candidate (search wider) or refine a promising existing one (search deeper), rather than committing to a single chain of thought. The research foundation is independently verifiable; the product implementation is vendor-asserted. Those are different things, and the distinction matters for anyone evaluating Marlin seriously.
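To make the "wider versus deeper" idea concrete, here is a minimal toy sketch of adaptive-branching tree search. It is not Sakana AI's implementation and not the paper's exact algorithm: it assumes Thompson sampling over Beta posteriors as a simple selection rule, and the names `score`, `generate`, `step`, and `ab_mcts_sketch` are hypothetical stand-ins, with a numeric guessing task in place of LLM calls. At every node the search samples one value for "spawn a new child here" and one per existing child for "descend into it," then follows whichever draw wins.

```python
import random


def score(x, target=42):
    # Toy evaluator returning a value in [0, 1]; stands in for an LLM- or test-based scorer.
    return max(0.0, 1.0 - abs(x - target) / 50.0)


class Node:
    def __init__(self, answer=None, parent=None):
        self.answer = answer
        self.parent = parent
        self.children = []
        # Beta posterior for "go deeper": the value of descending into this node.
        self.a, self.b = 1.0, 1.0
        # Beta posterior for "go wider": the value of spawning a fresh child here.
        self.gen_a, self.gen_b = 1.0, 1.0


def generate(parent, rng):
    # Hypothetical stand-in for an LLM call: a fresh guess at the root,
    # a local refinement of the parent's answer everywhere else.
    if parent.answer is None:
        return rng.randint(0, 100)
    return parent.answer + rng.randint(-5, 5)


def step(root, rng):
    node = root
    while True:
        # Thompson sampling: one draw for "wider", one draw per existing child for "deeper".
        choice, best_draw = None, rng.betavariate(node.gen_a, node.gen_b)
        for child in node.children:
            draw = rng.betavariate(child.a, child.b)
            if draw > best_draw:
                choice, best_draw = child, draw
        if choice is None:
            break  # "wider" won: expand at this node
        node = choice  # "deeper" won: descend and decide again
    child = Node(generate(node, rng), parent=node)
    s = score(child.answer)
    node.children.append(child)
    node.gen_a += s
    node.gen_b += 1.0 - s
    # Back up the observed score to the new child and every ancestor on the path.
    n = child
    while n is not None:
        n.a += s
        n.b += 1.0 - s
        n = n.parent
    return child.answer, s


def ab_mcts_sketch(budget=200, seed=0):
    rng = random.Random(seed)
    root = Node()
    best_answer, best_score = None, -1.0
    for _ in range(budget):
        answer, s = step(root, rng)
        if s > best_score:
            best_answer, best_score = answer, s
    return best_answer, best_score


print(ab_mcts_sketch())
```

The `budget` parameter is the knob that inference-time scaling turns: more iterations buy more branches and refinements. The published method models scores more carefully than this sketch does; the sketch keeps only the adaptive wider-or-deeper decision.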

According to MarkTechPost’s coverage, Marlin orchestrates sub-tasks across multiple frontier models including OpenAI’s o4-mini, Google’s Gemini 2.5 Pro, and DeepSeek’s R1-0528, though this specific model configuration couldn’t be independently confirmed. Sakana AI has positioned the product for corporate planning functions, financial institutions, and think tanks, per VentureBeat’s launch coverage.

Disputed Claim

Marlin orchestrates reasoning across o4-mini, Gemini 2.5 Pro, and DeepSeek R1-0528

Specific model configuration sourced via MarkTechPost (T4) only, not independently confirmed

Verify multi-LLM configuration directly with Sakana AI before building data governance assumptions around specific providers

The part nobody mentions: if the reported configuration holds, multi-LLM orchestration at this scale means your organization’s strategic inputs (competitive analysis requests, M&A scenario planning, policy research briefs) are being processed by at least three separate model providers in the same run. Data governance teams need to know that before procurement signs anything.

Sakana AI has published a pay-per-use pricing model starting at approximately ¥9,800 (roughly $62 USD at current exchange rates) per study, with subscription tiers at approximately ¥150,000/month (~$950 USD) and ¥400,000/month (~$2,550 USD). According to Sakana AI, approximately 300 industry professionals participated in a closed beta in April 2026.
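For budgeting, the useful arithmetic is the break-even point between pay-per-use and a subscription tier. The sketch below uses only the vendor-stated yen prices; the exchange rate is an assumption (the dollar figures quoted above imply roughly 157 to 158 yen per dollar), and the helper names are hypothetical. The break-even count is only meaningful if a tier actually includes at least that many studies, which the published pricing summarized here does not state.

```python
import math

# Vendor-stated prices (JPY), as reported above.
PER_STUDY_JPY = 9_800
MONTHLY_TIERS_JPY = (150_000, 400_000)

# Assumption: an illustrative JPY/USD rate; substitute the rate on your procurement date.
JPY_PER_USD = 158.0


def to_usd(jpy, rate=JPY_PER_USD):
    # Convert a yen amount to dollars at the assumed rate.
    return jpy / rate


def breakeven_studies(monthly_jpy, per_study_jpy=PER_STUDY_JPY):
    # Fewest studies per month at which pay-per-use spend reaches the tier price.
    return math.ceil(monthly_jpy / per_study_jpy)


for tier in MONTHLY_TIERS_JPY:
    print(f"¥{tier:,}/month ≈ ${to_usd(tier):,.0f}; "
          f"break-even at {breakeven_studies(tier)} pay-per-use studies")
```

At ¥9,800 per study, the ¥150,000 tier breaks even at 16 studies a month and the ¥400,000 tier at 41, so teams running fewer studies than that would likely pay less per-study, subject to whatever the tiers actually include.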

Don’t expect this to replace your research function. An eight-hour run that produces a 100-page document creates a workflow most enterprise teams aren’t equipped to review, integrate, or act on quickly. The output format assumes a reader who wants comprehensive depth, not the executive who needs a decision in 20 minutes.

What Marlin actually represents is more important than the product itself. This is the fifth long-horizon agentic product to reach market in two weeks, joining xAI’s Grok Build Dashboard, Databricks’ Omnigent, and others. The kind of inference-time compute research that AI labs were publishing as academic work twelve months ago is now a B2B pricing model. That’s a faster diffusion curve than most practitioners anticipated.

Unanswered Questions

Which data governance frameworks cover multi-hour autonomous reasoning sessions across multiple model providers?
What output review protocols are required before an 8-hour AI-generated strategy report reaches a decision-maker?
How does vendor liability work when the orchestrating system (Marlin) delegates sub-tasks to third-party frontier models?

The governance question hasn’t caught up. Frameworks built for real-time, human-supervised AI interactions weren’t designed for eight-hour autonomous loops where the model independently decides which sources to query, which reasoning branches to pursue, and what to include in a deliverable. Enterprise buyers need to assess that gap before deployment, not after the first report lands on a board desk.

TJS synthesis:

Don’t wait for Marlin to prove itself. The correct posture right now is to map whether your organization has governance frameworks that cover extended autonomous AI reasoning sessions: data input boundaries, output review protocols, and human-in-the-loop checkpoints for multi-hour runs. If those frameworks don’t exist, build them before the next generation of these products arrives, because the next one will be faster, cheaper, and better at passing a cursory review.