Three CLI-native coding agents in roughly 30 days. That’s the market xAI just entered.
Grok Build launched in early access during May 2026, adding a third terminal-first option to a developer stack that already includes Anthropic’s Claude Code and OpenAI’s recently GA’d Codex. The product is real and accessible: the CLI installer resolves at x.ai/cli/install.sh, and independent platforms including Vercel AI Gateway and Puter Developer have each documented the model’s availability under the identifier grok-build-0.1. That’s the confirmed floor. Everything beyond it carries a qualifier.
xAI describes Grok Build as a terminal-based coding agent that proposes changes via code diffs. It is the kind of loop where the tool reads your repo, reasons about the task, and surfaces changes for review rather than just autocompleting a line. The company states the tool supports AGENTS.md conventions, MCP servers, plugins, and hooks. AGENTS.md is a real and established industry convention used by multiple coding agents; Grok Build’s specific implementation of it comes from xAI’s own documentation, which isn’t independently verified yet. MCP support is plausible given xAI’s May 24 connector additions to Vercel, Canva, and S&P Global, but “plausible” isn’t confirmed.
Unanswered Questions
- What does the agentic loop latency look like at production-scale repo sizes?
- Do the reported $1/$2 per million token rates hold under volume discounting or enterprise tiers?
- How does AGENTS.md integration compare to Claude Code's implementation in practice?
The catch is the numbers. xAI reports a 256,000-token context window with no stated output limit, and API pricing at $1 per million input tokens and $2 per million output tokens. The CLI is reported to be included for SuperGrok and X Premium Plus subscribers. None of these figures has been independently confirmed from a live source: the primary blog post and API documentation were inaccessible at the time of verification. They may be accurate. They’re also the vendor’s numbers, and you should treat them as such until independent evaluation arrives.
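To make the reported rates concrete, here is a minimal back-of-the-envelope sketch in Python. It treats the vendor-reported $1/$2 per million tokens and the 256,000-token window as assumptions, not confirmed values. The function names and the example token counts are hypothetical and are not part of any xAI API.

```python
# Vendor-reported figures from the article; unverified, so treat as assumptions.
INPUT_USD_PER_MTOK = 1.00         # reported: $1 per million input tokens
OUTPUT_USD_PER_MTOK = 2.00        # reported: $2 per million output tokens
CONTEXT_WINDOW_TOKENS = 256_000   # reported context window


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the reported per-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


def fits_context(prompt_tokens: int) -> bool:
    """Check whether a prompt fits inside the reported context window."""
    return 0 <= prompt_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Hypothetical agent step: 200k tokens of repo context in, 8k tokens of diff out.
    step_cost = estimate_cost_usd(200_000, 8_000)
    print(f"one step: ${step_cost:.3f}")
    # A hypothetical 40-step session at the same size per step.
    print(f"40 steps: ${40 * step_cost:.2f}")
    print("fits window:", fits_context(200_000))
```

If the reported rates hold, a large-context agent step costs well under a dollar. The same arithmetic is easy to rerun once independently confirmed pricing is available.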
No benchmark data has been released. Epoch AI evaluation is pending. There’s no arXiv paper. Right now, “purpose-built for interactive coding agents, tool use, and multi-step development” is the characterization that’s independently documented – from Vercel’s platform notes and Puter’s developer documentation. That’s a real signal, but it’s not a performance claim.
For the team evaluating whether to add Grok Build to their agentic coding stack: the access model is the clearest differentiator at this stage. Claude Code operates inside Anthropic’s ecosystem. Codex reached general availability tied to OpenAI’s API. Grok Build arrives with a subscription-tier CLI path that could lower the barrier for individual developers already paying for X Premium Plus, if that pricing holds. The per-token API economics, at reported rates, put it closer to mid-tier positioning than premium.
What to Watch
Independent benchmark evaluations: SWE-bench, HumanEval, or any LMSYS-style head-to-head that includes grok-build-0.1 will be the actual signal. Until then, the product exists, the installer works, and the specs are pending verification. Don’t rebuild your pipeline around vendor-reported numbers.
The part nobody mentions with early access launches is that “early access” means the capability bar is set by whoever got in first, and their use cases may not match yours. Wait for independent benchmarks before putting Grok Build in a production agentic loop. The access cost is low enough to experiment; the evaluation data isn’t there yet to commit.