What Is Grok 4.3? xAI's Flagship Model, Tiers, and the Controversies (2026)
Last verified: June 9, 2026 · Format: Breakdown
xAI ships models faster than almost anyone can write about them. Grok 4.3 Beta arrived on April 17, 2026, the same day xAI announced it was being acquired by SpaceX. By the time you read this, a Grok 4.4 or 4.5 may already be in the wild, and Grok 5 is reportedly in training on a 10-trillion-parameter target. That pace is the single most important thing to understand about Grok: the version number matters, the marketing does not, and most of the benchmark numbers you will see online describe an older model than the one you are paying for.
This breakdown covers what Grok 4.3 actually is, the full model lineup behind it, what each tier costs, where it performs well, and the safety and bias controversies that any honest assessment has to include. We have flagged xAI-reported figures separately from independent ones throughout, because the gap between the two is often the story.
What Is Grok 4.3
Grok 4.3 Beta is the current flagship large language model from xAI, Elon Musk's AI company. It is the model that the recommended API alias points to and the default that most paying users now hit. It launched on April 17, 2026 with a December 2025 knowledge cutoff, an improved architecture, native video input, document generation for PDFs, spreadsheets and slides, and better tool-calling than the versions before it.
Grok as a product line is built around one structural advantage no competitor can copy: native, real-time access to the X platform (formerly Twitter), which xAI acquired in March 2025. When you ask Grok about something that happened in the last hour, it draws from the live X firehose rather than from stale indexed pages. For breaking-news and social-sentiment work, that is a genuine differentiator. For everything else, it is one model among several frontier options, and not always the strongest.
The "Beta" label is not decoration. xAI ships fast and labels aggressively, and Grok 4.3 sits in a lineup where four named versions coexist and a fifth is in training. Understanding which version you are actually using, and which version a given statistic refers to, is the first practical skill for evaluating Grok. For where it fits in the wider field, see the AI Tools Hub and the Grok sub-hub.
The Model Lineup
Grok is not a single model. It is a family, and xAI keeps several versions live at once with overlapping capabilities and different price points. The headline flagship is Grok 4.3, but the cheapest and the largest-context options sit on different version numbers. Here is the current state of the lineup as of June 2026.
| Model | Released | Context | Position |
|---|---|---|---|
| Grok 4.3 Beta | Apr 2026 | 1M | Current flagship, recommended alias |
| Grok 4.20 Beta | Feb 2026 | 2M | Low-hallucination focus |
| Grok 4.1 / 4.1 Fast | Nov 2025 | 2M | Cheapest western frontier input |
| Grok 4 | Jul 2025 | 256K | Original multi-agent model |
| Grok 3 | Feb 2025 | retired | Retired May 15, 2026, now redirects to 4.3 |
Two things stand out. First, the flagship Grok 4.3 has a smaller context window (1M tokens) than the older Grok 4.20 and 4.1 Fast (2M each). If your work depends on feeding in very long documents, the newest model is not automatically the right one. Second, xAI retires versions quickly: Grok 3 was pulled in May 2026 and its traffic now redirects to 4.3. Anything you read citing Grok 3 benchmarks is describing a model you can no longer directly select.
Behind all of this is Grok 5, reported to be in training on the Colossus 2 supercomputer with a 10-trillion-parameter target and a projected arrival in the second or third quarter of 2026. xAI says it trains seven models at once. Treat any Grok 5 specification as a target, not a shipped fact.
Capabilities
Grok's feature set blends a multi-agent reasoning system, real-time data, and a growing set of multimodal and agentic tools. These are the capabilities that distinguish it from ChatGPT, Claude and Gemini in day-to-day use as of June 2026.
Multi-Agent Reasoning
For demanding queries, Grok deploys multiple specialized agents (named Grok, Harper, Benjamin and Lucas) that work in parallel and cross-check each other before producing a final answer. The "Heavy" configuration scales this to 16 agents. xAI reports that this peer-review approach cut its hallucination rate from roughly 12% to 4.2%. That figure is xAI's own, not independently audited, and it applies to the multi-agent consumer experience. One important caveat: this multi-agent architecture is not yet available through the API, where xAI lists it as "coming soon." API users get a single-model experience.
Real-Time X and DeepSearch
Grok reads the live X firehose, which gives it access to breaking events and social sentiment that index-based models cannot reach. DeepSearch builds on this by running iterative, cited research passes across the web and X for multi-source questions. This is the capability that most clearly has no equivalent among competitors.
Multimodal and Generative Tools
Grok 4.3 adds native video input, so you can hand it video content directly for analysis. On the generation side, the Aurora model produces images and Grok Imagine produces short video clips (roughly 10 seconds, 720p HD, with audio). On the document side, the agentic tooling can run Python, browse the web, and generate downloadable PDFs, spreadsheets and slide decks from a conversation.
Tiers and Pricing
Grok is sold three ways: free and paid consumer tiers, a team-oriented Business plan, and a usage-based API. The consumer tiers bundle Grok inside the X platform and the standalone Grok apps; the API is OpenAI-compatible, so migrating an existing integration usually means changing only the base URL and key.
- Basic Grok 4 / 4.1 access
- About 10 prompts every 2 hours
- Limited DeepSearch and image generation
- No credit card required
- Grok 4 / 4.1 with higher limits
- DeepSearch, Big Brain, faster routing
- Imagine image and 10s video generation
- Up to 2M token context
- Everything in SuperGrok
- Grok 4 Heavy 16-agent mode
- Maximum compute and usage caps
- Analysts call it overkill for most users
- SOC 2 controls
- Data not used for training by default
- Team management controls
- Centralized billing
X Premium ($8/mo) and X Premium+ ($40/mo) also bundle Grok access inside X. Note a pricing discrepancy: one source lists X Premium+ at $50/mo rather than $40. Because xAI changes pricing frequently, confirm the current figure on the official xAI pricing page before subscribing. Tiers verified June 9, 2026.
API Pricing (per 1M tokens)
The API is where Grok's cost story is most competitive. Grok 4.1 Fast is the cheapest input among western frontier models, while the flagship 4.3 sits in the middle and the original Grok 4 is the most expensive of the three.
| Model | Input | Output | Context |
|---|---|---|---|
| Grok 4.3 | $1.25 | $2.50 | 1M |
| Grok 4.1 Fast | $0.20 | $0.50 | 2M |
| Grok 4 | $3.00 | $15.00 | 256K |
Grok 4 cached input is $0.75/M. Prompt caching is automatic, the batch API offers 50% off async workloads, and live search costs $25 per 1,000 sources. Note that the multi-agent architecture is not available via API. API pricing verified June 9, 2026. For the full tier-by-tier breakdown, see our Grok pricing guide.
Benchmarks
Two warnings before any number. First, most public Grok benchmark scores describe Grok 4 or Grok 4.20, not the 4.3 flagship; independent 4.3-specific scores are still thin. Second, several headline figures come from xAI's own evaluations rather than third-party studies. We label each source below.
Read this first: Where a row says "independent," the score comes from an outside evaluator such as Artificial Analysis. Where it says "xAI," the figure is self-reported. Competitor scores in the source data reference Claude 4.7 and an earlier Gemini, so treat cross-model gaps as directional rather than exact.
Independent leaders: Grok 4.20 holds the AA Omniscience record at 78%. Claude 4.7 leads SWE-bench Verified at 87.6%. Gemini 3.1 leads GPQA Diamond at 94.3%. xAI-reported: Grok 4 Heavy 44 to 50% on HLE, 15.9% on ARC-AGI-2 (a record at its launch), and Grok 3 Think 93.3% on AIME 2025. Most figures predate Grok 4.3. Sources: Artificial Analysis (independent), xAI (self-reported). Verified June 2026.
The one place Grok genuinely leads is factual reliability. The 78% AA Omniscience score for Grok 4.20 is an independent record, and it measures the model's willingness to admit uncertainty rather than fabricate. For coding and peak reasoning, though, Grok trails: Claude leads SWE-bench Verified and the latest GPT and Gemini score higher on graduate-level science. If you want a current head-to-head, see Grok vs Gemini and Grok vs Claude.
Who Should Use It
Grok 4.3 is a strong fit for a specific set of users and a weak fit for others. The decision usually comes down to whether you need real-time social data, the cheapest possible API input, or maximum factual caution, versus peak coding and reasoning.
Native X access gives Grok live trending topics, public sentiment and breaking news that index-based models cannot reach. For real-time social work, it is effectively the only frontier option.
Best fit: SuperGrok / X Premium+Grok 4.1 Fast at $0.20/M input with a 2M context window is the cheapest western frontier input. The OpenAI-compatible API makes migration straightforward. Note the multi-agent system is not available over the API.
Best fit: API (Grok 4.1 Fast)If a confidently wrong answer is expensive in your workflow, Grok 4.20's record 78% Omniscience score (independent) and the multi-agent peer review make it worth testing against your own data.
Best fit: SuperGrok Heavy / BusinessGrok Business at $30/seat offers SOC 2 controls and no-training-on-your-data by default. But the documented content-safety and bias issues below mean a formal risk review should precede any deployment.
Best fit: Grok Business (with review)The Controversies
An honest "what is" page cannot stop at features and pricing. Grok carries a documented record of safety, content and bias problems that have drawn regulatory attention. We summarize them here factually and proportionately, with the reasons they matter for anyone choosing a tool.
From late 2025, deliberately thin image-generation guardrails (including a "Spicy" mode built on the Flux.1 model) were reported to enable the creation of nonconsensual sexualized deepfakes of real people, including minors. The reporting prompted broad criticism, an EU regulatory crackdown, and public calls to ban X in some jurisdictions. This is the most serious item on the list and the one most likely to carry legal and reputational risk.
Independent reviewers have found that Grok tends to mirror Elon Musk's views, leans right on contested political questions, and at times flatters Musk in unrelated responses. System prompts have been modified in ways that affected the model's political stances. For any use where neutrality matters, this is a material concern.
Grokipedia, xAI's AI-generated encyclopedia, has been reported to promote debunked conspiracy theories (including HIV/AIDS denial, vaccine-autism claims, climate denial and race-IQ pseudoscience) and to rely on low-credibility sources. Treat its entries as unverified rather than authoritative.
Despite its strong Omniscience score, Grok still fails some basic logic puzzles and has, on occasion, treated jokes or false stories circulating on X as breaking news. Real-time data is a strength, but it also means the model can amplify whatever is trending, true or not.
None of this erases Grok's genuine strengths, and xAI has reversed or patched several individual incidents after public pressure. But the pattern is the point: the same design choices that make Grok fast, blunt and real-time also make it more prone to these failures than its more cautious competitors. Buyers should weigh that trade-off deliberately rather than assume it away.
Limitations
Beyond the controversies, Grok 4.3 has practical limitations worth knowing before you commit. These are about capability and product gaps rather than conduct.
Claude leads SWE-bench Verified at 87.6% versus Grok 4's 75.0%, and the latest GPT and Gemini score higher on GPQA Diamond. Those competitor figures reference Claude 4.7 and an earlier Gemini (the current flagships are Claude Opus 4.8 and Gemini 3.1 Pro), so read the gaps as directional rather than exact. The direction is consistent, though: for peak coding and advanced reasoning, Grok is not the default choice despite its reliability edge.
The headline four-agent (and 16-agent Heavy) architecture is consumer-only; xAI lists the API version as "coming soon." If you build on the API today, you get a single-model experience, so the hallucination-reduction figures may not apply to your integration.
The full 16-agent Heavy experience sits behind the $300/month SuperGrok Heavy tier. Analysts have described it as overkill for most users, roughly ten times the price of SuperGrok without ten times the value for everyday work.
Grok 4.3 tops out at 1M tokens, while the older Grok 4.20 and 4.1 Fast reach 2M. If long-document context is your priority, the newest model is not the largest, and you may need to mix versions.
Go Deeper
Resources from across Tech Jacks Solutions