DeepSeek Pricing 2026: $0.14/M Tokens — API Costs, Free Tier & V4 Pro
DeepSeek is the cheapest serious AI lab in the world (as of May 2026). The flagship DeepSeek V4-Flash API costs $0.14 per million input tokens and $0.28 per million output tokens as of May 2026 -- roughly 35 to 100 times cheaper than GPT-5.5 or Claude Opus 4.8 at equivalent context lengths.
This guide breaks down every layer of DeepSeek pricing: the free consumer tier, the pay-per-token API rates for each model, what self-hosting actually costs in hardware, and how DeepSeek stacks up against OpenAI, Anthropic, and Google on price. For background on what DeepSeek is and how it works, start with our What Is DeepSeek? breakdown.
The Free Tier: What You Get for $0
DeepSeek's consumer chat is completely free. There is no Plus plan, no Pro subscription, and no paywall. You get unlimited access to chat.deepseek.com and the official mobile app at zero cost. DeepSeek API Docs, May 2026
- Full model access -- DeepSeek V4-Flash, V4-Pro, and legacy R1 models through the web interface
- Web search -- built-in search capability within the chat UI
- File uploads -- no restrictions on document uploads or long conversations
- Chat history -- saved automatically across sessions
The only catch is fair-use throttling. During peak hours you may see "Server Busy" warnings that temporarily limit your access. There are no per-day limits, no message caps, and no feature gates behind a paywall.
API Pricing by Model
DeepSeek's API uses a pay-per-token model. You pay for input tokens (what you send) and output tokens (what the model generates). One million tokens is roughly 750,000 English words. All prices below are per million tokens in USD. DeepSeek API Docs, May 2026
V4-Flash: Full Rate Card
| Metric | Price per 1M Tokens |
|---|---|
| Input (cache miss) | $0.14 |
| Input (cache hit) | $0.0028 |
| Output | $0.28 |
| Context window | 1M tokens |
| Max output | 384K tokens |
The legacy deepseek-chat endpoint now routes to V4-Flash non-thinking mode, and deepseek-reasoner routes to V4-Flash thinking mode. Both legacy names will be fully retired on July 24, 2026. DeepSeek API Change Log, Apr 24, 2026
V4-Pro: Full Rate Card
| Metric | Standard Price | Promo (until May 31, 2026) |
|---|---|---|
| Input (cache miss) | $1.74 | $0.435 (75% off) |
| Input (cache hit) | $0.0145 | $0.003625 (75% off) |
| Output | $3.48 | $0.87 (75% off) |
Context Caching: The Hidden Cost Saver
Every DeepSeek API request automatically benefits from context caching. When your prompts share the same prefix -- for example, a system prompt you reuse across calls -- the API recognizes the overlap and charges the cache-hit rate instead of the full rate. On V4-Flash, that drops input costs from $0.14 to $0.0028 per million tokens: a 98% reduction. DeepSeek API Docs, May 2026
This matters most for production workflows that send the same system prompt or document context repeatedly. If you are building a retrieval-augmented generation pipeline that processes hundreds of documents against a fixed instruction set, caching can cut your input bill by an order of magnitude.
More Ways to Cut Costs
- Off-peak discounts: DeepSeek historically offered 50 to 75% off during off-peak hours (16:30 to 00:30 UTC) for V3 and R1 models. V4 off-peak pricing has not been officially confirmed yet, but scheduling batch workloads for this window is worth testing. DeepSeek API Docs, May 2026
- Tiered reasoning modes: V4-Flash supports Non-Think mode for routine tasks (fastest, cheapest) and Think High / Think Max for complex reasoning. Only pay for deep reasoning when you need it.
- 1M context window at no extra charge: Both V4-Flash and V4-Pro support a 1 million token context window with no surcharge. Competitors charge premiums for extended context.
- Pin system prompts for cache hits: Keep your system prompt identical across calls. The cache-hit rate on V4-Flash is $0.0028 per million tokens -- 98% less than the $0.14 cache-miss rate. Even small differences in the system prompt prefix break the cache.
Legacy Models (Reference Only)
| Model | Input/M | Cache Hit/M | Output/M | Context |
|---|---|---|---|---|
| DeepSeek-R1 (Jan 2025) | $0.55 | $0.14 | $2.19 | 64-128K |
| DeepSeek-V3 (Dec 2024) | $0.27 | $0.07 | $1.10 | 64K |
Both legacy endpoints now route to V4-Flash automatically. These rates are historical reference only.
Self-Hosting Costs: Open Weights, Expensive Hardware
DeepSeek releases all its models under the MIT license. You can download the weights from Hugging Face, run them on your own infrastructure, and use them for commercial purposes with no royalties or restrictions. The license even explicitly permits using DeepSeek outputs to train competing large language models. DeepSeek GitHub, MIT License
The weights are free. The hardware is not.
V4-Flash (284B parameters)
- Download size: 160GB
- VRAM required (quantized): 140 to 158GB depending on INT4 or FP8 precision
- Minimum hardware: 2x NVIDIA H100 (80GB each), 2x A100, or 4x RTX 4090
- Estimated monthly cloud cost: $3,000 to $6,000 for reserved GPU instances
V4-Pro (1.6T parameters)
- Download size: 865GB
- VRAM required: 862GB to 2.4TB depending on precision (FP8 vs full)
- Minimum hardware: 8 to 16x NVIDIA H100 (multi-node cluster)
- Estimated monthly cloud cost: $15,000 to $40,000+ for GPU cluster rental
AI Governance Charter
Establish your organization's AI principles in one document
Download Free →How DeepSeek Pricing Compares to Competitors
The table below shows current API rates for frontier models across the four major providers. All figures are per million tokens in USD and reflect published rates as of May 2026.
| Model | Input/M | Output/M | vs V4-Flash |
|---|---|---|---|
| DeepSeek V4-Flash | $0.14 | $0.28 | 1x (baseline) |
| DeepSeek V4-Pro (promo) | $0.435 | $0.87 | ~3x |
| GPT-5.4 Nano | $0.20 | $1.25 | 1.4-4.5x |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1.8-5.4x |
| Claude Haiku 4.5 | $1.00 | $5.00 | 7-18x |
| Gemini 3.1 Pro | $2.00 | $12.00 | 14-43x |
| GPT-5.4 | $2.50 | $15.00 | 18-54x |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 21-54x |
| Claude Opus 4.8 | $5.00 | $25.00 | 35-90x |
| GPT-5.5 | $5.00 | $30.00 | 35-107x |
Sources: DeepSeek API Docs, OpenAI Pricing, Anthropic Pricing, Google AI Pricing -- all accessed May 2026
- V4-Flash is cheaper than even the smallest budget models from competitors (GPT-5.4 Nano at $0.20 input, Gemini Flash-Lite at $0.25 input)
- The gap widens on output tokens -- V4-Flash output at $0.28/M is 107x cheaper than GPT-5.5 output at $30/M
- Even V4-Pro at full price ($1.74/$3.48) undercuts every flagship competitor
- No separate "nano" model needed -- V4-Flash already occupies the budget price point while delivering frontier-class performance
Third-Party Provider Pricing
You do not have to use DeepSeek's own API. Several third-party inference providers host DeepSeek models, sometimes with added benefits like geographic routing or unified billing across multiple model families.
| Provider | Model | Input/M | Output/M | Notes |
|---|---|---|---|---|
| OpenRouter | V4-Flash | $0.14 | $0.28 | Matches direct pricing |
| OpenRouter | V4-Pro (promo) | $0.435 | $0.87 | Matches direct promo pricing |
| OpenRouter | V3.2 | $0.252 | $0.378 | Legacy model access |
| OpenRouter | R1 | $0.70 | $2.50 | Free tier for distilled models |
| Together AI / Fireworks | V3/V4 range | $0.30-$0.50 | $0.50-$0.90 | Varies by model and tier |
| Together AI / Fireworks | R1 | ~$7-$8 | ~$7-$8 | Significantly higher than direct |
| AWS Bedrock | V3.2 | $0.62 | $1.85 | US/EU data residency |
| Azure AI Foundry | Varies | Varies by region/SKU | Enterprise billing integration | |
Sources: OpenRouter model index, Together AI pricing, AWS Bedrock pricing -- accessed May 2026
OpenRouter matches DeepSeek's direct rates for V4 models and adds a free tier for distilled variants. AWS Bedrock and Azure AI Foundry charge premiums but solve data residency concerns by routing through US and EU infrastructure -- useful for teams that cannot send data to China. Together AI and Fireworks offer competitive rates for V3/V4 Flash but charge significantly more for R1 reasoning models.
Who Should Use Which Tier
Limitations and Honest Caveats
Frequently Asked Questions
Go Deeper
Resources from across Tech Jacks Solutions
FREEAI Governance Charter
Establish your organization's AI principles in one document
AI Career Paths
Explore roles that work with these tools daily
EU AI Act Guide
Check your compliance obligations under the EU AI Act
FREEAI Risk Management Template
Identify, assess, and mitigate AI deployment risks
FREEAI Bias Assessment
Evaluate bias risks before deploying any AI system