DeepSeek Pricing & API Costs: Complete Guide (2026)
DeepSeek is the cheapest serious AI lab in the world. As of May 2026, the flagship DeepSeek V4-Flash API costs $0.14 per million input tokens and $0.28 per million output tokens -- roughly 35 to 107 times cheaper than GPT-5.5 or Claude Opus 4.7 at equivalent context lengths.
This guide breaks down every layer of DeepSeek pricing: the free consumer tier, the pay-per-token API rates for each model, what self-hosting actually costs in hardware, and how DeepSeek stacks up against OpenAI, Anthropic, and Google on price. For background on what DeepSeek is and how it works, start with our What Is DeepSeek? breakdown.
The Free Tier: What You Get for $0
DeepSeek's consumer chat is completely free. There is no Plus plan, no Pro subscription, and no paywall. You get unlimited access to chat.deepseek.com and the official mobile app at zero cost. DeepSeek API Docs, May 2026
- Full model access -- DeepSeek V4-Flash, V4-Pro, and legacy R1 models through the web interface
- Web search -- built-in search capability within the chat UI
- File uploads -- no restrictions on document uploads or long conversations
- Chat history -- saved automatically across sessions
The only catch is fair-use throttling. During peak hours you may see "Server Busy" warnings that temporarily limit your access. There are no per-day limits, no message caps, and no feature gates behind a paywall.
API Pricing by Model
DeepSeek's API uses a pay-per-token model. You pay for input tokens (what you send) and output tokens (what the model generates). One million tokens is roughly 750,000 English words. All prices below are per million tokens in USD. DeepSeek API Docs, May 2026
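The billing arithmetic is simple enough to sketch. The helper below is illustrative, not an official SDK function; the rates plugged in are the published V4-Flash figures from the rate card in this guide:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in USD for one request, given per-1M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# V4-Flash published rates, May 2026: $0.14 input / $0.28 output per 1M tokens
print(f"${request_cost(1_000_000, 1_000_000, 0.14, 0.28):.2f}")  # $0.42
```

A full million tokens in and a full million out still comes to well under a dollar, which is the whole story of the comparison tables further down.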
V4-Flash: Full Rate Card
| Metric | Price per 1M Tokens |
|---|---|
| Input (cache miss) | $0.14 |
| Input (cache hit) | $0.0028 |
| Output | $0.28 |
| Context window | 1M tokens |
| Max output | 384K tokens |
The legacy deepseek-chat endpoint now routes to V4-Flash non-thinking mode, and deepseek-reasoner routes to V4-Flash thinking mode. Both legacy names will be fully retired on July 24, 2026. DeepSeek API Change Log, Apr 24, 2026
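If you are migrating off the legacy names before the July 2026 retirement, a minimal payload sketch might look like the following. Note that the model identifier "deepseek-v4-flash" and the "thinking" flag are assumptions for illustration; check the API docs for the exact strings.

```python
def build_request(prompt: str, thinking: bool = False) -> dict:
    """Build a chat-completion payload targeting V4-Flash directly.

    Hypothetical model id and flag: the legacy names `deepseek-chat`
    (non-thinking) and `deepseek-reasoner` (thinking) keep working until
    July 24, 2026, but new code should not depend on them.
    """
    return {
        "model": "deepseek-v4-flash",  # assumed identifier, not confirmed
        "thinking": thinking,          # assumed flag for thinking mode
        "messages": [{"role": "user", "content": prompt}],
    }
```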
V4-Pro: Full Rate Card
| Metric | Standard Price | Promo (until May 31, 2026) |
|---|---|---|
| Input (cache miss) | $1.74 | $0.435 (75% off) |
| Input (cache hit) | $0.0145 | $0.003625 (75% off) |
| Output | $3.48 | $0.87 (75% off) |
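The promo column is a flat 75% discount on the standard rates, which is easy to sanity-check:

```python
# V4-Pro standard rates (USD per 1M tokens) from the rate card above
standard = {"input_miss": 1.74, "input_hit": 0.0145, "output": 3.48}

# Promo pricing until May 31, 2026 = 25% of the standard rate
promo = {k: round(v * 0.25, 6) for k, v in standard.items()}
print(promo)  # {'input_miss': 0.435, 'input_hit': 0.003625, 'output': 0.87}
```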
Context Caching: The Hidden Cost Saver
Every DeepSeek API request automatically benefits from context caching. When your prompts share the same prefix -- for example, a system prompt you reuse across calls -- the API recognizes the overlap and charges the cache-hit rate instead of the full rate. On V4-Flash, that drops input costs from $0.14 to $0.0028 per million tokens: a 98% reduction. DeepSeek API Docs, May 2026
This matters most for production workflows that send the same system prompt or document context repeatedly. If you are building a retrieval-augmented generation pipeline that processes hundreds of documents against a fixed instruction set, caching can cut your input bill by an order of magnitude.
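To make that order-of-magnitude claim concrete, here is a sketch of the input bill for a pipeline that reuses one long prefix across many calls. The rates come from the V4-Flash rate card; the assumption that only the first call misses the cache is a simplification of how the cache actually warms.

```python
FLASH_MISS, FLASH_HIT = 0.14, 0.0028  # USD per 1M input tokens

def input_bill(prefix_tok: int, unique_tok: int, calls: int, cached: bool) -> float:
    """Input cost for `calls` requests sharing a `prefix_tok`-token prefix."""
    if not cached:
        return calls * (prefix_tok + unique_tok) / 1e6 * FLASH_MISS
    first = (prefix_tok + unique_tok) / 1e6 * FLASH_MISS    # cache is cold
    rest = (calls - 1) * (prefix_tok / 1e6 * FLASH_HIT      # prefix hits
                          + unique_tok / 1e6 * FLASH_MISS)  # suffix misses
    return first + rest

# 10K-token system prompt plus 500 unique tokens per call, 1,000 calls
print(round(input_bill(10_000, 500, 1_000, cached=False), 2))  # 1.47
print(round(input_bill(10_000, 500, 1_000, cached=True), 2))   # 0.1
```

Roughly $1.47 drops to about $0.10 on this workload, which is where the "order of magnitude" figure comes from.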
Legacy Models (Reference Only)
| Model | Input/M | Cache Hit/M | Output/M | Context |
|---|---|---|---|---|
| DeepSeek-R1 (Jan 2025) | $0.55 | $0.14 | $2.19 | 64-128K |
| DeepSeek-V3 (Dec 2024) | $0.27 | $0.07 | $1.10 | 64K |
Both legacy endpoints now route to V4-Flash automatically; the rates above are provided for historical reference only.
Self-Hosting Costs: Open Weights, Expensive Hardware
DeepSeek releases all its models under the MIT license. You can download the weights from Hugging Face, run them on your own infrastructure, and use them for commercial purposes with no royalties or restrictions. The license even explicitly permits using DeepSeek outputs to train competing large language models. DeepSeek GitHub, MIT License
The weights are free. The hardware is not.
V4-Flash (284B parameters)
- Download size: 160GB
- VRAM required (quantized): 140 to 158GB, depending on quantization precision (INT4 vs FP8)
- Minimum hardware: 2x NVIDIA H100 (80GB each), 2x A100, or 4x RTX 4090
- Estimated monthly cloud cost: $3,000 to $6,000 for reserved GPU instances
V4-Pro (1.6T parameters)
- Download size: 865GB
- VRAM required: 862GB to 2.4TB depending on precision (FP8 vs full)
- Minimum hardware: 8 to 16x NVIDIA H100 (multi-node cluster)
- Estimated monthly cloud cost: $15,000 to $40,000+ for GPU cluster rental
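The VRAM figures above follow mostly from parameter count and precision. A back-of-envelope estimator for the weights alone (KV cache and activations add more; the 10% overhead factor is an assumption, not a DeepSeek figure):

```python
def weight_vram_gb(params_b: float, bits_per_param: float,
                   overhead: float = 1.10) -> float:
    """Rough GB of VRAM for model weights alone: params * bytes * overhead."""
    return params_b * (bits_per_param / 8) * overhead

# V4-Flash: 284B parameters at INT4 (4 bits per parameter)
print(round(weight_vram_gb(284, 4)))  # ~156 GB, within the 140-158GB range
```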
How DeepSeek Pricing Compares to Competitors
The table below shows current API rates for frontier models across the four major providers. All figures are per million tokens in USD and reflect published rates as of May 2026.
| Model | Input/M | Output/M | vs V4-Flash |
|---|---|---|---|
| DeepSeek V4-Flash | $0.14 | $0.28 | 1x (baseline) |
| DeepSeek V4-Pro (promo) | $0.435 | $0.87 | ~3x |
| GPT-5.4 Nano | $0.20 | $1.25 | 1.4-4.5x |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1.8-5.4x |
| Claude Haiku 4.5 | $1.00 | $5.00 | 7-18x |
| Gemini 3.1 Pro | $2.00 | $12.00 | 14-43x |
| GPT-5.4 | $2.50 | $15.00 | 18-54x |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 21-54x |
| Claude Opus 4.7 | $5.00 | $25.00 | 35-90x |
| GPT-5.5 | $5.00 | $30.00 | 35-107x |
Sources: DeepSeek API Docs, OpenAI Pricing, Anthropic Pricing, Google AI Pricing -- all accessed May 2026
- V4-Flash is cheaper than even the smallest budget models from competitors (GPT-5.4 Nano at $0.20 input, Gemini Flash-Lite at $0.25 input)
- The gap widens on output tokens -- V4-Flash output at $0.28/M is 107x cheaper than GPT-5.5 output at $30/M
- Even V4-Pro at full price ($1.74/$3.48) undercuts every flagship competitor
- No separate "nano" model needed -- V4-Flash already occupies the budget price point while delivering frontier-class performance
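The "vs V4-Flash" multipliers in the table are simply each provider's rates divided by the V4-Flash baseline. Reproducing the GPT-5.5 row:

```python
BASE_IN, BASE_OUT = 0.14, 0.28  # V4-Flash rates per 1M tokens

def vs_flash(input_rate: float, output_rate: float) -> tuple[float, float]:
    """(input multiple, output multiple) relative to V4-Flash."""
    return (input_rate / BASE_IN, output_rate / BASE_OUT)

lo, hi = vs_flash(5.00, 30.00)  # GPT-5.5 at $5.00 in / $30.00 out
print(f"{lo:.1f}x in, {hi:.1f}x out")  # 35.7x in, 107.1x out
```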
Who Should Use Which Tier
Limitations and Honest Caveats
Frequently Asked Questions
Is my data private when I use DeepSeek?
DeepSeek's API routes all requests through servers in China. For the free consumer chat, conversations are processed on DeepSeek's infrastructure under Chinese data jurisdiction. The API does not currently publish a clear data retention or training exclusion policy comparable to Western labs, and enterprise and free-tier data handling may differ. Review DeepSeek's privacy policy and your organization's data residency requirements before transmitting sensitive information. Self-hosting under the MIT license avoids third-party data transfer entirely.
Tech Jacks Solutions maintains editorial independence. This article was not sponsored, reviewed, or approved by DeepSeek, Hangzhou DeepSeek Artificial Intelligence Co., Ltd., or any competitor mentioned. We receive no affiliate commissions from DeepSeek API usage or any linked provider. Our evaluations are based on primary documentation and verified data.