Gallery

Contacts

411 University St, Seattle, USA

engitech@oceanthemes.net

+1 -800-456-478-23


QWEN

Is Qwen Free? Pricing, Models & API Tiers Explained

Qwen pricing spans three tiers: free on your own hardware, a free hosted tier on Groq, and a paid API starting at $0.15 per million input tokens for the open-weight 35B model. This guide covers every Qwen pricing tier, every access path, and every licensing condition for the 2026 model lineup, with no hand-waving about "competitive pricing."

The Short Answer: Yes, Three Ways

Qwen's access model splits into three tiers: genuinely free (local open-weight models), platform-free (Groq's hosted free tier with rate limits), and paid API access starting at $0.15 per million input tokens for the open-weight 35B model up to $2.50 per million for the frontier-tier Qwen3.7-Max.

$0
Local hosting cost: Apache 2.0 open-weight models
$0.15
Per million input tokens: cheapest hosted API (Qwen3.6-35B-A3B)
90%
Discount on cached vs standard input tokens (Qwen3.7-Max)
6x
Cheaper than Claude Opus 4.8 at frontier tier ($2.50 vs $15/M)

Which Qwen pricing tier applies depends on your use case. A developer running Qwen3.6-35B-A3B on an RTX 4090 for local coding assistance pays nothing. A team using Qwen3.7-Max for an enterprise agent pipeline pays $2.50 per million input tokens, still well below comparable frontier models. Both paths use the same underlying model architecture; the cost difference comes from who is running the inference.


Free Access: Local, Groq, and What Happened to OAuth

Local Self-Hosting

Every Qwen model under 35 billion parameters carries an Apache 2.0 license. That means commercial use, fine-tuning, redistribution, and integration into proprietary products, with no royalty or usage fee. The Qwen3.5-397B-A17B model is the notable exception: at 397 billion parameters, it is also Apache 2.0, an unusually permissive choice for a model of this capability.

Practical hardware benchmarks from the 2026 model lineup: Qwen3.6-35B-A3B runs at 20-25 tokens per second on an RTX 4090 with 24GB of VRAM (INT4 quantization). The model activates only 3 billion parameters per token, so inference cost tracks the 3B active count, not the 35B total. For 16GB VRAM, Qwen3.5-9B is the mainstream local choice for coding and RAG workflows.

Groq Free Tier

Groq hosts Qwen3-32B on its free tier. No credit card required. The model covers strong coding and math tasks and supports Qwen's switchable thinking/non-thinking modes. Groq does not publish specific request-per-minute limits for free tier accounts, so treat this path as suitable for prototyping and personal projects rather than production pipelines.

The Qwen OAuth Era Is Over

From Qwen's public launch through early 2026, Alibaba ran a free OAuth tier through the Qwen API. At its peak, it offered 1,000 requests per day. Before the shutdown, Alibaba cut the limit to 100 requests per day. The tier closed entirely on April 15, 2026.

Do not plan around OAuth

The Qwen OAuth free tier was discontinued on April 15, 2026. Any integration built on that authentication path stopped working on that date. There is no announced replacement. If you need a free hosted option, use the Groq free tier or self-host an open-weight model.


Qwen Pricing: API Tiers From $0.15 to $2.50 Per Million Tokens

The paid Qwen API runs through Alibaba Cloud Model Studio (primary endpoint: Singapore region). Pricing follows a per-token model: one million tokens is roughly 750,000 words, or about 3,000 typical API requests at average message length. Three tiers cover most production use cases.

Model Input ($/M tokens) Output ($/M tokens) Context Open-Weight
Qwen3.7-Max $2.50 $7.50 1M tokens No (API only)
Qwen3.6-Max-Preview $1.30 $7.80 262K tokens No (API only)
Qwen3.6-35B-A3B $0.15 $1.00 262K native (1M via YaRN) Yes (Apache 2.0)
6x cheaper
Qwen3.7-Max ($2.50/M input) vs Claude Opus 4.8 ($15/M input). A workflow costing $300 in Opus 4.8 tokens costs approximately $50 in Qwen3.7-Max tokens.

Qwen3.7-Max is the flagship API model: approximately 1 trillion total parameters with roughly 24 billion active per forward pass (Mixture of Experts architecture), a 1 million token context window, and $2.50 per million input tokens. On SWE-Bench Pro (the autonomous software engineering benchmark), it scores 60.6%, which puts it in the top tier of current frontier models at one-sixth the per-token cost of comparable alternatives.

Qwen3.6-35B-A3B is the open-weight API model. At $0.15 per million input tokens, it delivers the cheapest tier in Qwen pricing for hosted inference; the model is the same one available for free local download, so the API version simply provides Alibaba's managed inference at scale. The model is multimodal, supporting image input alongside text. Cache reads drop to $0.05 per million tokens.


FREE TEMPLATE

AI Governance Charter

Establish your organization's AI principles in one document

Download Free →

Prompt Caching: The 90% Discount

Qwen3.7-Max prompt caching cuts your effective input cost by 90% on repeated context prefixes: $0.25/M on cache reads versus $2.50/M on fresh input. You pay once to write the cache at $3.125/M, then recoup that cost across subsequent calls that reuse the same prefix. The crossover breaks even after roughly 2-3 reuses, and the discount compounds as call volume grows.

Cache Action Price per Million Tokens Notes
Standard input (no cache) $2.50 Qwen3.7-Max baseline
Cache creation $3.125 25% above standard input (you pay to store)
Cache read $0.25 90% below standard input (you save on retrieval)
Qwen3.6-35B-A3B cache read $0.05 Separate pricing for the open-weight tier

Caching makes economic sense for: RAG pipelines where system prompts and retrieved context repeat across user turns; customer support agents with long instruction sets that stay constant across sessions; and document analysis workflows that process the same background material repeatedly.

The 5-minute TTL (time-to-live) is short

The Qwen3.7-Max cache has a 5-minute time-to-live. If more than 5 minutes pass between requests sharing the same prefix, the cache expires and you pay the cache creation fee again on the next request. Interactive chatbots with frequent turns are well-suited. Batch jobs with long gaps between calls are not.

Third-Party Providers

DeepInfra and OpenRouter are the two most practical alternatives to Alibaba Cloud for Qwen API access. Both are OpenAI-compatible, so the same SDK code works with a base URL swap. Rates differ by model and access type.

Provider Models Available Input / Output (per M) API Compatibility
DeepInfra Qwen3.5-397B-A17B $0.54 / $3.40 OpenAI-compatible
DeepInfra Qwen3-235B-A22B (thinking) $0.45 / $3.49 OpenAI-compatible
OpenRouter Qwen3.7-Max, Qwen3.6-35B-A3B + others Varies by model OpenAI-compatible
Together AI Qwen open-weight models Varies (serverless / dedicated) OpenAI-compatible

DeepInfra is the standout third-party option for large open-weight models. Its rate for Qwen3.5-397B-A17B ($0.54/$3.40 per million input/output tokens) undercuts Alibaba Cloud's listed rates for comparable capability tiers. Qwen3-235B-A22B with thinking enabled runs at $0.45/$3.49, useful for math and reasoning tasks where chain-of-thought is required and you want to avoid the Alibaba Cloud billing relationship.

OpenRouter provides a single OpenAI-compatible endpoint that routes to multiple backend providers. You can access qwen/qwen3.7-max or qwen/qwen3.6-35b-a3b without creating an Alibaba Cloud account. OpenRouter bills in credits, which some teams find simpler than managing regional cloud credentials.

Together AI hosts Qwen open-weight models. Rates vary by model and tier (serverless vs. dedicated instances). Dedicated instances carry a fixed hourly cost and make sense only when workload volume is high enough that per-token billing becomes expensive.

One practical note: third-party providers source their models from the open-weight releases. You will not find Qwen3.7-Max or Qwen3.6-Max-Preview on DeepInfra or Together AI: those are proprietary API-only models, available only through Alibaba Cloud. If you need the frontier-tier capability, Alibaba Cloud is the only option.

Enterprise Plans

Standard pay-per-token billing works for most teams. Two situations break that model: development teams that run Qwen agents continuously and need predictable costs, and organizations with data residency requirements that prohibit cloud API calls entirely. Alibaba Cloud has a separate track for each.

Alibaba Cloud Coding Plan

The Coding Plan is a fixed monthly subscription designed for development teams running Qwen continuously through coding agents, IDE integrations, or CI pipelines. Rather than accumulating per-token charges across many small requests, teams pay a predictable monthly fee that covers higher rate limits and priority routing. Alibaba has not published the subscription amount publicly; pricing is disclosed during the enterprise sales process and varies by seat count and commitment term.

The Coding Plan is worth evaluating if your team already uses Qwen Code (the terminal agent) or has integrated Qwen into an IDE extension like VS Code Continue or JetBrains. For intermittent use, pay-per-token billing remains cheaper. The crossover point depends on your average monthly token consumption.

Enterprise Deployment Kit

The Enterprise Deployment Kit provides Docker and Kubernetes deployment configurations for running Qwen models on private infrastructure. This targets industries with strict data residency requirements: finance, defense, healthcare, and high-security enterprise environments where cloud API calls are prohibited by policy or regulation.

With the kit, Qwen runs entirely within your network. No data leaves your perimeter, there are no per-token charges, and you control the hardware. The tradeoff is that you take on the operational cost of running the models yourself: GPU or accelerator provisioning, serving software, and ongoing maintenance. The kit includes configurations for vLLM and other production serving frameworks. It is distributed through Alibaba Cloud's enterprise sales channel, not as a public download.

Enterprise Pricing Is Not Listed
Coding Plan cost is undisclosed

The Alibaba Cloud Coding Plan subscription fee is not published on the pricing page. Expect a sales conversation before you see a number. If you need a budget estimate, request a quote through the Alibaba Cloud enterprise portal and ask for a consumption-based projection at your expected token volume.

Full Model Lineup (2026)

The Qwen3 model family spans five capability tiers from API-only frontier models to sub-1B edge models. Qwen pricing across these tiers ranges from $0 (open-weight, self-hosted) to $2.50/M input tokens for the frontier API. Understanding active vs. total parameter counts is critical here: Qwen's MoE (Mixture of Experts) models have large total parameter counts but activate only a fraction per request, which is why a "35B" model can run on a 24GB GPU.

Model Params Context Access License Input / Output (per M)
Qwen3.7-Max ~1T 1M tokens API only Proprietary $2.50 / $7.50
Qwen3.6-Max-Preview ~1T 262K tokens API only Proprietary $1.30 / $7.80
Qwen3.6-35B-A3B 35B (3B active) 262K tokens Open-weight + API Apache 2.0 $0.15 / $1.00
Qwen3.5-397B-A17B 397B (17B active) 262K tokens Open-weight Apache 2.0 $0.54 / $3.40 (DeepInfra)
Qwen3-235B-A22B 235B (22B active) 262K tokens Open-weight Tongyi Qianwen $0.45 / $3.49 (DeepInfra)
Qwen3-32B 32B dense 262K tokens Open-weight Apache 2.0 Free (Groq) / self-host
Qwen3.6-27B 27B dense 262K tokens Open-weight Apache 2.0 Self-host
Qwen3.5 small (0.8B–9B) 0.8B / 2B / 4B / 9B 262K tokens Open-weight Apache 2.0 Self-host (edge/consumer)

Three patterns to carry forward. The 1M token context window is exclusive to Qwen3.7-Max on the proprietary API; open-weight models top out at 262K tokens natively. The Tongyi Qianwen License applies to Qwen3-235B-A22B and some other larger models, so read the licensing section below before building a commercial product on any Tongyi-licensed model. The small series (0.8B to 9B) is genuinely competitive on edge hardware and benchmarks above similar-size models from other vendors.

Licensing: What You Can Build

Qwen's licensing varies by model tier, and the distinction has real consequences for commercial projects. There are three regimes you need to understand.

Apache 2.0: Full Commercial Freedom

Most Qwen3-generation models at 35B parameters and below ship under Apache 2.0. This includes Qwen3.6-35B-A3B, Qwen3-32B, Qwen3.6-27B, and the much larger Qwen3.5-397B-A17B, which Alibaba released under Apache 2.0 as an unusually permissive exception for a model of that capability. Apache 2.0 permits commercial use, fine-tuning, and redistribution without requiring you to publish your modifications or pay royalties. If you are building a product on an open-weight Qwen model, Apache 2.0 means you can ship it.

Tongyi Qianwen License: Restricted Commercial Use

Some larger models in the Qwen family (including Qwen3-235B-A22B) use the Tongyi Qianwen License. This license permits non-commercial use freely. Commercial use requires a separate agreement with Alibaba Cloud if the product reaches more than 100 million monthly active users. Below that threshold, commercial use is permitted without a separate agreement, but the license terms are more restrictive than Apache 2.0: you cannot relicense the model weights, and certain redistribution conditions apply. Read the full license text before building a commercial product on a Tongyi-licensed model.

Proprietary API: No Weight Access

Qwen3.7-Max and Qwen3.6-Max-Preview are API-only models. Alibaba does not release the weights. You interact through the API under Alibaba Cloud's standard terms of service. There is no licensing decision to make: you are renting compute, not owning a model. This is identical to how comparable proprietary frontier models operate.

Licensing Cautions
Verify the license before you build

The licensing pattern changed between Qwen2.5 and Qwen3.x. Do not assume all Qwen models share the same license. Check the Hugging Face model card for the specific model variant you are deploying. Qwen3-235B-A22B is Tongyi-licensed, not Apache 2.0, despite being an open-weight release.

100M MAU threshold for Tongyi models

Tongyi Qianwen License requires a separate Alibaba Cloud agreement for products exceeding 100 million monthly active users. This is not a concern at early product stage, but plan for it before you scale. The agreement process is not instantaneous.

Frequently Asked Questions

Yes, in two ways. Open-weight Qwen models run locally at zero cost under Apache 2.0, no API key, no usage limits. Groq hosts Qwen3-32B on its free tier with no credit card required, though rate limits apply. The Alibaba Cloud API starts at $0.15 per million input tokens for Qwen3.6-35B-A3B. The former OAuth free tier was discontinued April 15, 2026.

Qwen3.7-Max costs $2.50 per million input tokens, roughly 6x cheaper than Claude Opus 4.8 at $15/M. A workflow costing $300 in Opus 4.8 tokens costs approximately $50 in Qwen3.7-Max tokens, with comparable or better performance on coding benchmarks.

Yes. Open-weight Qwen models under Apache 2.0 permit commercial use, fine-tuning, and redistribution. Qwen3.6-35B-A3B runs at 20-25 tokens per second on an RTX 4090 with 24GB VRAM. Qwen3.5-397B-A17B fits in 4-bit quantization on a Mac Studio with 256GB of RAM.

Alibaba discontinued the Qwen OAuth free tier on April 15, 2026. The tier originally allowed 1,000 requests per day, later reduced to 100 requests per day before shutdown. The Groq free tier (Qwen3-32B) remains active and does not require a credit card.

Facts verified against Alibaba Cloud Model Studio documentation (May 2026). Pricing reflects published rates as of the date shown above. Verify current pricing at dashscope.aliyun.com before making purchasing decisions.
Qwen and Alibaba Cloud are trademarks of Alibaba Group Holding Limited. Tech Jacks Solutions is an independent publisher and is not affiliated with, endorsed by, or sponsored by Alibaba Group. All product names, logos, and brands are property of their respective owners.
Before You Use AI
Your Privacy

Qwen API requests are processed on Alibaba Cloud servers (primary region: Singapore). Free-tier and standard API calls may be used to improve models. Enterprise Deployment Kit users process data on private infrastructure, no data leaves your environment. Review Alibaba Cloud's data processing terms before sending sensitive information through any hosted Qwen endpoint.

Mental Health & AI Dependency

AI coding and productivity tools can create dependency patterns. Qwen's autonomous agent capabilities require human oversight, particularly for agentic workflows that run multiple sequential tool calls without interruption. Set boundaries on autonomous execution and review all AI-generated output before deployment. If you are experiencing distress:

  • 988 Suicide & Crisis Lifeline: Call or text 988
  • SAMHSA Helpline: 1-800-662-4357
  • Crisis Text Line: Text HOME to 741741

AI systems can produce plausible-sounding but incorrect guidance. For mental health, medical, legal, or financial decisions, always consult a qualified professional.

Your Rights & Our Transparency

This article is editorially independent. Tech Jacks Solutions does not have a commercial relationship with Alibaba Cloud or any Qwen provider. Pricing data is sourced from official Alibaba Cloud documentation and verified third-party providers as of May 2026. Rates change. Verify current pricing before committing to a production architecture.

Under GDPR and CCPA, you have the right to request, correct, or delete personal data we hold. This content references the EU AI Act's risk classification framework for transparency. Article links may include affiliate relationships with hosting providers.