


What Is Qwen AI? Alibaba's Open-Weight Model Explained

When Alibaba published its first open-weight language model in August 2023, the AI community took note. When the Qwen AI family crossed 90,000 derivative models on ModelScope, it became impossible to ignore. Qwen AI is Alibaba Cloud's family of open-weight and proprietary language models, engineered to cover the full spectrum from edge inference on consumer hardware to frontier-tier agentic coding in the cloud. The flagship Qwen3.7-Max places in the global top 5 on the Artificial Analysis Intelligence Index. Most models under 35 billion parameters ship under Apache 2.0, which means you can fine-tune and redistribute them commercially without royalties.

What is Qwen AI in one sentence? Qwen is Alibaba Cloud's AI model family, ranging from free open-weight models you can run on a laptop to a frontier-grade API that outscores Claude Opus on coding benchmarks at a fraction of the cost.


Who Built Qwen? (Alibaba Cloud)

Qwen is an Alibaba Cloud product, built by the Tongyi Large Model Business Unit. The division's chief AI architect is Zhou Jingren, who oversees model research and deployment strategy. The name "Qwen" is the commercial brand; the internal designation is "Tongyi Qianwen," which translates roughly to "understanding all questions."

In March 2026, Alibaba consolidated its AI research under a new entity called Alibaba Token Hub, which brought together model training, inference infrastructure, and developer tooling under one organizational umbrella. Alibaba CEO Eddie Wu has been the executive sponsor of the AI push since the company restructured its operations.

The Qwen project started quietly. A beta version shipped in April 2023, limited to enterprise partners on the Alibaba Cloud platform. By August 2023, Alibaba released Qwen-7B as an open-weight model. It was one of the first billion-plus-parameter models from a Chinese lab to ship downloadable weights under a permissive license, and that decision set the tone for everything that followed.

Understanding the naming convention helps: the "Qwen" prefix is the brand, the number is the generation (Qwen3.7 is the seventh sub-release of the third generation), and for MoE models the suffix format is [total params]B-A[active params]B. So Qwen3.6-35B-A3B has 35 billion total parameters but only 3 billion active per token. Building models at this scale is a team effort: Alibaba reports hundreds of researchers across its Singapore and mainland China research hubs.
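The suffix convention is regular enough to parse mechanically. The following is a minimal sketch, not an official tool: the `MoEName` type, the `parse_moe_name` helper, and the regular expression are illustrative names introduced here, and the pattern assumes names follow exactly the format described above.

```python
import re
from typing import NamedTuple, Optional


class MoEName(NamedTuple):
    family: str      # brand plus generation, e.g. "Qwen3.6"
    total_b: float   # total parameters, in billions
    active_b: float  # parameters active per token, in billions


# Matches "<family>-<total>B-A<active>B", the MoE suffix format described above.
_MOE_PATTERN = re.compile(
    r"^(?P<family>Qwen[\d.]+)-(?P<total>\d+(?:\.\d+)?)B-A(?P<active>\d+(?:\.\d+)?)B$"
)


def parse_moe_name(name: str) -> Optional[MoEName]:
    """Return the parsed parts, or None if the name is not in MoE format."""
    match = _MOE_PATTERN.match(name)
    if match is None:
        return None
    return MoEName(match["family"], float(match["total"]), float(match["active"]))


parsed = parse_moe_name("Qwen3.6-35B-A3B")
print(parsed)
print(f"active share per token: {parsed.active_b / parsed.total_b:.1%}")
```

Names without the MoE suffix, such as Qwen3.7-Max, return `None` rather than raising, so the helper can be run over a mixed list of model names.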

Qwen AI is ultimately owned by Alibaba Group, the Chinese technology company listed on the New York and Hong Kong stock exchanges. The AI model division operates as part of Alibaba Cloud, the group's cloud computing arm.


The Qwen AI Model Family Explained

The Qwen model family spans eight generations as of May 2026, ranging from 0.5-billion-parameter edge models to the approximately 1 trillion parameter Qwen3.7-Max. Each generation introduced architectural advances that moved Qwen from a derivative of Western model research toward a distinct technical approach.

The current flagship is Qwen3.7-Max, a proprietary model available through the Alibaba Cloud API. It uses a Hybrid Gated DeltaNet architecture, a design that mixes linear attention layers and standard full attention in a 3:1 ratio. This is not standard Transformer architecture. Linear attention scales more efficiently for long sequences, which is part of why Qwen3.7-Max handles a 1 million token context window (roughly 750,000 words, or a large codebase) without the prohibitive memory costs that affect dense transformer models at the same scale.

Open-weight models currently top out at Qwen3.5-397B-A17B, which has 397 billion total parameters but only 17 billion active per token (this is Mixture of Experts: the model activates a different subset of parameters for each token, keeping runtime cost proportional to the 17B active count, not the 397B total). Independent evaluations place it in the GPT-5.2 tier. It runs in 4-bit quantization on a Mac Studio with 256GB of RAM.
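The 256GB figure can be sanity-checked with back-of-the-envelope arithmetic. This is a rough sketch under stated assumptions: it counts raw weights only, at exactly 4 bits each, and ignores the KV cache, activations, and the scale factors that real quantization formats store. The function name is illustrative.

```python
def quantized_weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes.

    Counts raw weights only: no KV cache, no activations, and none of the
    per-group scale factors that real quantization formats add.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# All experts must be resident in memory, so the total count sets the footprint.
total_gb = quantized_weight_gb(397, 4)
print(f"4-bit weights: {total_gb:.1f} GB against 256 GB of RAM")

# Only the active parameters are used per token, which drives compute cost.
print(f"active share per token: {17 / 397:.1%}")
```

The split matters for sizing: memory is governed by the 397B total, while per-token compute tracks the 17B active count.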

For developers who need something smaller, Qwen3.6-35B-A3B is the current local-deployment sweet spot. At 35 billion total / 3 billion active parameters, it reaches 20-25 tokens per second on a single RTX 4090. It ships under Apache 2.0 and handles text, images, and video natively.

  • ~1T: parameters (Qwen3.7-Max)
  • 1M: token context window
  • 90K+: derivative models on ModelScope
  • 201: languages supported

The Qwen3 generation introduced the Hybrid Gated DeltaNet attention mechanism across the entire family. Qwen models use a 3:1 ratio of linear attention layers to full attention layers. Linear attention computes in linear time relative to sequence length (versus quadratic time for standard attention), which enables the long context windows that make Qwen models practical for large codebases and document analysis. The tradeoff is that linear attention can lose recall on certain retrieval tasks, which is why the full attention layers are retained at the 1-in-4 position.
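A toy cost model makes the 3:1 layout concrete. This is an illustrative sketch, not Qwen's implementation: the 8-layer depth is hypothetical, the function names are invented here, and the cost units (sequence length for a linear layer, sequence length squared for a full layer) ignore all constant factors.

```python
from typing import List


def layer_schedule(num_layers: int, period: int = 4) -> List[str]:
    """Label each layer; every `period`-th layer uses full attention."""
    return [
        "full" if (index + 1) % period == 0 else "linear"
        for index in range(num_layers)
    ]


def relative_attention_cost(seq_len: int, schedule: List[str]) -> float:
    """Cost of the hybrid stack relative to an all-full-attention stack.

    Toy model: a linear layer costs seq_len units and a full layer costs
    seq_len ** 2 units. Constant factors are ignored.
    """
    hybrid = sum(seq_len if kind == "linear" else seq_len ** 2 for kind in schedule)
    dense = len(schedule) * seq_len ** 2
    return hybrid / dense


schedule = layer_schedule(8)  # 8 layers is a hypothetical depth
print(schedule)
for n in (1_000, 1_000_000):
    print(n, relative_attention_cost(n, schedule))
```

Under this model the ratio approaches 0.25 as the sequence grows, because the one-in-four full attention layers come to dominate the total cost.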


How Qwen AI Benchmarks Against GPT and Claude

Qwen leads two public benchmarks as of May 2026. These are not self-reported scores. They come from third-party leaderboards that accept model submissions and verify results against held-out test sets.

On SWE-Bench Pro (the harder version of the standard SWE-Bench that evaluates models against 2,294 real GitHub issues from production open-source projects), Qwen3-235B-A22B scores 60.6%. A score of 60.6% means the model autonomously resolves roughly 6 in 10 real bug reports without human help. Claude Opus 4.8 scores 57.3% on the same benchmark. That 3.3-point gap on a benchmark designed to simulate real software engineering work is meaningful.
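To put the percentages in concrete terms, the sketch below converts them to issue counts and computes a rough two-proportion z statistic. It uses only the figures quoted above. The z statistic treats the two runs as independent samples, which is a simplifying assumption: both models attempt the same issues, so a paired test on per-issue results would be more appropriate if that data were available.

```python
import math

ISSUES = 2294        # benchmark size as stated in the article
QWEN_RATE = 0.606    # Qwen3-235B-A22B
CLAUDE_RATE = 0.573  # Claude Opus 4.8


def resolved(rate: float, total: int = ISSUES) -> int:
    """Approximate number of issues resolved at a given pass rate."""
    return round(rate * total)


def z_score(p1: float, p2: float, n: int = ISSUES) -> float:
    """Two-proportion z statistic, treating the runs as independent samples."""
    standard_error = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    return (p1 - p2) / standard_error


print("issues resolved:", resolved(QWEN_RATE), "vs", resolved(CLAUDE_RATE))
print("gap in issues:", resolved(QWEN_RATE) - resolved(CLAUDE_RATE))
print("z statistic:", round(z_score(QWEN_RATE, CLAUDE_RATE), 2))
```

A 3.3-point gap on this benchmark size corresponds to a few dozen additional resolved issues, which is a more tangible way to read the difference than the percentages alone.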

On Terminal-Bench 2.0, which simulates a software engineer working in a sandboxed terminal environment with a 5-hour timeout, shell commands, file I/O, and internet access, Qwen3-235B-A22B scores 69.7%, the top result on the public leaderboard at time of writing.

On math reasoning, Qwen3-235B-A22B scores 90.2% on MATH-500 and holds a CodeForces ELO of 2,056. On the Artificial Analysis Intelligence Index v4.0, Qwen3.7-Max scores 56.6, placing it in the global top 5. See our Qwen vs DeepSeek comparison for a full head-to-head on benchmarks and pricing.

SWE-Bench Pro
Coding: 2,294 real GitHub issues · Source: swebench.com

  • Qwen3-235B-A22B: 60.6%
  • Claude Opus 4.8: 57.3%
  • DeepSeek-R1: ~49%

Qwen leads on production coding tasks. The 3.3-point gap over Claude Opus 4.8 is statistically meaningful on a 2,294-issue benchmark.

Terminal-Bench 2.0
Autonomous terminal agent · 5-hour sandboxed sessions · Source: terminal-bench.com

  • Qwen3-235B-A22B: 69.7%
  • Kimi K2.6: 66.7%
  • Claude Opus 4.8: 65.4%

Qwen3-235B-A22B holds the #1 public leaderboard position on Terminal-Bench 2.0 as of May 2026.

Scores as of May 2026. Source: swebench.com · Artificial Analysis. Rankings may shift as new models submit results.


Open Source or Proprietary? Qwen Licensing Explained

Qwen's licensing is tiered by model size. Models at or below 35 billion parameters ship under Apache 2.0, the most permissive open-source license available. That means you can download the weights, fine-tune on proprietary data, build products, and distribute commercially without paying Alibaba anything or negotiating a separate agreement.

Models at 72 billion parameters and above use the Tongyi Qianwen License, which places additional restrictions on commercial deployment and redistribution. If you are building a SaaS product on a 72B+ model, or distributing a fine-tune to downstream customers, the license text governs. Alibaba publishes the full terms on each model's Hugging Face card.

Apache 2.0: applies to Qwen models at or under 35B parameters, including the high-capability Qwen3.6-35B-A3B, and also to the open-weight flagship Qwen3.5-397B-A17B

The practical split: Qwen3.6-35B-A3B (35B total, 3B active MoE) and Qwen3.5-397B-A17B are both Apache 2.0, unusually permissive for models operating at near-frontier capability. The API-only Qwen3.7-Max is a proprietary hosted model; its weights are not available for download.

One nuance: "open-weight" is not the same as "open source." Qwen AI releases the weights, but training pipelines, full architecture specifications, and training data are not always disclosed. Apache 2.0 gives you freedom over the weights themselves; it does not provide a reproducible training pipeline.


The Qwen Ecosystem

One indicator that a model family has reached mainstream adoption: the derivative model count on ModelScope. Qwen has crossed 90,000, including fine-tunes, quantizations, domain-adapted variants, and multilingual models built by the community on top of the base weights. That puts it alongside Llama in open-weight ecosystem depth.

Language support is another dimension where Qwen separates from most Western-developed models. Official support extends to 201 languages, reflecting Alibaba's emphasis on markets where English-first models leave significant gaps. Chinese, Arabic, Korean, and Southeast Asian languages receive dedicated tuning alongside major European languages.

Qwen Generation Timeline

  • April 2023, Qwen Beta: Internal beta launch. First public glimpse of Alibaba's large model capabilities under the Tongyi Qianwen brand.
  • August 2023, Qwen-7B Open Weight: First public open-weight release under Apache 2.0. Triggered the derivative model ecosystem that now exceeds 90,000 variants.
  • June 2024, Qwen2: Major architectural revision. Expanded language coverage, improved instruction following. Entered global top-tier benchmarks for the first time.
  • September 2024, Qwen2.5: Coding, math, and long-context improvements. Qwen2.5-Coder emerged as a leading open-weight coding model in its class.
  • May 2025, Qwen3: Hybrid Gated DeltaNet architecture introduced. Switchable thinking/non-thinking modes. Qwen3-235B-A22B claimed global top-5 placement across multiple benchmarks.
  • May 2026, Qwen3.7-Max Stable: Approximately 1 trillion parameters. 1M token context window. $2.50/M input tokens via API. MCP native. Anthropic API protocol compatible.

On the protocol side, Qwen supports MCP (Model Context Protocol) natively: tool-calling integrations built for Claude or other MCP-compatible systems work without modification. Qwen also implements the Anthropic API protocol, so swapping Claude for Qwen3.7-Max in a compatible coding agent requires changing three environment variables and nothing else.
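The article does not name the three environment variables. As a hedged sketch, the snippet below assumes they are the base URL, auth token, and model settings that Anthropic-protocol clients such as Claude Code read (`ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_MODEL`). The endpoint URL, key, and model ID are placeholders, not values from Alibaba's documentation, and `agent_env` is an illustrative helper.

```python
import os
from typing import Dict, Mapping

# Assumed variable names (the article does not list them); these follow the
# convention used by Anthropic-protocol clients such as Claude Code.
OVERRIDE_KEYS = ("ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_MODEL")


def agent_env(base_url: str, api_key: str, model: str,
              parent: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Copy the parent environment and override only the three settings."""
    env = dict(parent)
    env.update(zip(OVERRIDE_KEYS, (base_url, api_key, model)))
    return env


env = agent_env(
    base_url="https://qwen-endpoint.example/anthropic",  # placeholder URL
    api_key="YOUR_API_KEY",                              # placeholder key
    model="qwen3.7-max",                                 # hypothetical model ID
)
print({key: env[key] for key in OVERRIDE_KEYS})
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the agent, which leaves the rest of the shell environment untouched. Check the provider's documentation for the real endpoint and model identifiers before relying on this.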


Qwen Code: The Terminal Coding Agent

Qwen Code is Alibaba's terminal-native coding agent, positioned as a direct alternative to Claude Code. It operates as a CLI tool, accepts natural language task descriptions, and runs multi-step coding sessions autonomously, reading files, writing tests, running shell commands, and iterating on failures without human input between steps.

35 hours: autonomous coding session with 1,158 tool calls executed without human intervention (vendor-reported record as of May 2026)

The benchmarks reflect the capability. On Terminal-Bench 2.0, Qwen3-235B-A22B scores 69.7%, the top result globally as of this writing. On SWE-Bench Pro (which evaluates agents against real GitHub issues), the same model reaches 60.6%, compared to Claude Opus 4.8 at 57.3%.

Because Qwen implements the Anthropic API protocol natively, MCP server configurations and tool-calling patterns that work in Claude Code work with Qwen Code. Developers running Claude-based pipelines can evaluate Qwen as a drop-in at roughly one-sixth the API cost of Claude Opus.


Who Should Use Qwen?

Qwen's range, from edge-deployable 0.8B models to a frontier trillion-parameter API, means different users get genuinely different things from it. The right entry point depends on your infrastructure, budget, and latency requirements.

Enterprise Agent Builders
You need top-tier SWE-Bench performance and are currently paying frontier-model prices. Qwen3.7-Max at $2.50/M input tokens delivers Claude Opus-class benchmark results at roughly one-sixth the cost. Workflows that cost $300 in Claude Opus tokens cost approximately $50 in Qwen3.7-Max tokens.
Best fit: Qwen3.7-Max API

Self-Hosting Teams
You want frontier-class capability without routing data to a cloud API. Qwen3.5-397B-A17B (Apache 2.0) runs in 4-bit quantization on a 256GB Mac Studio or multi-GPU server. No per-token cost after hardware. Evaluated at GPT-5.2 class on independent benchmarks.
Best fit: Qwen3.5-397B-A17B (local)

Budget-Conscious Developers
You want capable API access at the lowest possible per-token cost. Qwen3.6-35B-A3B at $0.15/M input is among the cheapest capable models available from any provider, with multimodal support and a 262K native context window. Apache 2.0 for all commercial uses.
Best fit: Qwen3.6-35B-A3B API

Edge & Mobile Builders
You are targeting hardware with 1-8GB of RAM: smartphones, embedded devices, or IoT endpoints. The Qwen3.5 small series (0.8B through 4B) runs in 4-bit quantization at 1-4.5GB and delivers capable code completion and summarization at these hardware constraints.
Best fit: Qwen3.5 0.8B to 4B series
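The pricing figures above are easy to turn into a budget estimate. This is a minimal sketch using only the input prices quoted in this article. Output-token pricing is not given here and is not modeled, the 20-million-token workload is hypothetical, and the function names are illustrative.

```python
# Input prices in USD per million tokens, as quoted in this article.
INPUT_PRICE_PER_M = {
    "Qwen3.7-Max": 2.50,
    "Qwen3.6-35B-A3B": 0.15,
}


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Input-side cost only; output-token pricing is not modeled."""
    return INPUT_PRICE_PER_M[model] * input_tokens / 1_000_000


def at_one_sixth(reference_spend_usd: float) -> float:
    """Apply the article's 'roughly one-sixth' ratio to an existing bill."""
    return reference_spend_usd / 6


monthly_tokens = 20_000_000  # hypothetical workload
for model in INPUT_PRICE_PER_M:
    print(model, round(input_cost_usd(model, monthly_tokens), 2))

# The article's example: a $300 Claude Opus workflow at one-sixth the cost.
print(at_one_sixth(300))
```

Because the one-sixth ratio is approximate and covers input tokens only, treat the result as a starting point and check it against a real invoice for output-heavy workloads.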

Limitations to Know Before You Commit

Qwen's pricing and open-weight availability make a strong case. Three limitations are worth weighing before you pick a model or tier.

API Traffic Routes Through Singapore
The primary Alibaba Cloud Model Studio endpoint routes through Singapore. Organizations with EU data residency requirements (GDPR) or US government data handling rules should evaluate this before choosing the hosted API. The Enterprise Deployment Kit (Docker and Kubernetes configurations) is the on-premise alternative for high-compliance environments.

Proprietary License on 72B+ Models
The Tongyi Qianwen License applies to models at 72B parameters and above. It is more restrictive than Apache 2.0 on commercial redistribution and may limit certain derivative applications. Read the full license text on Hugging Face before building a product on any 72B+ model. Models at or under 35B are Apache 2.0 without restriction.

Some Performance Figures Are Vendor-Reported
The 35-hour autonomous coding session record is vendor-reported and has not been independently replicated. SWE-Bench Pro scores are submitted to and verified by the leaderboard maintainers, but real-world performance in your specific domain may differ from benchmark conditions. Treat all performance figures as directional, not guaranteed.

Frequently Asked Questions

Is Qwen free to use?
It depends on how you access it. Open-weight models (Qwen3.6-35B-A3B, Qwen3.5-397B-A17B, and others) can be downloaded and run locally for free with no usage limits under Apache 2.0. The hosted API at Alibaba Cloud charges per token: Qwen3.7-Max starts at $2.50 per million input tokens. Qwen3-32B is also available on Groq's free tier, rate-limited, with no credit card required.
What is the difference between Qwen and ChatGPT?
Qwen is built by Alibaba Cloud; ChatGPT is built by OpenAI. The most significant practical difference is that many Qwen models, including the 397B open-weight flagship, are available as downloads under Apache 2.0, while ChatGPT's models are API-only with no downloadable weights. Qwen's API pricing is also substantially lower than that of comparable frontier models: Qwen3.7-Max costs $2.50/M input tokens. Both support tool calling and multi-turn conversations, but Qwen also provides native MCP support and the Anthropic API protocol.
Can I use Qwen commercially?
Yes, with conditions that depend on model size. Models at or under 35B parameters are Apache 2.0: commercial use, fine-tuning, and redistribution are all permitted. Models at 72B parameters and above use the Tongyi Qianwen License, which places additional restrictions on commercial deployment and redistribution. Read the license on the specific model's Hugging Face card before building a product on it.
What is Hybrid Gated DeltaNet?
Hybrid Gated DeltaNet is the attention architecture powering Qwen3.7-Max. It combines linear attention (efficient for long sequences) and full attention (accurate for local relationships) in a 3:1 ratio: three linear layers for every one full attention layer. This allows the model to process 1 million token context windows without the quadratic memory cost that standard transformer attention would require at that scale.
Grounded against Alibaba Cloud API documentation, ModelScope, Hugging Face model cards, and SWE-Bench Pro leaderboard. Pricing and specifications verified May 2026.

Qwen, Tongyi Qianwen, Qwen Code, and Alibaba Cloud are trademarks of Alibaba Group Holding Limited. All product names, logos, and brand identifiers are the property of their respective owners. Tech Jacks Solutions has no commercial relationship with Alibaba Cloud. This article is editorially independent.

Before You Use AI
Your Privacy

Qwen API access routes through Alibaba Cloud Model Studio in the Singapore region. Data sent to the Qwen API is processed on Alibaba's infrastructure. The Qwen OAuth free tier was discontinued on April 15, 2026, so an API key or Coding Plan subscription is now required for cloud access.

For data residency requirements in the EU or US, Alibaba offers an Enterprise Deployment Kit with Docker containers and Kubernetes manifests for running Qwen locally within your own private cloud.

Mental Health & AI Use

Qwen AI models are tools for productivity and research. They are not substitutes for professional mental health support. If you are experiencing distress, please reach out to trained professionals:

  • 988 Suicide & Crisis Lifeline: Call or text 988
  • SAMHSA Helpline: 1-800-662-4357
  • Crisis Text Line: Text HOME to 741741

AI systems can produce plausible-sounding but incorrect guidance. For mental health, medical, legal, or financial decisions, always consult a qualified professional.

Your Rights & Our Transparency

Under GDPR and CCPA, you have the right to access, correct, and delete personal data. Tech Jacks Solutions does not sell personal data. This article is independently produced. We have no affiliate relationship with Alibaba Cloud or the Qwen team.

AI systems referenced in this article are subject to the EU AI Act and applicable national regulations. Benchmark scores cited are sourced from public third-party leaderboards and are current as of May 2026.