Qwen Plus
The Qwen Plus line is Alibaba's production API sweet spot — strong enough for serious work, cheap enough to scale. Three generations are active right now: Qwen3.6-Plus-Preview (the newest, with 1M context and always-on reasoning, currently free), Qwen3.5-Plus (the stable production pick), and the legacy qwen-plus alias. This guide covers all three so you can pick the right one for your workload.
Qwen3.6-Plus-Preview — Free, 1M Context, Always-On Reasoning
Alibaba dropped Qwen3.6-Plus-Preview on March 30, 2026, and it's the clearest signal yet that the "3.6" generation is coming. The model is available on OpenRouter right now under qwen/qwen3.6-plus-preview — and during the preview period, it costs exactly zero dollars. Free input, free output, no catch beyond the fact that Alibaba collects prompt data to improve the model.
The headline numbers: 1 million tokens of context and 65,536 max output tokens. Context is 4x what the previous Qwen3.5-Plus offered at launch (262K), putting it on par with Google's Gemini in terms of raw window size. But the real story isn't the context — it's the reasoning architecture.
Always-On Chain-of-Thought
Unlike Qwen3.5-Plus, which lets you toggle thinking mode on or off via an enable_thinking parameter, the 3.6 Preview uses always-on chain-of-thought. The model reasons through every request internally. You don't flip a switch. It just thinks.
Alibaba says this approach reduces "overthinking" — that frustrating pattern where models burn tokens deliberating over simple questions. Early community testing backs this up. On straightforward tasks, 3.6 Preview responds quickly without the padding you'd see from a forced-thinking model. On complex problems, it reasons deeply without you having to remember to enable a flag.
What It's Good At
Community reports from the first few days point to strong performance in agentic coding, front-end programming, and general reasoning. Several developers on OpenRouter forums reported output speeds roughly 3x faster than Claude Opus 4.6, though Alibaba hasn't published controlled benchmarks for this preview.
That speed advantage matters for interactive use. If you're building an agentic workflow where the model needs to iterate — writing code, checking it, revising — faster output directly translates to shorter feedback loops.
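The loop itself is simple to picture. Here's a minimal sketch of a generate-check-revise cycle with the model call stubbed out as a plain callable; `iterate_until_pass`, `generate`, and `check` are illustrative names for this guide, not part of any Qwen or OpenRouter API:

```python
from typing import Callable, Optional

def iterate_until_pass(
    generate: Callable[[str], str],        # in practice, a wrapper around chat.completions.create
    check: Callable[[str], Optional[str]], # returns an error message, or None on success
    task: str,
    max_rounds: int = 5,
) -> Optional[str]:
    """Generate -> check -> revise loop. Faster model output shortens every round."""
    prompt = task
    for _ in range(max_rounds):
        candidate = generate(prompt)
        error = check(candidate)
        if error is None:
            return candidate  # passed the check
        # Feed the failure back so the next attempt can correct it
        prompt = f"{task}\n\nPrevious attempt failed with:\n{error}\nRevise and retry."
    return None  # gave up after max_rounds
```

Every round pays the model's full output latency, which is why a 3x throughput difference compounds quickly in agentic workflows.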
The Caveats
This is a preview. Not production-grade, not guaranteed stable, and Alibaba explicitly collects your prompts during this phase. Don't send sensitive data through it. Don't build a production pipeline on top of it. It's a test drive — treat it that way.
There are no published benchmarks from Alibaba yet, which makes it hard to do apples-to-apples comparisons with 3.5-Plus or competing models. The "3x faster than Opus" claim comes from community reports, not controlled testing. And the model is text-only — no image or video input, unlike the multimodal Qwen3.5-Plus.
Bottom line: If you want to test what's coming next in the Qwen ecosystem without spending a cent, 3.6-Plus-Preview is your playground. For anything that needs reliability or multimodal input, stick with 3.5-Plus below.
Qwen3.5-Plus — The Production Workhorse
Released February 15, 2026, Qwen3.5-Plus runs on the Qwen 3.5 architecture — a 397B-parameter MoE model that activates only 17B parameters per token. That efficiency is what makes it both capable and affordable at scale.
The spec sheet reads like a frontier model: 1M token context, 65K max output, multimodal input (text, images, and video), support for 201 languages, and full function calling with structured JSON output. All of that through an OpenAI-compatible API. If you're migrating from GPT-4o, the integration work is minimal — swap the base URL and API key, adjust the model ID, and you're running.
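Function calling uses the standard OpenAI tools schema, which Qwen3.5-Plus accepts unchanged. A sketch of a tool definition plus the local dispatch step; the `get_order_status` tool and `dispatch` helper are hypothetical examples, only the schema shape comes from the OpenAI-compatible spec:

```python
import json

# Standard OpenAI-style tool schema; pass this as `tools=` in the API call
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical example tool
        "description": "Look up the status of an order by ID",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def dispatch(tool_call: dict, handlers: dict):
    """Route a tool call emitted by the model to the matching local handler."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # the model returns arguments as a JSON string
    return handlers[fn["name"]](**args)
```

The same pattern works against GPT-4o, which is exactly why the migration story is so short.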
Pricing That Undercuts Everyone
$0.26 per million input tokens. $1.56 per million output tokens. Those are the numbers on Alibaba Cloud's international dashboard. For context, that's 12x cheaper than Anthropic's Claude Sonnet 4.5 on input ($3.00/M) and roughly 10x cheaper on output ($15.00/M). It's even cheaper than OpenAI's GPT-4o at $2.50/$10.00 per million.
Batch processing drops another 50% off those prices. Context caching gives you implicit cache hits at 20% of standard input cost and explicit cache hits at 10%. For RAG pipelines that repeatedly query the same document set, effective costs can be a fraction of the listed rates.
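As a back-of-the-envelope check, here's that math in code. List prices come from above; the cache-hit fraction is an assumption you'd measure for your own workload, and whether the batch and caching discounts stack is an assumption, not a documented guarantee:

```python
INPUT, OUTPUT = 0.26, 1.56  # Qwen3.5-Plus list price, dollars per million tokens

def monthly_cost(in_m, out_m, cached_frac=0.0, cache_rate=0.2, batch=False):
    """Estimated monthly dollars for in_m / out_m million input/output tokens.

    cache_rate: 0.2 for implicit cache hits, 0.1 for explicit, per the rates above.
    """
    scale = 0.5 if batch else 1.0  # batch processing halves both prices
    in_cost = in_m * INPUT * scale * ((1 - cached_frac) + cached_frac * cache_rate)
    out_cost = out_m * OUTPUT * scale
    return in_cost + out_cost
```

At 100M input / 10M output tokens a month, that works out to $41.60 at list, $24.96 with 80% implicit cache hits, and $20.80 in batch mode.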
Thinking Modes — Choose Your Reasoning Level
Unlike the 3.6 Preview's always-on approach, Qwen3.5-Plus gives you a toggle. Set enable_thinking: true in your API call and the model reasons step-by-step before answering. Leave it off for faster, cheaper responses on simpler tasks. Same price either way — Qwen3.5-Plus doesn't charge a thinking surcharge like the legacy qwen-plus does.
Where It Falls Short
Qwen3.5-Plus isn't trying to be Qwen-Max. On the hardest reasoning tasks — math olympiad problems, doctoral-level science questions, competitive programming — Max wins. If your workload demands frontier-level depth, Plus isn't the tier for you. It's built for the 90% of production use cases where "very good reasoning at low cost" beats "best-in-class reasoning at 5x the price."
There's also the question of speed. While 3.5-Plus is fast for its capability level, it doesn't match the throughput of Qwen-Flash for simple, high-volume tasks like classification or extraction. Pick the tier that matches your task complexity — don't overpay for reasoning you don't need.
Qwen-Plus Legacy (qwen-plus Alias)
The qwen-plus API alias still works and currently points to a Qwen3-based snapshot (qwen-plus-2025-12-01). It's the cheapest option in the Plus family at $0.13/M input and $0.78/M output for standard mode, though thinking output jumps to $4.00/M in the 0-256K range.
If you've been running qwen-plus in production and it works for your use case, there's no urgent reason to switch. But for new projects, Qwen3.5-Plus is the better starting point — stronger benchmarks, multimodal support, no thinking surcharge, and Alibaba will eventually update the alias to point to it anyway.
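To see where the crossover sits, a quick comparison using the prices above (the workload numbers are arbitrary, picked only to make the trade-off visible):

```python
def cost(in_m, out_m, in_price, out_price):
    """Dollars for in_m / out_m million input/output tokens."""
    return in_m * in_price + out_m * out_price

# Hypothetical workload: 10M input, 2M output tokens per month
legacy_standard = cost(10, 2, 0.13, 0.78)  # legacy alias, thinking off
legacy_thinking = cost(10, 2, 0.13, 4.00)  # legacy alias, thinking surcharge
plus_35         = cost(10, 2, 0.26, 1.56)  # Qwen3.5-Plus, same price either mode
```

Thinking off, legacy runs $2.86 against $5.72 for 3.5-Plus. Turn thinking on and legacy jumps to $9.30, so a pipeline that leans on reasoning mode is cheaper on the newer model despite the higher list price.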
Version History
| Snapshot | Base Model | Context | Notes |
|---|---|---|---|
| qwen3.6-plus-preview | Qwen 3.6 (preview) | 1M | Free on OpenRouter, always-on reasoning |
| qwen3.5-plus-2026-02-15 | Qwen 3.5 | 1M | Multimodal, 201 languages, production-ready |
| qwen-plus-2025-12-01 | Qwen 3 | 1M | Current stable alias target |
| qwen-plus-2025-09-11 | Qwen 3 | 1M | Performance improvements |
| qwen-plus-2025-04-28 | Qwen 3 | 1M | First 1M context snapshot |
| qwen-plus-2025-01-25 | Qwen 2.5 | 131K | Initial Qwen 2.5-era snapshot |
Three Generations Compared
Here's the full side-by-side. The right choice depends on whether you're testing, building for production, or running an existing pipeline on a tight budget.
| Feature | 3.6-Plus-Preview | 3.5-Plus | Plus Legacy |
|---|---|---|---|
| Status | Preview (not for production) | Stable / Production | Stable / Legacy |
| Context Window | 1,000,000 tokens | 1,000,000 tokens | 1,000,000 tokens |
| Max Output | 65,536 tokens | 65,536 tokens | 32,768 tokens |
| Reasoning | Always-on CoT | Toggle (enable_thinking) | Toggle (with surcharge) |
| Multimodal Input | Text only | Text + Image + Video | Text only |
| Input Price ($/M) | Free | $0.26 | $0.13 |
| Output Price ($/M) | Free | $1.56 | $0.78 |
| Best For | Testing, evaluation, prototyping | Production workloads | Budget text-only pipelines |
| Where to Access | OpenRouter | Alibaba Cloud, OpenRouter | Alibaba Cloud |
Our recommendation: if you're evaluating Qwen for the first time, start with the free 3.6-Plus-Preview. Move to 3.5-Plus when you need production stability, multimodal input, or when sending prompt data to Alibaba is a dealbreaker. Keep legacy qwen-plus only for an existing pipeline that already works, where the lowest possible cost matters more than features.
Qwen Plus vs GPT-4o, Sonnet, Gemini — Pricing Showdown
This is where the Plus line gets interesting. Alibaba's pricing is aggressive — not just "a bit cheaper" aggressive, but "an order of magnitude" aggressive. Here's how the production-tier models stack up:
| Provider | Model | Input ($/M tokens) | Output ($/M tokens) | Context |
|---|---|---|---|---|
| Qwen | 3.6-Plus-Preview | Free | Free | 1M |
| Qwen | 3.5-Plus | $0.26 | $1.56 | 1M |
| Google | Gemini 2.5 Flash | $0.075 | $0.30 | 1M |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128K |
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| Anthropic | Claude Sonnet 4.5 | $3.00 | $15.00 | 200K |
The numbers speak for themselves. Qwen3.5-Plus at $0.26/$1.56 is roughly 12x cheaper than Claude Sonnet 4.5 on input and nearly 10x cheaper on output. Compared to GPT-4o, it's about 10x cheaper on input while offering 8x the context window (1M vs 128K).
Google's Gemini 2.5 Flash is the closest competitor on price, and it actually wins on raw per-token cost. But Flash is positioned as a lighter model — closer to GPT-4o-mini territory than GPT-4o. Qwen3.5-Plus targets the mid-tier reasoning bracket where you'd normally reach for GPT-4o or Sonnet, but at a price that makes high-volume use viable.
There's a real caveat here: cheaper doesn't always mean better value. Claude Sonnet and GPT-4o have mature ecosystems, battle-tested reliability, extensive documentation, and broader community tooling. Qwen's API ecosystem is younger. If your team is already invested in OpenAI's or Anthropic's toolchain, switching has costs beyond per-token pricing. But if you're starting fresh or building cost-sensitive applications — especially at scale — the math is hard to ignore.
API Quick Start
All Qwen Plus models use an OpenAI-compatible API. If you've used the OpenAI Python SDK before, this will feel familiar.
Alibaba Cloud (Qwen3.5-Plus, Production)
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    # International endpoint; mainland China accounts use dashscope.aliyuncs.com instead
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3.5-plus-2026-02-15",
    messages=[
        {"role": "user", "content": "Compare the trade-offs between microservices and monoliths for a 5-person startup."}
    ],
)
print(response.choices[0].message.content)
```
OpenRouter (Qwen3.6-Plus-Preview, Free)
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENROUTER_KEY",
    base_url="https://openrouter.ai/api/v1",
)

response = client.chat.completions.create(
    model="qwen/qwen3.6-plus-preview",
    messages=[
        {"role": "user", "content": "Write a Python script that scrapes HN front page and summarizes each article."}
    ],
)
print(response.choices[0].message.content)
```
With Thinking Mode (Qwen3.5-Plus)
```python
# Reuses the Alibaba Cloud client from the first example
response = client.chat.completions.create(
    model="qwen3.5-plus-2026-02-15",
    messages=[
        {"role": "user", "content": "Solve this step by step: If a train leaves Chicago at 60mph..."}
    ],
    # DashScope-specific flag, passed through the OpenAI SDK via extra_body
    extra_body={"enable_thinking": True},
)
```
Multimodal — Image Analysis (Qwen3.5-Plus Only)
```python
response = client.chat.completions.create(
    model="qwen3.5-plus-2026-02-15",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/dashboard.png"}},
            {"type": "text", "text": "What metrics are trending down in this dashboard?"},
        ],
    }],
)
print(response.choices[0].message.content)
```
Need an API key? Sign up at Alibaba Cloud Model Studio for production access, or grab a free OpenRouter key to test the 3.6 preview. Both platforms offer generous free tiers for new accounts.
When to Use Plus vs Max vs Flash
Alibaba offers three main API tiers beyond the Plus variants. Picking the wrong one wastes money or leaves performance on the table. Here's the quick decision matrix:
| Use Case | Recommended | Why |
|---|---|---|
| Chatbot, simple Q&A, classification | Flash | Fastest, cheapest ($0.05/M input). Don't pay for reasoning you won't use. |
| RAG over long documents | Qwen3.5-Plus | 1M context, solid reasoning, great cost-per-query for retrieval workloads. |
| Enterprise analytics, reporting | Qwen3.5-Plus | Function calling + structured JSON output at scale pricing. |
| Image/video understanding | Qwen3.5-Plus | Only mid-tier model with multimodal input. Max doesn't do vision. |
| Agentic coding workflows | 3.6-Plus-Preview | Free, fast, strong at code. Ideal for testing agentic loops. |
| Evaluating Qwen for the first time | 3.6-Plus-Preview | Zero cost. Kick the tires before committing budget. |
| Math competition, hard science | Max | Deepest reasoning, best GPQA/HLE scores. Worth the premium for hard tasks. |
| Bulk content generation | Flash | At $0.05/M input, you can process millions of tokens affordably. |
| Budget text pipeline (no multimodal) | Plus Legacy | $0.13/M input — cheapest reasoning-capable option. |
Still not sure? If your tasks involve images or video, the answer is always Qwen3.5-Plus — it's the only mid-tier option with multimodal support. For text-only work, start with the free 3.6 preview to gauge quality, then decide between 3.5-Plus (features + stability) and legacy Plus (lowest cost). For local deployment instead of API, check our Run Locally guide or the Can I Run Qwen tool.
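The matrix above collapses into a small helper. The task categories and the priority order are this guide's framing, not an official Alibaba taxonomy:

```python
def pick_tier(needs_vision=False, hardest_reasoning=False,
              high_volume_simple=False, production=True, budget_text_only=False):
    """Map the decision matrix above onto a single model recommendation."""
    if needs_vision:
        return "qwen3.5-plus"          # only mid-tier option with multimodal input
    if hardest_reasoning:
        return "qwen-max"              # deepest reasoning, worth the premium
    if high_volume_simple:
        return "qwen-flash"            # classification/extraction at $0.05/M input
    if not production:
        return "qwen3.6-plus-preview"  # free during preview; not for production
    if budget_text_only:
        return "qwen-plus"             # legacy alias, cheapest reasoning-capable
    return "qwen3.5-plus"              # default production workhorse
```

Vision comes first deliberately: no amount of budget pressure changes the answer when the input has images or video.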
FAQ
Should I use 3.6-Plus-Preview or Qwen3.5-Plus?
For testing and evaluation, 3.6 Preview — it's free and fast. For production workloads, 3.5-Plus. The preview collects your prompt data, lacks stability guarantees, and doesn't support multimodal input. Don't build customer-facing products on it.
Is Qwen-Plus better than Qwen-Max?
Different tools for different jobs. Max is more powerful on hard reasoning tasks — math, science, competitive programming. Plus is significantly cheaper and handles 90% of production workloads just as well. If you're not sure, start with Plus and only upgrade to Max if you hit quality ceilings on your specific task.
Can I run Qwen Plus models locally?
The API models themselves are cloud-only. However, Qwen3.5-Plus runs the same Qwen 3.5 architecture (397B MoE) that's open-source under Apache 2.0. You can self-host it with enough hardware (~256GB+ RAM). For smaller local options, see our Run Locally guide or check if your GPU can handle it.
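The ~256GB figure follows straight from the parameter count if you assume 4-bit quantized weights; the quantization level and overhead margin here are assumptions, not a published spec:

```python
params = 397e9          # Qwen 3.5 total parameter count (MoE: all experts stay resident)
bytes_per_param = 0.5   # assuming 4-bit quantized weights
weights_gb = params * bytes_per_param / 1e9
print(round(weights_gb, 1))  # 198.5 GB of raw weights, before KV cache and runtime overhead
```

Add KV cache for long contexts plus runtime overhead and you land in the ~256GB territory. Only 17B parameters are active per token, but MoE routing still requires every expert in memory.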
When will Qwen 3.6 be fully released?
No official date from Alibaba yet. The 3.6-Plus-Preview is the first public model from the 3.6 generation. Based on past patterns (Qwen 3.5 went from preview to stable in about 4-6 weeks), a full release could come in late April or May 2026 — but that's speculation, not a confirmed timeline.
How does Qwen-Plus compare to GPT-4o?
Qwen3.5-Plus matches GPT-4o on most enterprise tasks — document understanding, instruction following, structured output — at roughly 10x lower cost and with 8x more context (1M vs 128K). GPT-4o has a more mature ecosystem and broader third-party tool support. For coding-heavy workloads, benchmark carefully on your specific tasks before committing.
What happened to Qwen-Turbo?
Deprecated. Alibaba recommends Qwen-Flash as the replacement — same low-cost, high-speed niche at $0.05/M input tokens.