Qwen-Plus: Alibaba Cloud's Mid-Tier API Model
Qwen-Plus is Alibaba Cloud's mid-tier API model — the balanced option between the budget-friendly Qwen-Flash and the frontier-grade Qwen-Max. It offers strong reasoning, 1M-token context, and thinking mode at a fraction of the cost of competing frontier APIs. As of February 2026, the Plus tier includes Qwen3.5-Plus — powered by the newly released Qwen 3.5 open-source model — alongside the stable Qwen3-based qwen-plus alias.
What Is Qwen-Plus?
Qwen-Plus is not a specific open-source model — it's an API tier alias on Alibaba Cloud's Model Studio (DashScope). Think of it like OpenAI's "GPT-4o" vs "GPT-4o-mini" — different price/performance tiers pointing to different underlying models. The qwen-plus alias always points to a stable, production-ready snapshot that Alibaba updates periodically.
The key advantage of Qwen-Plus over open-source self-hosting: you get 1M-token context, thinking mode, and production-grade infrastructure without managing GPU servers. The key advantage over Qwen-Max: it's 3× cheaper on input and significantly faster for most tasks.
API Tiers: Flash vs Plus vs Max
Alibaba offers three main API tiers. Understanding the differences helps you pick the right one:
| Tier | API Model ID | Best For | Context | Thinking Mode | Input $/1M |
|---|---|---|---|---|---|
| Qwen-Flash | qwen-flash | High-volume, low-latency, simple tasks | 1M | No | $0.05 |
| Qwen-Plus ⭐ | qwen-plus | Enterprise apps, RAG, general reasoning | 1M | Yes | $0.40 |
| Qwen-Max | qwen3-max | Complex reasoning, math, research | 262K | Yes | $1.20 |
Note: Qwen-Turbo is deprecated. Alibaba recommends migrating to Qwen-Flash for equivalent low-cost use cases.
Qwen-Plus Specifications
| Specification | Value |
|---|---|
| Current stable snapshot | qwen-plus-2025-12-01 (Qwen3-based) |
| Latest snapshot | qwen3.5-plus-2026-02-15 (Qwen3.5-based) |
| Context window | 1,000,000 tokens |
| Max output tokens | 32,768 tokens |
| Thinking mode | Yes (toggle via enable_thinking) |
| Modalities — input | Text (qwen-plus) / Text + Image + Video (qwen3.5-plus) |
| Modalities — output | Text only |
| Languages | 100+ (qwen-plus) / 201 (qwen3.5-plus) |
| Function calling | Yes |
| Structured output (JSON) | Yes |
| OpenAI-compatible API | Yes |
| License | Proprietary (API access only) |
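The table lists function calling and structured output; both use the standard OpenAI-compatible request format. Below is a hedged sketch of a tool definition — the function name, description, and parameters are hypothetical, invented purely for illustration:

```python
# Illustrative tool schema in the OpenAI-compatible function-calling format.
# The `get_weather` function and its parameters are made up for this example.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Passed via the standard `tools` parameter:
# client.chat.completions.create(model="qwen-plus", messages=..., tools=[get_weather_tool])
```

When the model decides to call the tool, the call appears in `response.choices[0].message.tool_calls`, as in any OpenAI-compatible API.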
Pricing
Qwen-Plus (stable, Qwen3-based)
| Context Range | Input ($/1M) | Output — Standard ($/1M) | Output — Thinking ($/1M) |
|---|---|---|---|
| 0–256K | $0.40 | $1.20 | $4.00 |
| 256K–1M | $1.20 | $3.60 | $12.00 |
Qwen3.5-Plus (new, Qwen3.5-based)
| Context Range | Input ($/1M) | Output ($/1M) |
|---|---|---|
| 0–256K | $0.40 | $2.40 |
| 256K–1M | $1.20 | $7.20 |
- Batch processing: 50% discount for asynchronous batch jobs.
- Context caching: Implicit cache hits cost 20% of standard input; explicit cache hits cost 10%.
- Free tier: 1 million tokens for new international accounts (valid 90 days).
How does this compare? Qwen-Plus at $0.40/$1.20 per 1M tokens is roughly 5–10× cheaper than GPT-4o and 3× cheaper than Qwen-Max on input costs. For most production workloads that don't require frontier-level reasoning, Plus offers the best cost-to-quality ratio in Alibaba's lineup.
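The tiered rates above can be turned into a quick cost estimate. This sketch assumes the billing tier (0–256K vs 256K–1M) is selected by the request's input length — check Alibaba's billing documentation for the exact tier-selection rule before relying on it:

```python
# Sketch of qwen-plus cost estimation using the rates from the tables above.
# Assumption: the billing tier is chosen by the request's input token count.
def qwen_plus_cost(input_tokens: int, output_tokens: int, thinking: bool = False) -> float:
    """Return the estimated USD cost of one qwen-plus (Qwen3-based) request."""
    if input_tokens <= 256_000:
        in_rate = 0.40
        out_rate = 4.00 if thinking else 1.20
    else:
        in_rate = 1.20
        out_rate = 12.00 if thinking else 3.60
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# e.g. summarizing a 100K-token document into 10K tokens costs about $0.052,
# and roughly $0.08 with thinking mode on.
```

Note that the batch (50%) and cache-hit (80–90%) discounts listed above would reduce these figures further.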
Qwen3.5-Plus — The New Option
Released on February 15, 2026, Qwen3.5-Plus is powered by the Qwen 3.5 open-source model (397B parameters, 17B active). Key differences from the standard qwen-plus:
- Multimodal input: Accepts text, images, and video (standard qwen-plus is text-only).
- 201 languages (vs ~100 in standard qwen-plus).
- Stronger benchmarks: Leads on IFBench, MathVision, OmniDocBench, and all agentic benchmarks.
- No separate thinking surcharge: Same price regardless of thinking mode (standard qwen-plus charges 3.3× more for thinking output).
- Slightly higher output cost: $2.40/M vs $1.20/M for standard mode — but no thinking surcharge makes it cheaper for reasoning tasks.
Which should you use? If you need multimodal capabilities, agentic features, or run heavy reasoning tasks, qwen3.5-plus is the better choice. If you have a stable production pipeline that only needs text processing and you want the lowest cost, stick with qwen-plus until Alibaba updates the alias.
Model IDs
- qwen-plus / qwen-plus-latest — stable alias (currently points to the Qwen3-based snapshot)
- qwen3.5-plus-2026-02-15 — Qwen 3.5 specific snapshot
API Quick Start
The API is OpenAI-compatible. You can use the OpenAI Python SDK with a different base URL:
Basic Text Request
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen-plus",  # or "qwen3.5-plus-2026-02-15"
    messages=[
        {"role": "user", "content": "Summarize the key risks in this contract..."}
    ]
)
print(response.choices[0].message.content)
```
With Thinking Mode
```python
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "user", "content": "Solve this optimization problem step by step..."}
    ],
    extra_body={"enable_thinking": True}
)
```
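With thinking enabled, the reasoning trace is returned separately from the final answer — commonly as a `reasoning_content` field on the message in DashScope's compatible mode. Treat that field name as an assumption and confirm it against the current docs; a small helper to split the two:

```python
def split_thinking(message) -> tuple:
    """Return (reasoning, answer) from a chat-completion message.

    Assumes the thinking trace is exposed as `reasoning_content`
    alongside the usual `content` field (field name per DashScope's
    compatible mode; verify against current documentation).
    """
    reasoning = getattr(message, "reasoning_content", None) or ""
    answer = getattr(message, "content", None) or ""
    return reasoning, answer

# reasoning, answer = split_thinking(response.choices[0].message)
```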
Multimodal (Qwen3.5-Plus only)
```python
response = client.chat.completions.create(
    model="qwen3.5-plus-2026-02-15",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text", "text": "What does this chart show?"}
        ]
    }]
)
```
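The example above references a remote URL. OpenAI-compatible endpoints generally also accept images inline as base64 data URLs, which is convenient for local files — a sketch, with data-URL support on DashScope being an assumption worth verifying:

```python
import base64

def image_message(path: str, question: str) -> dict:
    """Build a multimodal user message embedding a local image as a data URL.

    Assumes the endpoint accepts base64 `data:` URLs in `image_url`,
    as most OpenAI-compatible APIs do.
    """
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
            {"type": "text", "text": question},
        ],
    }

# messages=[image_message("chart.png", "What does this chart show?")]
```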
Version History
The qwen-plus alias has pointed to different underlying models over time as Alibaba upgrades the backend:
| Snapshot | Base Model | Context | Notes |
|---|---|---|---|
| qwen-plus-2025-01-25 | Qwen 2.5 | 131K | Initial Qwen 2.5-era snapshot |
| qwen-plus-2025-04-28 | Qwen 3 | 1M | Upgraded to Qwen 3 backbone, 1M context |
| qwen-plus-2025-09-11 | Qwen 3 | 1M | Performance improvements |
| qwen-plus-2025-12-01 | Qwen 3 | 1M | Current stable alias |
| qwen3.5-plus-2026-02-15 | Qwen 3.5 | 1M | Multimodal, 201 languages, new |
When to Use Plus vs Max vs Flash
| Use Case | Recommended Tier | Why |
|---|---|---|
| Customer support chatbot | Flash | Low latency, cheapest option, simple Q&A |
| RAG over long documents | Plus | 1M context, good reasoning, cost-efficient |
| Enterprise analytics/reporting | Plus | Structured output, function calling, balanced cost |
| Image/video analysis | Qwen3.5-Plus | Only Plus-tier option with multimodal input |
| Agentic workflows | Qwen3.5-Plus | Best agentic benchmark scores in Plus tier |
| Math olympiad / competitive coding | Max | Test-time scaling, deepest reasoning |
| Scientific research / doctoral-level Q&A | Max | Highest GPQA, HLE scores |
| High-volume content generation | Flash | 8× cheaper than Plus on input ($0.05 vs $0.40) |
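The table above condenses into a simple decision rule. This tier-picker is purely illustrative — the priority ordering is my reading of the table, not official Alibaba guidance:

```python
# Illustrative tier selection following the use-case table above.
# Priority ordering is an interpretation, not Alibaba's official guidance.
def pick_tier(needs_vision: bool = False, agentic: bool = False,
              frontier_reasoning: bool = False, high_volume: bool = False) -> str:
    """Return a model ID for the given workload characteristics."""
    if frontier_reasoning:          # olympiad math, doctoral-level Q&A
        return "qwen3-max"
    if needs_vision or agentic:     # image/video input, agentic workflows
        return "qwen3.5-plus-2026-02-15"
    if high_volume:                 # cheap, low-latency bulk generation
        return "qwen-flash"
    return "qwen-plus"              # default: RAG, analytics, general apps
```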
FAQ
Is Qwen-Plus the same as Qwen3.5-Plus?
Not exactly. qwen-plus is the stable alias that Alibaba updates periodically (currently Qwen3-based). qwen3.5-plus is a specific named model based on the Qwen 3.5 architecture. They coexist — you can choose either via the API model ID.
Can I run Qwen-Plus locally?
No — Qwen-Plus is API-only. However, the underlying models are open-source. The Qwen3.5-Plus API runs the same Qwen 3.5 model (397B MoE) that's available on HuggingFace under Apache 2.0. You can self-host it if you have the hardware (~256GB+ RAM). See our Run Locally guide.
How does Qwen-Plus compare to GPT-4o?
Qwen-Plus offers comparable quality for most enterprise tasks at 5–10× lower cost. The Qwen3.5-Plus variant adds multimodal capabilities and stronger benchmark scores than GPT-4o on instruction following and document understanding. For tasks requiring frontier reasoning (math, science), Qwen-Max is the better comparison to GPT-5.
What happened to Qwen-Turbo?
Deprecated. Alibaba recommends migrating to Qwen-Flash, which serves the same low-cost, high-speed niche at $0.05/M input tokens.