Qwen Plus
The Qwen Plus line is Alibaba's production API sweet spot — strong enough for serious work, cheap enough to scale. Three generations are active right now: Qwen3.6-Plus-Preview (the newest, with 1M context and always-on reasoning, currently free), Qwen3.5-Plus (the stable production pick), and the legacy qwen-plus alias. This guide covers all three so you can pick the right one for your workload.
Qwen3.6-Plus-Preview — Free, 1M Context, Always-On Reasoning
Alibaba dropped Qwen3.6-Plus-Preview on March 30, 2026, and it's the clearest signal yet that the "3.6" generation is coming. The model is available on OpenRouter right now under qwen/qwen3.6-plus-preview — and during the preview period, it costs exactly zero dollars. Free input, free output, no catch beyond the fact that Alibaba collects prompt data to improve the model.
The headline numbers: 1 million tokens of context and 65,536 max output tokens. Context is 4x what the previous Qwen3.5-Plus offered at launch (262K), putting it on par with Google's Gemini in terms of raw window size. But the real story isn't the context — it's the reasoning architecture.
Always-On Chain-of-Thought
Unlike Qwen3.5-Plus, which lets you toggle thinking mode on or off via an enable_thinking parameter, the 3.6 Preview uses always-on chain-of-thought. The model reasons through every request internally. You don't flip a switch. It just thinks.
Alibaba says this approach reduces "overthinking" — that frustrating pattern where models burn tokens deliberating over simple questions. Early community testing backs this up. On straightforward tasks, 3.6 Preview responds quickly without the padding you'd see from a forced-thinking model. On complex problems, it reasons deeply without you having to remember to enable a flag.
What It's Good At
Community reports from the first few days point to strong performance in agentic coding, front-end programming, and general reasoning. Several developers on OpenRouter forums reported output speeds roughly 3x faster than Claude Opus 4.6, though Alibaba hasn't published controlled benchmarks for this preview.
That speed advantage matters for interactive use. If you're building an agentic workflow where the model needs to iterate — writing code, checking it, revising — faster output directly translates to shorter feedback loops.
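The loop itself is simple to picture. Here's a minimal sketch of a generate-check-revise cycle with the model call stubbed out as a plain callable; `iterate_until_pass`, `generate`, and `check` are illustrative names for this guide, not part of any Qwen or OpenRouter API:

```python
from typing import Callable, Optional

def iterate_until_pass(
    generate: Callable[[str], str],        # in practice, a wrapper around chat.completions.create
    check: Callable[[str], Optional[str]], # returns an error message, or None on success
    task: str,
    max_rounds: int = 5,
) -> Optional[str]:
    """Generate -> check -> revise loop. Faster model output shortens every round."""
    prompt = task
    for _ in range(max_rounds):
        candidate = generate(prompt)
        error = check(candidate)
        if error is None:
            return candidate  # passed the check
        # Feed the failure back so the next attempt can correct it
        prompt = f"{task}\n\nPrevious attempt failed with:\n{error}\nRevise and retry."
    return None  # gave up after max_rounds
```

Every round pays the model's full output latency, which is why a 3x throughput difference compounds quickly in agentic workflows.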
The Caveats
This is a preview. Not production-grade, not guaranteed stable, and Alibaba explicitly collects your prompts during this phase. Don't send sensitive data through it. Don't build a production pipeline on top of it. It's a test drive — treat it that way.
There are no published benchmarks from Alibaba yet, which makes it hard to do apples-to-apples comparisons with 3.5-Plus or competing models. The "3x faster than Opus" claim comes from community reports, not controlled testing. And the model is text-only — no image or video input, unlike the multimodal Qwen3.5-Plus.
Bottom line: If you want to test what's coming next in the Qwen ecosystem without spending a cent, 3.6-Plus-Preview is your playground. For anything that needs reliability or multimodal input, stick with 3.5-Plus below.
Qwen3.5-Plus — The Production Workhorse
Released February 15, 2026, Qwen3.5-Plus runs on the Qwen 3.5 architecture — a 397B-parameter MoE model that activates only 17B parameters per token. That efficiency is what makes it both capable and affordable at scale.
The spec sheet reads like a frontier model: 1M token context, 65K max output, multimodal input (text, images, and video), support for 201 languages, and full function calling with structured JSON output. All of that through an OpenAI-compatible API. If you're migrating from GPT-4o, the integration work is minimal — swap the base URL and API key, adjust the model ID, and you're running.
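Function calling uses the standard OpenAI tools schema, which Qwen3.5-Plus accepts unchanged. A sketch of a tool definition plus the local dispatch step; the `get_order_status` tool and `dispatch` helper are hypothetical examples, only the schema shape comes from the OpenAI-compatible spec:

```python
import json

# Standard OpenAI-style tool schema; pass this as `tools=` in the API call
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical example tool
        "description": "Look up the status of an order by ID",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def dispatch(tool_call: dict, handlers: dict):
    """Route a tool call emitted by the model to the matching local handler."""
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # the model returns arguments as a JSON string
    return handlers[fn["name"]](**args)
```

The same pattern works against GPT-4o, which is exactly why the migration story is so short.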
Pricing That Undercuts Everyone
$0.26 per million input tokens. $1.56 per million output tokens. Those are the numbers on Alibaba Cloud's international dashboard. For context, that's 12x cheaper than Anthropic's Claude Sonnet 4.5 on input ($3.00/M) and roughly 10x cheaper on output ($15.00/M). It's even cheaper than OpenAI's GPT-4o at $2.50/$10.00 per million.
Batch processing drops another 50% off those prices. Context caching gives you implicit cache hits at 20% of standard input cost and explicit cache hits at 10%. For RAG pipelines that repeatedly query the same document set, effective costs can be a fraction of the listed rates.
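As a back-of-the-envelope check, here's that math in code. List prices come from above; the cache-hit fraction is an assumption you'd measure for your own workload, and whether the batch and caching discounts stack is an assumption, not a documented guarantee:

```python
INPUT, OUTPUT = 0.26, 1.56  # Qwen3.5-Plus list price, dollars per million tokens

def monthly_cost(in_m, out_m, cached_frac=0.0, cache_rate=0.2, batch=False):
    """Estimated monthly dollars for in_m / out_m million input/output tokens.

    cache_rate: 0.2 for implicit cache hits, 0.1 for explicit, per the rates above.
    """
    scale = 0.5 if batch else 1.0  # batch processing halves both prices
    in_cost = in_m * INPUT * scale * ((1 - cached_frac) + cached_frac * cache_rate)
    out_cost = out_m * OUTPUT * scale
    return in_cost + out_cost
```

At 100M input / 10M output tokens a month, that works out to $41.60 at list, $24.96 with 80% implicit cache hits, and $20.80 in batch mode.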
Thinking Modes — Choose Your Reasoning Level
Unlike the 3.6 Preview's always-on approach, Qwen3.5-Plus gives you a toggle. Set enable_thinking: true in your API call and the model reasons step-by-step before answering. Leave it off for faster, cheaper responses on simpler tasks. Same price either way — Qwen3.5-Plus doesn't charge a thinking surcharge like the legacy qwen-plus does.
Where It Falls Short
Qwen3.5-Plus isn't trying to be Qwen-Max. On the hardest reasoning tasks — math olympiad problems, doctoral-level science questions, competitive programming — Max wins. If your workload demands frontier-level depth, Plus isn't the tier for you. It's built for the 90% of production use cases where "very good reasoning at low cost" beats "best-in-class reasoning at 5x the price."
There's also the question of speed. While 3.5-Plus is fast for its capability level, it doesn't match the throughput of Qwen-Flash for simple, high-volume tasks like classification or extraction. Pick the tier that matches your task complexity — don't overpay for reasoning you don't need.
Qwen-Plus Legacy (qwen-plus Alias)
The qwen-plus API alias still works and currently points to a Qwen3-based snapshot (qwen-plus-2025-12-01). It's the cheapest option in the Plus family at $0.13/M input and $0.78/M output for standard mode, though thinking output jumps to $4.00/M in the 0-256K range.
If you've been running qwen-plus in production and it works for your use case, there's no urgent reason to switch. But for new projects, Qwen3.5-Plus is the better starting point — stronger benchmarks, multimodal support, no thinking surcharge, and Alibaba will eventually update the alias to point to it anyway.
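To see where the crossover sits, a quick comparison using the prices above (the workload numbers are arbitrary, picked only to make the trade-off visible):

```python
def cost(in_m, out_m, in_price, out_price):
    """Dollars for in_m / out_m million input/output tokens."""
    return in_m * in_price + out_m * out_price

# Hypothetical workload: 10M input, 2M output tokens per month
legacy_standard = cost(10, 2, 0.13, 0.78)  # legacy alias, thinking off
legacy_thinking = cost(10, 2, 0.13, 4.00)  # legacy alias, thinking surcharge
plus_35         = cost(10, 2, 0.26, 1.56)  # Qwen3.5-Plus, same price either mode
```

Thinking off, legacy runs $2.86 against $5.72 for 3.5-Plus. Turn thinking on and legacy jumps to $9.30, so a pipeline that leans on reasoning mode is cheaper on the newer model despite the higher list price.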
Version History
| Snapshot | Base Model | Context | Notes |
|---|---|---|---|
| qwen3.6-plus-preview | Qwen 3.6 (preview) | 1M | Free on OpenRouter, always-on reasoning |
| qwen3.5-plus-2026-02-15 | Qwen 3.5 | 1M | Multimodal, 201 languages, production-ready |
| qwen-plus-2025-12-01 | Qwen 3 | 1M | Current stable alias target |
| qwen-plus-2025-09-11 | Qwen 3 | 1M | Performance improvements |
| qwen-plus-2025-04-28 | Qwen 3 | 1M | First 1M context snapshot |
| qwen-plus-2025-01-25 | Qwen 2.5 | 131K | Initial Qwen 2.5-era snapshot |
Three Generations Compared
Here's the full side-by-side. The right choice depends on whether you're testing, building for production, or running an existing pipeline on a tight budget.
| Feature | 3.6-Plus-Preview | 3.5-Plus | Plus Legacy |
|---|---|---|---|
| Status | Preview (not for production) | Stable / Production | Stable / Legacy |
| Context Window | 1,000,000 tokens | 1,000,000 tokens | 1,000,000 tokens |
| Max Output | 65,536 tokens | 65,536 tokens | 32,768 tokens |
| Reasoning | Always-on CoT | Toggle (enable_thinking) | Toggle (with surcharge) |
| Multimodal Input | Text only | Text + Image + Video | Text only |
| Input Price ($/M) | Free | $0.26 | $0.13 |
| Output Price ($/M) | Free | $1.56 | $0.78 |
| Best For | Testing, evaluation, prototyping | Production workloads | Budget text-only pipelines |
| Where to Access | OpenRouter | Alibaba Cloud, OpenRouter | Alibaba Cloud |
Our recommendation: if you're evaluating Qwen for the first time, start with the free 3.6-Plus-Preview. Move to 3.5-Plus when you need production stability, multimodal input, or when sending prompt data to Alibaba is a dealbreaker. Keep legacy qwen-plus only for an existing pipeline that already works, where the lowest possible cost matters more than features.
Qwen Plus vs GPT-4o, Sonnet, Gemini — Pricing Showdown
This is where the Plus line gets interesting. Alibaba's pricing is aggressive — not just "a bit cheaper" aggressive, but "an order of magnitude" aggressive. Here's how the production-tier models stack up:
| Provider | Model | Input ($/M tokens) | Output ($/M tokens) | Context |
|---|---|---|---|---|
| Qwen | 3.6-Plus-Preview | Free | Free | 1M |
| Qwen | 3.5-Plus | $0.26 | $1.56 | 1M |
| Google | Gemini 2.5 Flash | $0.075 | $0.30 | 1M |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128K |
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| Anthropic | Claude Sonnet 4.5 | $3.00 | $15.00 | 200K |
The numbers speak for themselves. Qwen3.5-Plus at $0.26/$1.56 is roughly 12x cheaper than Claude Sonnet 4.5 on input and nearly 10x cheaper on output. Compared to GPT-4o, it's about 10x cheaper on input while offering 8x the context window (1M vs 128K).
Google's Gemini 2.5 Flash is the closest competitor on price, and it actually wins on raw per-token cost. But Flash is positioned as a lighter model — closer to GPT-4o-mini territory than GPT-4o. Qwen3.5-Plus targets the mid-tier reasoning bracket where you'd normally reach for GPT-4o or Sonnet, but at a price that makes high-volume use viable.
There's a real caveat here: cheaper doesn't always mean better value. Claude Sonnet and GPT-4o have mature ecosystems, battle-tested reliability, extensive documentation, and broader community tooling. Qwen's API ecosystem is younger. If your team is already invested in OpenAI's or Anthropic's toolchain, switching has costs beyond per-token pricing. But if you're starting fresh or building cost-sensitive applications — especially at scale — the math is hard to ignore.
API Quick Start
All Qwen Plus models use an OpenAI-compatible API. If you've used the OpenAI Python SDK before, this will feel familiar.
Alibaba Cloud (Qwen3.5-Plus, Production)
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    # International endpoint; mainland China accounts use dashscope.aliyuncs.com instead
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3.5-plus-2026-02-15",
    messages=[
        {"role": "user", "content": "Compare the trade-offs between microservices and monoliths for a 5-person startup."}
    ],
)
print(response.choices[0].message.content)
```
OpenRouter (Qwen3.6-Plus-Preview, Free)
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENROUTER_KEY",
    base_url="https://openrouter.ai/api/v1",
)

response = client.chat.completions.create(
    model="qwen/qwen3.6-plus-preview",
    messages=[
        {"role": "user", "content": "Write a Python script that scrapes HN front page and summarizes each article."}
    ],
)
print(response.choices[0].message.content)
```
With Thinking Mode (Qwen3.5-Plus)
```python
# Reuses the Alibaba Cloud client from the first example
response = client.chat.completions.create(
    model="qwen3.5-plus-2026-02-15",
    messages=[
        {"role": "user", "content": "Solve this step by step: If a train leaves Chicago at 60mph..."}
    ],
    # DashScope-specific flag, passed through the OpenAI SDK via extra_body
    extra_body={"enable_thinking": True},
)
```
Multimodal — Image Analysis (Qwen3.5-Plus Only)
```python
response = client.chat.completions.create(
    model="qwen3.5-plus-2026-02-15",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/dashboard.png"}},
            {"type": "text", "text": "What metrics are trending down in this dashboard?"},
        ],
    }],
)
print(response.choices[0].message.content)
```
Need an API key? Sign up at Alibaba Cloud Model Studio for production access, or grab a free OpenRouter key to test the 3.6 preview. Both platforms offer generous free tiers for new accounts.
When to Use Plus vs Max vs Flash
Alibaba offers three main API tiers beyond the Plus variants. Picking the wrong one wastes money or leaves performance on the table. Here's the quick decision matrix:
| Use Case | Recommended | Why |
|---|---|---|
| Chatbot, simple Q&A, classification | Flash | Fastest, cheapest ($0.05/M input). Don't pay for reasoning you won't use. |
| RAG over long documents | Qwen3.5-Plus | 1M context, solid reasoning, great cost-per-query for retrieval workloads. |
| Enterprise analytics, reporting | Qwen3.5-Plus | Function calling + structured JSON output at scale pricing. |
| Image/video understanding | Qwen3.5-Plus | Only mid-tier model with multimodal input. Max doesn't do vision. |
| Agentic coding workflows | 3.6-Plus-Preview | Free, fast, strong at code. Ideal for testing agentic loops. |
| Evaluating Qwen for the first time | 3.6-Plus-Preview | Zero cost. Kick the tires before committing budget. |
| Math competition, hard science | Max | Deepest reasoning, best GPQA/HLE scores. Worth the premium for hard tasks. |
| Bulk content generation | Flash | At $0.05/M input, you can process millions of tokens affordably. |
| Budget text pipeline (no multimodal) | Plus Legacy | $0.13/M input — cheapest reasoning-capable option. |
Still not sure? If your tasks involve images or video, the answer is always Qwen3.5-Plus — it's the only mid-tier option with multimodal support. For text-only work, start with the free 3.6 preview to gauge quality, then decide between 3.5-Plus (features + stability) and legacy Plus (lowest cost). For local deployment instead of API, check our Run Locally guide or the Can I Run Qwen tool.
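The matrix above collapses into a small helper. The task categories and the priority order are this guide's framing, not an official Alibaba taxonomy:

```python
def pick_tier(needs_vision=False, hardest_reasoning=False,
              high_volume_simple=False, production=True, budget_text_only=False):
    """Map the decision matrix above onto a single model recommendation."""
    if needs_vision:
        return "qwen3.5-plus"          # only mid-tier option with multimodal input
    if hardest_reasoning:
        return "qwen-max"              # deepest reasoning, worth the premium
    if high_volume_simple:
        return "qwen-flash"            # classification/extraction at $0.05/M input
    if not production:
        return "qwen3.6-plus-preview"  # free during preview; not for production
    if budget_text_only:
        return "qwen-plus"             # legacy alias, cheapest reasoning-capable
    return "qwen3.5-plus"              # default production workhorse
```

Vision comes first deliberately: no amount of budget pressure changes the answer when the input has images or video.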
FAQ
Should I use 3.6-Plus-Preview or Qwen3.5-Plus?
For testing and evaluation, 3.6 Preview — it's free and fast. For production workloads, 3.5-Plus. The preview collects your prompt data, lacks stability guarantees, and doesn't support multimodal input. Don't build customer-facing products on it.
Is Qwen-Plus better than Qwen-Max?
Different tools for different jobs. Max is more powerful on hard reasoning tasks — math, science, competitive programming. Plus is significantly cheaper and handles 90% of production workloads just as well. If you're not sure, start with Plus and only upgrade to Max if you hit quality ceilings on your specific task.
Can I run Qwen Plus models locally?
The API models themselves are cloud-only. However, Qwen3.5-Plus runs the same Qwen 3.5 architecture (397B MoE) that's open-source under Apache 2.0. You can self-host it with enough hardware (~256GB+ RAM). For smaller local options, see our Run Locally guide or check if your GPU can handle it.
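The ~256GB figure follows straight from the parameter count if you assume 4-bit quantized weights; the quantization level and overhead margin here are assumptions, not a published spec:

```python
params = 397e9          # Qwen 3.5 total parameter count (MoE: all experts stay resident)
bytes_per_param = 0.5   # assuming 4-bit quantized weights
weights_gb = params * bytes_per_param / 1e9
print(round(weights_gb, 1))  # 198.5 GB of raw weights, before KV cache and runtime overhead
```

Add KV cache for long contexts plus runtime overhead and you land in the ~256GB territory. Only 17B parameters are active per token, but MoE routing still requires every expert in memory.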
When will Qwen 3.6 be fully released?
No official date from Alibaba yet. The 3.6-Plus-Preview is the first public model from the 3.6 generation. Based on past patterns (Qwen 3.5 went from preview to stable in about 4-6 weeks), a full release could come in late April or May 2026 — but that's speculation, not a confirmed timeline.
How does Qwen-Plus compare to GPT-4o?
Qwen3.5-Plus matches GPT-4o on most enterprise tasks — document understanding, instruction following, structured output — at roughly 10x lower cost and with 8x more context (1M vs 128K). GPT-4o has a more mature ecosystem and broader third-party tool support. For coding-heavy workloads, benchmark carefully on your specific tasks before committing.
What happened to Qwen-Turbo?
Deprecated. Alibaba recommends Qwen-Flash as the replacement — same low-cost, high-speed niche at $0.05/M input tokens.