Qwen-Plus: Alibaba Cloud's Mid-Tier API Model

Qwen-Plus is Alibaba Cloud's mid-tier API model — the balanced option between the budget-friendly Qwen-Flash and the frontier-grade Qwen-Max. It offers strong reasoning, 1M-token context, and thinking mode at a fraction of the cost of competing frontier APIs. As of February 2026, the Plus tier includes Qwen3.5-Plus — powered by the newly released Qwen 3.5 open-source model — alongside the stable Qwen3-based qwen-plus alias.

What Is Qwen-Plus?

Qwen-Plus is not a specific open-source model — it's an API tier alias on Alibaba Cloud's Model Studio (DashScope). Think of it like OpenAI's "GPT-4o" vs "GPT-4o-mini" — different price/performance tiers pointing to different underlying models. The qwen-plus alias always points to a stable, production-ready snapshot that Alibaba updates periodically.

The key advantage of Qwen-Plus over open-source self-hosting: you get 1M-token context, thinking mode, and production-grade infrastructure without managing GPU servers. The key advantage over Qwen-Max: it's 3× cheaper on input and significantly faster for most tasks.

API Tiers: Flash vs Plus vs Max

Alibaba offers three main API tiers. Understanding the differences helps you pick the right one:

| Tier | API Model ID | Best For | Context | Thinking Mode | Input $/1M |
|------|--------------|----------|---------|---------------|------------|
| Qwen-Flash | qwen-flash | High-volume, low-latency, simple tasks | 1M | No | $0.05 |
| Qwen-Plus | qwen-plus | Enterprise apps, RAG, general reasoning | 1M | Yes | $0.40 |
| Qwen-Max | qwen3-max | Complex reasoning, math, research | 262K | Yes | $1.20 |

Note: Qwen-Turbo is deprecated. Alibaba recommends migrating to Qwen-Flash for equivalent low-cost use cases.

Qwen-Plus Specifications

| Specification | Value |
|---------------|-------|
| Current stable snapshot | qwen-plus-2025-12-01 (Qwen3-based) |
| Latest snapshot | qwen3.5-plus-2026-02-15 (Qwen3.5-based) |
| Context window | 1,000,000 tokens |
| Max output tokens | 32,768 tokens |
| Thinking mode | Yes (toggle via enable_thinking) |
| Modalities — input | Text (qwen-plus) / Text + Image + Video (qwen3.5-plus) |
| Modalities — output | Text only |
| Languages | 100+ (qwen-plus) / 201 (qwen3.5-plus) |
| Function calling | Yes |
| Structured output (JSON) | Yes |
| OpenAI-compatible API | Yes |
| License | Proprietary (API access only) |

Pricing

Qwen-Plus (stable, Qwen3-based)

| Context Range | Input ($/1M) | Output — Standard ($/1M) | Output — Thinking ($/1M) |
|---------------|--------------|---------------------------|---------------------------|
| 0–256K | $0.40 | $1.20 | $4.00 |
| 256K–1M | $1.20 | $3.60 | $12.00 |

Qwen3.5-Plus (new, Qwen3.5-based)

| Context Range | Input ($/1M) | Output ($/1M) |
|---------------|--------------|---------------|
| 0–256K | $0.40 | $2.40 |
| 256K–1M | $1.20 | $7.20 |

How does this compare? Qwen-Plus at $0.40/$1.20 per 1M tokens is roughly 5–10× cheaper than GPT-4o and 3× cheaper than Qwen-Max on input costs. For most production workloads that don't require frontier-level reasoning, Plus offers the best cost-to-quality ratio in Alibaba's lineup.
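
To make the tiered pricing concrete, here is a rough per-request cost estimator based on the qwen-plus rates above. It is a minimal sketch that assumes the 0–256K vs 256K–1M tier is selected by the size of the input prompt and that output is billed at either the standard or the thinking rate; check your Model Studio billing details for the exact rules.

# Rough cost estimate for a single qwen-plus request, using the rates from the
# pricing table above (USD per 1M tokens). Tier selection by input length is an
# assumption; confirm against your actual DashScope billing.
PRICING = {
    "0-256K":  {"input": 0.40, "output": 1.20, "thinking_output": 4.00},
    "256K-1M": {"input": 1.20, "output": 3.60, "thinking_output": 12.00},
}

def estimate_cost(input_tokens: int, output_tokens: int, thinking: bool = False) -> float:
    tier = PRICING["0-256K"] if input_tokens <= 256_000 else PRICING["256K-1M"]
    output_rate = tier["thinking_output"] if thinking else tier["output"]
    return (input_tokens * tier["input"] + output_tokens * output_rate) / 1_000_000

# Example: a 50K-token RAG prompt with a 2K-token answer costs about $0.022.
print(f"${estimate_cost(50_000, 2_000):.4f}")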

Qwen3.5-Plus — The New Option

Released on February 15, 2026, Qwen3.5-Plus is powered by the Qwen 3.5 open-source model (397B parameters, 17B active). Key differences from the standard qwen-plus:

- Multimodal input: accepts text, images, and video, where qwen-plus is text-only
- 201 supported languages, up from 100+
- Stronger reasoning and the best agentic benchmark scores in the Plus tier
- A single output price ($2.40/1M in the 0–256K range) instead of separate standard and thinking rates

Which should you use? If you need multimodal capabilities, agentic features, or run heavy reasoning tasks, qwen3.5-plus is the better choice. If you have a stable production pipeline that only needs text processing and you want the lowest cost, stick with qwen-plus until Alibaba updates the alias.

Model IDs

- qwen-plus: the stable alias, currently pointing to the qwen-plus-2025-12-01 snapshot (Qwen3-based)
- qwen-plus-YYYY-MM-DD: pinned, dated snapshots (see Version History below) if you need behavior that does not change underneath you
- qwen3.5-plus-2026-02-15: the Qwen3.5-based snapshot with multimodal input and 201 languages

API Quick Start

The API is OpenAI-compatible. You can use the OpenAI Python SDK with a different base URL:

Basic Text Request

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen-plus",  # or "qwen3.5-plus-2026-02-15"
    messages=[
        {"role": "user", "content": "Summarize the key risks in this contract..."}
    ]
)

print(response.choices[0].message.content)

With Thinking Mode

response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "user", "content": "Solve this optimization problem step by step..."}
    ],
    extra_body={"enable_thinking": True}
)
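
When thinking is enabled, the reasoning trace is returned separately from the final answer. The snippet below is a minimal sketch of reading both, assuming the trace is exposed as a reasoning_content field on the message; the exact field name and whether streaming is required can vary by snapshot, so check the response schema in the Model Studio docs.

# Reading a thinking-mode response (field name `reasoning_content` is an assumption)
message = response.choices[0].message
print(getattr(message, "reasoning_content", None))  # reasoning trace, if returned
print(message.content)                              # final answer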

Multimodal (Qwen3.5-Plus only)

response = client.chat.completions.create(
    model="qwen3.5-plus-2026-02-15",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text", "text": "What does this chart show?"}
        ]
    }]
)
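
Structured JSON Output

The spec table lists structured JSON output as supported. The sketch below uses the OpenAI-compatible response_format parameter and reuses the client from the Basic Text Request example; whether a given snapshot honors this exact parameter is an assumption, so verify against the Model Studio documentation.

# Structured output sketch: request a JSON object via response_format
# (parameter support on your chosen snapshot is assumed, not guaranteed).
response = client.chat.completions.create(
    model="qwen-plus",
    messages=[
        {"role": "system", "content": "Reply with a single JSON object only."},
        {"role": "user", "content": "Extract the parties and the effective date from this contract: ..."}
    ],
    response_format={"type": "json_object"}
)

print(response.choices[0].message.content)  # e.g. {"parties": [...], "effective_date": "..."}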

Version History

The qwen-plus alias has pointed to different underlying models over time as Alibaba upgrades the backend:

| Snapshot | Base Model | Context | Notes |
|----------|------------|---------|-------|
| qwen-plus-2025-01-25 | Qwen 2.5 | 131K | Initial Qwen 2.5-era snapshot |
| qwen-plus-2025-04-28 | Qwen 3 | 1M | Upgraded to Qwen 3 backbone, 1M context |
| qwen-plus-2025-09-11 | Qwen 3 | 1M | Performance improvements |
| qwen-plus-2025-12-01 | Qwen 3 | 1M | Current stable alias |
| qwen3.5-plus-2026-02-15 | Qwen 3.5 | 1M | Multimodal, 201 languages, newest snapshot |

When to Use Plus vs Max vs Flash

| Use Case | Recommended Tier | Why |
|----------|------------------|-----|
| Customer support chatbot | Flash | Low latency, cheapest option, simple Q&A |
| RAG over long documents | Plus | 1M context, good reasoning, cost-efficient |
| Enterprise analytics/reporting | Plus | Structured output, function calling, balanced cost |
| Image/video analysis | Qwen3.5-Plus | Only Plus-tier option with multimodal input |
| Agentic workflows | Qwen3.5-Plus | Best agentic benchmark scores in Plus tier |
| Math olympiad / competitive coding | Max | Test-time scaling, deepest reasoning |
| Scientific research / doctoral-level Q&A | Max | Highest GPQA, HLE scores |
| High-volume content generation | Flash | 8× cheaper than Plus on input |

FAQ

Is Qwen-Plus the same as Qwen3.5-Plus?

Not exactly. qwen-plus is the stable alias that Alibaba updates periodically (currently Qwen3-based). qwen3.5-plus is a specific named model based on the Qwen 3.5 architecture. They coexist — you can choose either via the API model ID.

Can I run Qwen-Plus locally?

No — Qwen-Plus is API-only. However, the underlying models are open-source. The Qwen3.5-Plus API runs the same Qwen 3.5 model (397B MoE) that's available on HuggingFace under Apache 2.0. You can self-host it if you have the hardware (~256GB+ RAM). See our Run Locally guide.

How does Qwen-Plus compare to GPT-4o?

Qwen-Plus offers comparable quality for most enterprise tasks at 5–10× lower cost. The Qwen3.5-Plus variant adds multimodal input and scores higher than GPT-4o on instruction-following and document-understanding benchmarks. For tasks that require frontier reasoning (math, science), the more appropriate comparison is Qwen-Max against GPT-5.

What happened to Qwen-Turbo?

Deprecated. Alibaba recommends migrating to Qwen-Flash, which serves the same low-cost, high-speed niche at $0.05/M input tokens.