Qwen 3.5: Complete Guide to Alibaba's Flagship Open-Source Model

Qwen 3.5 (also written Qwen3.5) is Alibaba Cloud's latest flagship open-source model, released on February 16, 2026. It's a 397-billion-parameter Mixture-of-Experts vision-language model with only 17B active parameters per token — meaning you get frontier-level performance at a fraction of the compute cost. It ships under the Apache 2.0 license, supports 201 languages, and handles text, images, and video natively in a single unified model. No separate VL variant needed.


Technical Specifications

Qwen 3.5 is a sparse Mixture-of-Experts (MoE) model that activates only a small fraction of its total parameters for each token. This design gives it the quality of a much larger dense model while keeping inference costs low. Here's the complete spec sheet:

| Specification | Value |
| --- | --- |
| Total parameters | 397 billion |
| Active parameters per token | 17 billion |
| Architecture | Hybrid Gated DeltaNet + Gated Attention + MoE |
| Number of layers | 60 |
| MoE configuration | 512 total experts, 10 routed + 1 shared active |
| Context window (native) | 262,144 tokens |
| Context window (extended via YaRN) | 1,010,000 tokens (~1M) |
| Vocabulary size | 250,000 tokens (69% larger than Qwen 3) |
| Languages supported | 201 languages and dialects |
| Modalities (input) | Text + Image + Video |
| Modalities (output) | Text only |
| Thinking mode | Built-in (toggle on/off via API) |
| License | Apache 2.0 |
| HuggingFace | Qwen/Qwen3.5-397B-A17B |
| Release date | February 16, 2026 |

For context, the previous Qwen 3 flagship (Qwen3-235B-A22B) used 22B active parameters from 235B total. Qwen 3.5 grows the total parameter count by roughly 70% (to 397B) while reducing active parameters to 17B — a significant efficiency improvement made possible by the new hybrid architecture.

MoE Model Comparison

How does Qwen 3.5 compare to other large MoE models on foundational benchmarks? Here's the full picture:

[Figure: benchmark comparison of Qwen3.5-397B-A17B vs Qwen3-235B-A22B, GLM-4.5-355B, DeepSeek-V3.2-671B, and K2-IT across General Knowledge, Reasoning, STEM, and Coding.]

Qwen 3.5 leads across most categories despite activating only 17B parameters — fewer than any competitor in this comparison.

Architecture Deep Dive

Qwen 3.5 introduces a hybrid architecture that combines two different attention mechanisms. This is one of the most technically interesting aspects of the model and directly explains its efficiency gains.

Gated DeltaNet (Linear Attention)

The majority of layers use Gated DeltaNet, a linear attention mechanism originally explored in the Qwen 3 Next experimental family. Unlike standard quadratic attention (where compute scales with the square of context length), linear attention scales linearly. This means Qwen 3.5 can process long contexts — up to 1M tokens — without the memory explosion that plagues traditional transformers.

In practical terms: longer prompts consume significantly less GPU memory than they would with a standard attention model of equivalent quality.
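As a rough illustration of why this matters, here is a back-of-the-envelope sketch of how attention cost grows with context length under quadratic versus linear scaling. It is illustrative arithmetic only, not a measurement of Qwen 3.5 itself:

# Illustrative scaling comparison; real memory use depends on layer mix,
# number of heads, precision, and implementation details.
def quadratic_attention_cost(ctx_len: int) -> int:
    # Standard attention computes an L x L score matrix per head per layer.
    return ctx_len * ctx_len

def linear_attention_cost(ctx_len: int) -> int:
    # Linear-attention variants like Gated DeltaNet carry a fixed-size
    # recurrent state forward, so cost grows only linearly with context length.
    return ctx_len

for ctx in (32_768, 262_144, 1_010_000):
    ratio = quadratic_attention_cost(ctx) / linear_attention_cost(ctx)
    print(f"{ctx:>9} tokens -> quadratic is ~{ratio:,.0f}x the linear cost")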

Gated Attention (Standard)

A subset of layers still uses traditional gated attention for tasks that benefit from full quadratic attention. The model learns when to use each mechanism, creating a best-of-both-worlds approach: efficient processing for most tokens, with full-power attention where it matters most.

Sparse Mixture of Experts

On top of the hybrid attention, Qwen 3.5 uses an MoE architecture with 512 total experts. For each token, only 11 experts are activated (10 routed + 1 shared), keeping inference fast. The result: the model has access to 397B parameters of knowledge while only running 17B parameters' worth of compute per token.
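To make the routing concrete, here is a schematic top-k router using the published 512-expert, 10-routed-plus-1-shared configuration. This is a toy sketch of the general MoE routing pattern, not Qwen's actual implementation:

# Schematic top-k MoE routing; parameters mirror the published Qwen 3.5 config.
import numpy as np

NUM_EXPERTS = 512      # total routed experts
TOP_K = 10             # routed experts activated per token
HIDDEN = 64            # toy hidden size for the sketch

rng = np.random.default_rng(0)
router_w = rng.normal(size=(HIDDEN, NUM_EXPERTS))   # router projection

def route(token_hidden: np.ndarray):
    """Return the indices and normalized weights of the routed experts for one token."""
    logits = token_hidden @ router_w                     # (NUM_EXPERTS,)
    top_idx = np.argpartition(logits, -TOP_K)[-TOP_K:]   # pick the 10 highest-scoring experts
    weights = np.exp(logits[top_idx] - logits[top_idx].max())
    weights /= weights.sum()                             # softmax over the selected experts
    return top_idx, weights

token = rng.normal(size=HIDDEN)
experts, weights = route(token)
# One shared expert is always added on top of the 10 routed ones -> 11 active experts.
print(f"routed experts: {sorted(experts.tolist())}")
print(f"total active experts: {len(experts) + 1}")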

Unified Vision-Language Model

Previous Qwen vision models required a separate VL (Vision-Language) variant. Qwen 3.5 merges everything into a single checkpoint. Vision capabilities are built in through early fusion — not bolted on as an adapter. This means the same model that writes code, answers questions, and reasons through math can also analyze images and watch videos. For image generation rather than understanding, see Qwen-Image-2.0 — Alibaba's dedicated 7B text-to-image and editing model.

Inference Speed: Decode Throughput

The hybrid architecture pays off massively in inference speed. Thanks to Gated DeltaNet's linear scaling, Qwen 3.5 is dramatically faster than previous Qwen models — especially at long contexts:


Qwen 3.5 decode throughput: 8.6× faster at 32K and 19× faster at 256K context vs Qwen3-Max.

Benchmarks vs GPT-5.2, Claude Opus 4.5 & Gemini 3 Pro

Qwen 3.5 was evaluated against the current frontier models across a wide range of benchmarks. The results show it's competitive across the board, with particular strengths in instruction following, multimodal document understanding, and agentic tasks.

| Benchmark | Category | Qwen3.5 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
| --- | --- | --- | --- | --- | --- |
| MMLU-Pro | Knowledge | 87.8 | 87.4 | 89.5 | 89.8 |
| GPQA | Doctoral Science | 88.4 | 92.4 | 87.0 | 91.9 |
| IFBench | Instruction Following | 76.5 🥇 | 75.4 | 58.0 | 70.4 |
| MultiChallenge | Complex Reasoning | 67.6 🥇 | 57.9 | 54.2 | 64.2 |
| SWE-bench Verified | Coding | 76.4 | 80.0 | 80.9 | 76.2 |
| LiveCodeBench v6 | Live Coding | 83.6 | 87.7 | 84.8 | 90.7 |
| HLE | Humanity's Last Exam | 28.7 | 35.5 | 30.8 | 37.5 |
| MMMU | Multimodal Understanding | 85.0 | 86.7 | 80.7 | 87.2 |
| MathVision | Multimodal Math | 88.6 🥇 | 83.0 | 74.3 | 86.6 |
| OmniDocBench 1.5 | Document Understanding | 90.8 🥇 | 85.7 | 87.7 | 88.5 |
| BrowseComp | Browser Automation | 69.0 🥇 | 65.8 | 67.8 | 59.2 |
| NOVA-63 | Agentic Tasks | 59.1 🥇 | 54.6 | 56.7 | 56.7 |
| AndroidWorld | Mobile Automation | 66.8 🥇 | - | - | - |

Visual comparison across 12 major benchmarks — Qwen 3.5 (purple) leads on instruction following, agentic tasks, and document understanding.

Full Benchmark Details

For a more granular view, here are the complete benchmark tables covering text-only and multimodal evaluations:

[Figure: detailed text benchmark table covering Knowledge, Instruction Following, Long Context, STEM, Reasoning, Agentic, Search Agent, Multilingual, and Coding Agent scores.]

Text benchmark results — note the dominant performance on Agentic and Search Agent categories.

[Figure: detailed multimodal benchmark table covering STEM and Math, General QA, Text Recognition, Natural Language Understanding, Video Understanding, Visual Agent, and Medical scores.]

Multimodal benchmark results — leading scores in Visual Agent and Document Understanding categories.

Key Takeaways

Worth noting: Qwen 3.5 outperforms the previous closed-source Qwen3-Max-Thinking on most benchmarks — meaning an open-weight, Apache 2.0 model now exceeds what was Alibaba's best proprietary offering just months ago.

Multimodal Capabilities

Qwen 3.5 is a unified vision-language model. It processes text, images, and video through a single architecture — no need for a separate VL model. Here's what it can handle:

Image Understanding

Video Understanding

Code from Vision

One of the most practical multimodal use cases: Qwen 3.5 can take a hand-drawn wireframe and generate functional HTML/CSS/JS. Community testers built complete portfolio websites, browser-based OS interfaces (with working calculators, games, and settings panels), and even 3D simulations — all from text or image prompts.

Agentic AI Features

Alibaba positions Qwen 3.5 as a model for the "agentic AI era." This isn't just marketing — the benchmarks back it up: Qwen 3.5 leads every agent-focused evaluation in the comparison above (BrowseComp, NOVA-63, and AndroidWorld).

In practice, this means Qwen 3.5 can independently interact with mobile and desktop applications, take actions across apps, fill out forms, navigate websites, and complete multi-step workflows. Combined with its vision capabilities, it can literally see what's on screen and act on it.

[Figure: average ranking vs. environment scaling, showing Qwen3.5-397B-A17B reaching the top ranking as training environments increase, ahead of Claude Opus 4.5, GPT-5.2, and Gemini 3 Pro.]

Agentic scaling: Qwen 3.5 (Thinking) achieves the best average ranking as the number of training environments increases.

Tool Use & Function Calling

Qwen 3.5 supports native function calling, structured output (JSON mode), and tool use. The API is OpenAI-compatible, making it a drop-in replacement in many existing workflows. It also powers Qwen Code, Alibaba's developer CLI for delegating coding tasks via natural language.
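Here is a minimal function-calling sketch against the OpenAI-compatible endpoint covered in the API section below. The get_weather tool and its schema are made up for illustration; the model name and base URL are the ones used later in this guide.

# Function-calling sketch; the tool definition is hypothetical.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                      # hypothetical tool name
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="qwen3.5-plus-2026-02-15",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)

# If the model decides to call the tool, the call arrives as structured JSON arguments.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)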

API Pricing

Qwen 3.5 is available through Alibaba Cloud's Model Studio as Qwen3.5-Plus. The pricing is aggressive — significantly cheaper than competing frontier models:

| Context Range | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| 0–256K tokens | $0.40 | $2.40 |
| 256K–1M tokens | $1.20 | $7.20 |

For reference, this pricing is roughly 70% cheaper than GPT-5 series API calls for comparable tasks, and about 60% cheaper than previous Qwen models. In the Chinese market, pricing starts as low as ¥0.8 per million tokens — approximately 1/18th the cost of Gemini 3 Pro.
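To put the table in per-request terms, here is a quick cost calculation for the 0–256K tier, using the listed prices (actual billing may differ):

# Per-request cost estimate at the 0-256K pricing tier listed above.
INPUT_PER_M = 0.40    # USD per 1M input tokens
OUTPUT_PER_M = 2.40   # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Example: summarizing a 50K-token document into a 2K-token answer.
print(f"${request_cost(50_000, 2_000):.4f}")   # ~$0.0248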

Qwen 3.5 is also available for free testing on chat.qwen.ai with rate limits.

Running Locally

With 397B total parameters, Qwen 3.5 is a large model. The MoE architecture keeps per-token compute low, but memory requirements are set by the total parameter count (every expert must be loaded into RAM), not by the 17B active parameters. Here's what you need:

Hardware Requirements

| Quantization | Approx. Size | Minimum RAM/VRAM | Recommended Hardware |
| --- | --- | --- | --- |
| BF16 (full precision) | ~780 GB | 800+ GB | Multi-GPU server (4–8× A100 80GB) |
| Q8_0 | ~400 GB | 420+ GB | Mac Studio 512GB / Multi-GPU |
| Q4_K_XL | ~220 GB | 240+ GB | Mac Studio 256GB / 2× RTX 5090 |
| Q2_K_XL | ~140 GB | 160+ GB | Mac Studio 192GB (slow) |

Important: the 128GB unified memory Macs (M4 Max, etc.) are not enough to run Qwen 3.5, even at aggressive quantization levels. You need at least 256GB of unified memory, which means a Mac Studio or Mac Pro. For more details on hardware options, see our hardware requirements guide.
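The sizes in the table follow directly from the total parameter count. Here's a rough rule-of-thumb calculation; the bits-per-weight figures are ballpark assumptions, and real GGUF files vary because different tensors are quantized at different precisions:

# Approximate size of the full 397B-parameter MoE at common quantization levels.
TOTAL_PARAMS = 397e9

def approx_size_gb(bits_per_weight: float) -> float:
    return TOTAL_PARAMS * bits_per_weight / 8 / 1e9

for name, bits in [("BF16", 16.0), ("Q8_0", 8.5), ("Q4_K", 4.8), ("Q2_K", 2.8)]:
    print(f"{name}: ~{approx_size_gb(bits):.0f} GB")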

Supported Frameworks

GGUF quantizations are available from Unsloth on HuggingFace in formats ranging from Q2_K to Q8_0. For a complete local deployment walkthrough, check our Run Qwen Locally guide.
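As a sketch of what local inference can look like with those GGUF files, here is a minimal llama-cpp-python example. The file name is hypothetical; download the quantization you want from the Unsloth repository first and point model_path at it.

# Minimal local-inference sketch with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.5-397B-A17B-Q4_K_XL.gguf",  # hypothetical local file name
    n_ctx=32768,        # context window to allocate; raise it if you have the RAM
    n_gpu_layers=-1,    # offload every layer that fits onto the GPU(s)
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the Qwen 3.5 architecture in one sentence."}]
)
print(out["choices"][0]["message"]["content"])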

API & Developer Guide

The Qwen 3.5 API is OpenAI-compatible, which means you can use existing OpenAI SDK clients by simply changing the base URL and API key. Here's a quick start:

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
)

response = client.chat.completions.create(
    model="qwen3.5-plus-2026-02-15",
    messages=[
        {"role": "user", "content": "Explain quantum entanglement simply."}
    ],
    extra_body={"enable_thinking": True}  # Toggle thinking mode
)

print(response.choices[0].message.content)

Multimodal (Image Input)

# Reuses the client configured in the previous example.
response = client.chat.completions.create(
    model="qwen3.5-plus-2026-02-15",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text", "text": "Explain this chart in simple words."}
        ]
    }]
)

print(response.choices[0].message.content)

Recommended Sampling Parameters

| Mode | Temperature | Top-P | Top-K | Presence Penalty |
| --- | --- | --- | --- | --- |
| Thinking mode | 0.6 | 0.95 | 20 | 0.0 |
| Standard mode | 0.7 | 0.8 | 20 | 1.5 |

Recommended max output tokens: 32,768 for general queries; up to 81,920 for complex math/coding tasks. The API also supports structured output (JSON mode) and function calling for tool-use workflows.
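The sketch below combines the recommended standard-mode sampling settings with JSON-mode output. Note that top_k is not a standard OpenAI SDK argument, so it is passed via extra_body; whether the compatible-mode endpoint accepts it under that key is an assumption.

# Reuses the client from the Python quick-start above.
response = client.chat.completions.create(
    model="qwen3.5-plus-2026-02-15",
    messages=[{"role": "user",
               "content": "Return a JSON object with fields 'city' and 'population' for Tokyo."}],
    temperature=0.7,             # standard (non-thinking) mode settings
    top_p=0.8,
    presence_penalty=1.5,
    response_format={"type": "json_object"},              # structured JSON output
    extra_body={"top_k": 20, "enable_thinking": False},   # assumption: endpoint accepts these keys
)
print(response.choices[0].message.content)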

Community Testing Highlights

Within hours of launch, the community began stress-testing Qwen 3.5 across creative, technical, and practical tasks. Here are the standout results:

Coding & Creative Generation

Vision & Medical Reasoning

Creative Writing

Testers consistently noted that Qwen 3.5's language quality has improved significantly — responses are more targeted, less embellished, and show better structural organization compared to previous versions.

Frequently Asked Questions

Is Qwen 3.5 Plus a different model from Qwen 3.5?

No. Qwen3.5-Plus is simply the hosted API version of the open-weight Qwen3.5-397B-A17B model, available through Alibaba Cloud's Model Studio. It comes with production features like a default 1M context window and built-in tool integration. The underlying model weights are the same.

Can I run Qwen 3.5 on my Mac?

Only if you have a Mac Studio or Mac Pro with at least 256GB unified memory. The 128GB M4 Max systems are too small, even with aggressive quantization. The Q4_K_XL quantization is a ~220GB download and needs roughly 240GB of memory to run. See our hardware requirements page for details.

How does Qwen 3.5 compare to Qwen 3?

Qwen 3.5 brings three major upgrades: (1) a hybrid attention architecture that dramatically reduces memory usage at long contexts, (2) unified multimodal capabilities — no separate VL model needed, and (3) a 69% larger vocabulary (250K vs 148K tokens) enabling better multilingual performance across 201 languages. It also outperforms the closed-source Qwen3-Max-Thinking on most benchmarks. See our full Qwen 3 guide for the previous generation's specs.

Is fine-tuning available?

Fine-tuning is not yet available through Alibaba Cloud's Model Studio as of launch. However, since the model is Apache 2.0 and weights are on HuggingFace, community fine-tuning with tools like LoRA/QLoRA is possible for those with sufficient hardware.

What's the thinking mode?

Like QwQ and other reasoning models, Qwen 3.5 can output its internal reasoning process in <think>...</think> tags before giving a final answer. This is enabled by default and can be toggled off via the API parameter enable_thinking: false. Both modes cost the same — no pricing premium for thinking.
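If you consume the raw text with thinking enabled, a small helper like the one below (a generic sketch, not an official utility) separates the reasoning trace from the final answer:

import re

def split_thinking(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a response containing <think>...</think>."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, count=1, flags=re.DOTALL).strip()
    return reasoning, answer

reasoning, answer = split_thinking("<think>Check the units first.</think>It is about 42 km.")
print(answer)   # "It is about 42 km."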

What languages does it support?

Qwen 3.5 supports 201 languages and dialects, up from 119 in Qwen 3. The expanded 250K-token vocabulary enables 10–60% cost reduction for multilingual applications through more efficient tokenization.

Bottom Line

Qwen 3.5 is a genuinely competitive frontier model that's free to use, free to deploy, and free to modify. It leads on agentic benchmarks, matches or exceeds GPT-5.2 on instruction following and document understanding, and brings unified vision-language capabilities to the open-source world for the first time at this quality level. The aggressive API pricing makes it viable for production workloads, and the Apache 2.0 license means you can self-host without restrictions.

If you're building AI agents, processing documents at scale, or need a capable multimodal model without vendor lock-in — Qwen 3.5 should be at the top of your evaluation list.