Qwen API Pricing

Qwen's API pricing sits 3 to 60 times below what OpenAI, Anthropic, and Google charge for comparable models. The cheapest text model — qwen3.5-flash — costs $0.10 per million input tokens with a full 1M context window. The flagship qwen3.5-plus runs at $0.40/$2.40, roughly 12x cheaper than Claude Opus 4.6 and 6x cheaper than GPT-4o. And unlike Claude or Gemini, Qwen doesn't slap a steep surcharge on long-context requests: on most models you pay the same rate whether you send 10K or 900K tokens, with only a modest tiered bump on a few (detailed below).

This page covers every pricing tier across all Qwen API models, head-to-head cost comparisons against major providers, the Coding Plan subscription, and cheaper alternatives through OpenRouter. All prices are in USD, sourced from Alibaba Cloud's International (Singapore) region as of March 2026.

Text Generation Models

Alibaba splits text models into three tiers: flagship, mid-range, and budget. Pricing on some models is tiered by input length — you pay more per token as context grows. The table below shows base-tier prices (shortest context bracket). Tiered details are noted where applicable.

| Model | Input / 1M tokens | Output / 1M tokens | Context | Max Output |
|---|---|---|---|---|
| qwen3.5-plus | $0.40 | $2.40 | 1M | 65K |
| qwen3-max | $1.20 | $6.00 | 252K | 16K |
| qwen3-max-thinking | $0.78 | $3.90 | 262K | -- |
| qwen-plus | $0.40 | $1.20 | 1M | 16K |
| qwen3.5-flash | $0.10 | $0.40 | 1M | 65K |
| qwen-flash | $0.05 | $0.40 | 1M | 8K |
| qwen3.6-plus-preview | $0.00 (free preview) | $0.00 | -- | -- |

Tiered pricing note: qwen3.5-plus jumps to $0.50/$3.00 above 256K tokens. qwen3-max scales from $1.20 (0-32K) to $2.40 (32-128K) to $3.00 (128-252K) on input, with output scaling proportionally. qwen-plus doubles above 128K. Even at the highest tier, these prices stay well below what competitors charge at their base rates.
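The tiered note above can be turned into a small cost calculator. This is a minimal sketch, assuming the entire request is billed at the rate of the bracket its input length falls into (confirm the exact metering rule in Alibaba Cloud's documentation); the tier figures come from the note above.

```python
# Sketch: estimating qwen3-max input cost under tiered pricing.
# Assumption: the whole request bills at its bracket's rate.

# (input-token ceiling, $ per 1M input tokens) from the tiered note
QWEN3_MAX_INPUT_TIERS = [
    (32_000, 1.20),
    (128_000, 2.40),
    (252_000, 3.00),
]

def input_cost(tokens: int) -> float:
    """Dollar cost of `tokens` input tokens at the matching tier rate."""
    for ceiling, rate in QWEN3_MAX_INPUT_TIERS:
        if tokens <= ceiling:
            return tokens / 1_000_000 * rate
    raise ValueError("exceeds qwen3-max's 252K context window")

print(f"{input_cost(20_000):.4f}")   # 20K tokens, base bracket
print(f"{input_cost(200_000):.4f}")  # 200K tokens, top bracket
```

Even the top-bracket figure here stays below most competitors' base input rates.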

For reasoning workloads specifically, qwq-plus costs $0.80/$2.40 with 131K context — significantly cheaper than OpenAI's o3 ($2.00/$8.00) or o4-mini ($1.10/$4.40).

Coding Models

Qwen's dedicated coding models run the gamut from a premium 480B-parameter behemoth to a dirt-cheap option that undercuts most budget models. If you're piping code through an IDE integration or CI pipeline, these matter more than the general text models.

| Model | Input / 1M tokens | Output / 1M tokens | Context |
|---|---|---|---|
| qwen3-coder-plus (480B) | $0.65 | $3.25 | 1M |
| qwen3-coder-flash (30B) | $0.30 | $1.50 | 262K |
| qwen3-coder-next | $0.07 | $0.30 | 262K |

qwen3-coder-next at $0.07/$0.30 is worth highlighting. That's cheaper than GPT-4o-mini for a model purpose-built for code. For high-volume automated coding tasks — linting, refactoring, test generation — it's hard to beat on cost.
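To make "hard to beat on cost" concrete, here's a rough monthly estimate for an automated pipeline. The workload figures (tasks per day, tokens per task) are invented assumptions; only the $0.07/$0.30 rates come from the table above.

```python
# Back-of-the-envelope sketch: monthly cost of automated code tasks
# on qwen3-coder-next at $0.07 / $0.30 per 1M tokens.

IN_RATE, OUT_RATE = 0.07, 0.30  # $ per 1M tokens

def monthly_cost(tasks_per_day, in_tokens, out_tokens, days=30):
    """Total dollars for a month of fixed-size automated requests."""
    per_task = in_tokens / 1e6 * IN_RATE + out_tokens / 1e6 * OUT_RATE
    return tasks_per_day * days * per_task

# e.g. 2,000 lint/refactor calls a day, ~3K tokens in, ~500 out
print(round(monthly_cost(2_000, 3_000, 500), 2))
```

At these assumed volumes the whole month lands in the tens of dollars, which is the point of the budget tier.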

Multimodal, Image, Embedding, and Other Models

No other single provider covers this many modalities from one API. Text, code, vision, audio, speech synthesis, translation, image generation, video generation, embeddings — all under the same Alibaba Cloud account. Here's what the specialized models cost.

Multimodal (Omni)

| Model | Text In / 1M | Audio In / 1M | Text Out / 1M |
|---|---|---|---|
| qwen3-omni-flash | $0.43 | $3.81 | $1.66 |
| qwen-omni-turbo | $0.07 | $4.44 | $0.27 |
| qwen3.5-omni-plus | Pricing TBD (available via DashScope API) | | |
| qwen3.5-omni-flash | Pricing TBD (available via DashScope API) | | |
| qwen3.5-omni-light | Pricing TBD (available via DashScope API) | | |

Image Generation

| Model | Price per Image |
|---|---|
| qwen-image-2.0-pro | $0.075 |
| qwen-image-2.0 | $0.035 |
| wan2.6-t2i | $0.03 |

Embeddings

| Model | Price / 1M tokens | Notes |
|---|---|---|
| text-embedding-v4 | $0.07 | Latest, flexible dimensions |
| text-embedding-v3 | $0.07 | 1024d, 8K token limit, 50+ languages |

Translation

| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| qwen-mt-plus | $2.46 | $7.37 |
| qwen-mt-flash | $0.16 | $0.49 |
| qwen-mt-lite | $0.12 | $0.36 |

Speech models (qwen3-asr-flash, qwen3-tts-flash) and video generation (wan2.6-t2v) are also available. TTS runs roughly $0.013 per 1K characters with a free 1M character quota for new users.

Qwen vs OpenAI vs Anthropic vs Google vs DeepSeek

This is the table that matters. Every major API provider, side by side, ordered roughly by price tier. The price gap isn't subtle: it's an order of magnitude for most tiers.

| Provider | Model | Input / 1M | Output / 1M | Context |
|---|---|---|---|---|
| Qwen | qwen3.5-flash | $0.10 | $0.40 | 1M |
| DeepSeek | V3 | $0.14 | $0.28 | 128K |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128K |
| Google | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | -- |
| Qwen | qwen3.5-plus | $0.40 | $2.40 | 1M |
| DeepSeek | R1 | $0.55 | $2.19 | 128K |
| Google | Gemini 2.5 Flash | $0.30 | $2.50 | -- |
| Qwen | qwen3-max | $1.20 | $6.00 | 252K |
| OpenAI | GPT-4.1 | $2.00 | $8.00 | -- |
| OpenAI | o3 | $2.00 | $8.00 | -- |
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| Google | Gemini 2.5 Pro | $1.25 | $10.00 | 200K |
| Anthropic | Claude Sonnet 4.6 | $3.00 | $15.00 | 200K |
| OpenAI | GPT-5.4 | $2.50 | $15.00 | -- |
| Anthropic | Claude Opus 4.6 | $5.00 | $25.00 | 1M |

One comparison jumps out:

DeepSeek V3 does beat Qwen on raw output price ($0.28 vs $0.40 for Flash). But Qwen3.5-flash offers a 1M context window versus DeepSeek's 128K, and Qwen's model ecosystem is far broader — see our full Qwen vs DeepSeek breakdown for the detailed comparison.

No Long-Context Surcharge

This deserves its own callout. Both Anthropic and Google roughly double their per-token rates once a request crosses their long-context threshold (200K tokens for the models in the table above), while Qwen's base rates apply regardless of how much context you send.

If your use case involves large documents, long conversations, or RAG pipelines that push past 200K tokens, the savings with Qwen compound dramatically. You're not just paying less per token — you're avoiding the penalty that other providers impose for doing exactly what a 1M-context model is designed to do.
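As a sketch of how the savings compound, compare a single 300K-input request billed at qwen3.5-flash's flat $0.10 rate against a hypothetical $3.00-base provider that doubles its rate past 200K tokens. The 2x-above-200K rule here is an illustrative assumption, not any provider's exact published schedule.

```python
# Sketch: flat-rate vs surcharged long-context billing for one request.

def flat_cost(tokens, rate):
    """Dollars for `tokens` input tokens at a single flat rate ($/1M)."""
    return tokens / 1e6 * rate

def surcharged_cost(tokens, base_rate, threshold=200_000, multiplier=2.0):
    """Assumption: tokens past the threshold bill at multiplier * base_rate."""
    below = min(tokens, threshold)
    above = max(tokens - threshold, 0)
    return below / 1e6 * base_rate + above / 1e6 * base_rate * multiplier

print(flat_cost(300_000, 0.10))        # qwen3.5-flash base input rate
print(surcharged_cost(300_000, 3.00))  # $3.00 base with a 2x surcharge
```

With these assumed rates the surcharged request costs 40x the flat-rate one, and the gap widens with every token past the threshold.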

Alibaba Cloud Coding Plan

For developers who use AI coding assistants daily, the Coding Plan is a flat-rate subscription that bundles multiple models under one monthly fee. It's designed to work with Qwen Code, Claude Code, Cursor, Cline, and OpenCode.

| Plan | Price | Monthly Requests | Weekly Cap | 5-Hour Cap |
|---|---|---|---|---|
| Lite | $10/mo | 18,000 | 9,000 | 1,200 |
| Pro | $50/mo | 90,000 | 45,000 | 6,000 |

Models included: qwen3.5-plus, qwen3-max, qwen3-coder-next, qwen3-coder-plus, plus third-party models like kimi-k2.5, glm-5, and MiniMax-M2.5. That's a meaningful lineup — you get access to both Qwen's flagship and coding-specialized models alongside non-Alibaba alternatives.

The math works out well for the Pro plan. At $50/month for 90K requests, you're paying roughly $0.00056 per request. If each request averages 1K output tokens, that's $0.56 per million output tokens, well under a quarter of qwen3.5-plus's $2.40 pay-as-you-go output rate. For heavy Claude Code or Cursor users who burn through thousands of requests daily, this is significantly cheaper than Anthropic's $20/month Claude Pro (which caps you much more aggressively) or OpenAI's $200/month ChatGPT Pro.
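A quick way to sanity-check the subscription against pay-as-you-go is to compute the monthly request count at which the flat fee wins. The per-request token averages below are assumptions; swap in your own usage profile.

```python
# Sketch: break-even request count for the $50/mo Pro plan versus
# pay-as-you-go qwen3.5-plus ($0.40 in / $2.40 out per 1M tokens).

PLAN_PRICE = 50.0
IN_RATE, OUT_RATE = 0.40, 2.40  # $ per 1M tokens

def breakeven_requests(in_tokens=2_000, out_tokens=1_000):
    """Monthly requests at which pay-as-you-go spend reaches the plan fee."""
    per_req = in_tokens / 1e6 * IN_RATE + out_tokens / 1e6 * OUT_RATE
    return PLAN_PRICE / per_req

print(round(breakeven_requests()))
```

Under these assumptions the plan wins once you clear roughly 16K requests a month, comfortably inside the 90K cap.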

Important: The Lite plan stopped accepting new subscribers on March 20, 2026. The Pro plan is still available, with a 50% discount for the first month ($25 instead of $50).

OpenRouter: About 35% Cheaper Than Direct

OpenRouter acts as a unified API gateway that routes to multiple providers. For Qwen models specifically, their prices undercut Alibaba Cloud's direct pricing by a consistent margin of about 35%.

| Model | Direct (Alibaba) | OpenRouter | Savings |
|---|---|---|---|
| qwen3.5-plus (input) | $0.40 | $0.26 | 35% |
| qwen3.5-plus (output) | $2.40 | $1.56 | 35% |
| qwen3.5-flash (input) | $0.10 | $0.065 | 35% |
| qwen3.5-flash (output) | $0.40 | $0.26 | 35% |
| qwen3-coder-next (input) | $0.07 | $0.12 | -- |

One exception: qwen3-coder-next is actually cheaper direct from Alibaba than through OpenRouter. Check both before committing to a provider.
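Since the cheaper channel flips depending on the model, a tiny lookup like this captures the decision. The price dict mirrors the input rates in the table above and is purely illustrative.

```python
# Sketch: pick the cheaper channel (direct vs OpenRouter) per model.

PRICES = {
    # model: (direct Alibaba $/1M input, OpenRouter $/1M input)
    "qwen3.5-plus": (0.40, 0.26),
    "qwen3.5-flash": (0.10, 0.065),
    "qwen3-coder-next": (0.07, 0.12),
}

def cheapest(model):
    """Return (channel, rate) for the lower-priced provider."""
    direct, router = PRICES[model]
    return ("direct", direct) if direct <= router else ("openrouter", router)

for m in PRICES:
    print(m, *cheapest(m))
```

Re-run the comparison whenever either side reprices; these margins shift.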

OpenRouter also gives you access to open-weight Qwen models hosted by third-party inference providers — Qwen3.5-397B-A17B at $0.39/$2.34, for example, or smaller models like the 9B and 27B variants at even lower prices. If you want the absolute cheapest path to Qwen without running hardware yourself, OpenRouter is the move for most models.

Free Tiers and Zero-Cost Options

Before you spend anything, Qwen offers several genuinely free paths:

API free quotas: Most models on the International (Singapore) deployment include 1 million tokens free for 90 days after activation. This applies to qwen3.5-plus, qwen3.5-flash, qwen3-max, qwen-plus, qwen-flash, qwq-plus, and the omni models. That's enough to build a prototype or seriously evaluate the models before paying a cent.

Qwen Chat: chat.qwen.ai is completely free with no token limits for conversational use. It runs Qwen's flagship models in a ChatGPT-style interface. For personal use and quick tasks, this costs nothing.

Run locally for free: Every open-weight Qwen model — from the 0.6B up to the 397B MoE — can be downloaded and run on your own hardware at zero ongoing cost. The 9B fits comfortably on a single GPU with 8GB+ VRAM. The 35B-A3B MoE activates only 3B parameters per forward pass, so it runs on surprisingly modest hardware. Use our Can I Run Qwen tool to check what your specific setup can handle.

qwen3.6-plus-preview on OpenRouter: Alibaba's next-gen Qwen Plus preview is currently completely free on OpenRouter during its evaluation period. There's no token cost — Alibaba collects usage data in exchange. It won't last forever, but right now it's a zero-cost way to test a frontier-class model with no strings attached.

Batch API requests also get a 50% discount off standard pricing across all models, which is worth knowing if you're processing large datasets offline.
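As a sketch, here is the 50% batch discount applied to a large offline run on qwen3.5-flash. The dataset sizes are made-up assumptions; the $0.10/$0.40 rates and the 50% discount come from this page.

```python
# Sketch: cost of an offline batch run on qwen3.5-flash with the
# 50% Batch API discount applied to standard pay-as-you-go rates.

def batch_cost(in_tokens, out_tokens, in_rate=0.10, out_rate=0.40,
               discount=0.5):
    """Dollars for a batch job: standard token cost times (1 - discount)."""
    standard = in_tokens / 1e6 * in_rate + out_tokens / 1e6 * out_rate
    return standard * (1 - discount)

# e.g. 500M input tokens and 100M output tokens through the Batch API
print(batch_cost(500e6, 100e6))
```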

Frequently Asked Questions

What's the absolute cheapest way to use Qwen?

Run it locally. Every open-weight model is free forever: no API costs, no token limits, no rate caps. If local hardware isn't an option, qwen3.5-flash through OpenRouter at $0.065/$0.26 per million tokens is the cheapest API path. At those rates, generating a full-length article costs a fraction of a cent.

Is the Coding Plan worth it?

At $50/month for 90K requests across 7+ models including qwen3.5-plus and qwen3-coder-plus, it's strong value for heavy users. Compare: Anthropic's Claude Pro is $20/month but with tighter usage caps and only one provider's models. The Coding Plan gives you access to multiple frontier models from different labs. If your usage runs anywhere near the plan's ceiling of roughly 3,000 requests per day, the Pro plan pays for itself quickly versus pay-per-token pricing.

How does Qwen pricing compare to DeepSeek?

They're in a similar price bracket. DeepSeek V3 is slightly cheaper on raw output ($0.28 vs $0.40), but Qwen offers 1M context versus 128K, a much wider model selection (coding, vision, audio, translation, embeddings), and generally more reliable uptime. Our full comparison covers performance differences beyond just cost.

Are there hidden costs or fees?

No. There's no long-context surcharge (beyond a modest tiered increase on a few models), no setup fees, no minimum spend, and no per-request overhead. You pay per token consumed — that's it. The free tier doesn't require a credit card either.

Which Qwen model gives the best cost-to-performance ratio?

For general text: qwen3.5-flash. It's $0.10/$0.40 with 1M context and 65K max output — the best value in the entire lineup. For coding: qwen3-coder-next at $0.07/$0.30 is absurdly cheap for a dedicated code model. For maximum capability on a budget: qwen3.5-plus at $0.40/$2.40 competes with models costing 5-10x more.

Prices last verified March 2026 from Alibaba Cloud International (Singapore region), OpenRouter, and each competitor's official pricing page. Pricing can change — always confirm current rates on the provider's site before making purchasing decisions.