Qwen3.6-Max-Preview and Kimi K2.6 landed nearly back-to-back — lining up both flagship coding models
Two Chinese 1T-class MoE flagships shipped back-to-back inside a single 24-hour window spanning April 20–21, 2026: Alibaba’s Qwen3.6-Max-Preview and Moonshot AI’s Kimi K2.6.
Both are aimed squarely at coding agents, and the benchmarks overlap enough to make a side-by-side comparison straightforward. So I’m putting them in one post.
Release timing
Both models were announced on X (formerly Twitter) at roughly the same time, and the announcements coincided with the actual public availability.
This is not the “weights quietly landed on Hugging Face first and got discovered later” pattern — the announcement itself was the official release.
| Model | Announced on X (JST) | Availability |
|---|---|---|
| Qwen3.6-Max-Preview | 2026-04-20 evening | Immediately usable via Alibaba Cloud DashScope and the Qwen Studio web UI |
| Kimi K2.6 | 2026-04-21 early morning | Weights on Hugging Face, API on kimi.com and platform.moonshot.ai at the same time |
On the Qwen side, Qwen3.6-35B-A3B had already been released as open weights on April 17, and Max-Preview was slotted in afterwards as an API-only closed tier on top of that.
On the Kimi side, this is the first major update in about three months since Kimi K2.5 in late January. The architecture inherits the K2.5 family, and the new stage is mainly scaled RL and a bigger Agent Swarm.
The result is a clean contrast in release strategy: “Qwen ships the smaller open tier first, flagship API-only” versus “Kimi keeps shipping the flagship itself as open weights.”
Where each model sits in its series
```mermaid
flowchart LR
    Q[Qwen3.6 series] --> Q1[35B-A3B<br/>Open weights<br/>Runs locally]
    Q --> Q2[Plus<br/>Cloud-only<br/>1M context]
    Q --> Q3[Max-Preview<br/>Early flagship<br/>API only]
    K[Kimi series] --> K1[K2.5<br/>1T MoE<br/>Native multimodal]
    K1 --> K2[K2.6<br/>Coding / agent<br/>upgrade]
```
Qwen runs a two-pronged stack: Max-Preview closed, 35B-A3B open. Plus sits separately as a 1M-context specialist.
Kimi ships K2.6 itself as open weights, with no closed-only sibling in the same lineage. The flagship is distributed directly under Modified MIT. That’s the structural difference.
Basic specs
Here are the two lined up at the spec-sheet level. Qwen leaves a lot of internals undisclosed, which is where the information gap shows up.
| Item | Qwen3.6-Max-Preview | Kimi K2.6 |
|---|---|---|
| Total parameters | >1T (exact number not disclosed) | 1T |
| Active per token | Not disclosed | 32B |
| Architecture | Sparse MoE (internals not disclosed) | MLA + SwiGLU, 61 layers, 384 + 1 shared expert, Top-K 8 |
| Context length | 262,144 (256K) | 262,144 (256K) |
| Max output tokens | 8,192 | Designed for long multi-step runs; the practical cap sits on the deployment side |
| Modality | Text only | Text-centered (inherits the K2.5 multimodal backbone) |
| Quantization | Not disclosed | Native INT4 (QAT) |
| License | Closed-source, API only | Modified MIT (weights distributed) |
| Vocabulary size | Not disclosed | 160K |
Context length happens to match exactly at 256K on both sides.
Qwen3.6-Max-Preview is shorter than Plus (1M), looking like a trade-off tuned for inference memory budget and cost.
Kimi K2.6 assumes native INT4 QAT, so even at 1T MoE it is tilted toward actually landing the flagship on local hardware.
Benchmarks
This table covers the benchmarks reported for both models. Numbers come from each vendor’s official release notes and Artificial Analysis; the evaluation conditions (prompts, tool setup, number of runs) are not aligned, so read the table with that in mind.
| Benchmark | Qwen3.6-Max-Preview | Kimi K2.6 | Ref: Kimi K2.5 | Ref: Qwen3.6-35B-A3B |
|---|---|---|---|---|
| SWE-Bench Pro | 57.3 | 58.6 | 50.7 | 49.5 |
| Terminal-Bench 2.0 | 65.4 | 66.7 | 50.8 | 51.5 |
| GDPval-AA | 51.0 | — | — | — |
| AA Intelligence Index | 52.0 | — | — | — |
| HLE Full (with tools) | — | 54.0 | 50.2 | — |
| BrowseComp | — | 83.2 | 74.9 | — |
| Toolathlon | — | 50.0 | 27.8 | — |
| SWE-Bench Multilingual | — | 76.7 | 73.0 | — |
| MathVision (with python) | — | 93.2 | 85.0 | — |
On the two shared axes — SWE-Bench Pro and Terminal-Bench 2.0 — Kimi K2.6 sits slightly above Qwen3.6-Max-Preview.
Both are up by roughly 10 points over the K2.5 generation, so “RL tuned for agentic coding” is clearly firing on both sides at the same time.
Qwen is strongest on the AA Intelligence Index (2nd out of 201 models) for general-purpose ability, while Kimi pulls ahead on tool-driven, Web-autonomy style benchmarks like BrowseComp and Toolathlon. The contrast is general-intelligence headline number vs. agent-specialized scores.
One caveat on Qwen3.6-Max-Preview in Artificial Analysis’s measurement: it produced around 74M tokens across the evaluation tasks, versus an average of 26M for other models. That’s close to 3x the verbosity.
For coding agents that write out steps one by one, verbose output can be an advantage. But for general chat use, you need max_tokens or a system prompt tightening things down, otherwise cost balloons fast.
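The verbosity gap translates to money fairly directly. A back-of-the-envelope sketch using the article’s own numbers (~74M output tokens for Qwen3.6-Max-Preview vs. a ~26M average in the Artificial Analysis run, priced at the $24 / 1M output rate from the pricing table, which itself is a third-party aggregation):

```python
# Rough cost of the verbosity gap. The token counts and the $24/1M output
# price come from the text above; actual billing may differ by GA.
OUTPUT_PRICE_PER_M = 24.0  # USD per 1M output tokens

def output_cost(tokens_m: float, price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Cost in USD for tokens_m million output tokens."""
    return tokens_m * price_per_m

qwen_cost = output_cost(74)  # ~74M tokens across the AA evaluation
avg_cost = output_cost(26)   # ~26M token average for other models
print(f"ratio: {74 / 26:.2f}x, extra cost: ${qwen_cost - avg_cost:.0f}")
# ratio: 2.85x, extra cost: $1152
```

So the "close to 3x" figure is about $1,100 of extra output spend over the same evaluation workload, before any `max_tokens` tightening.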
Agent-side features
Both are aimed at coding agents, but the angle of attack is different.
Qwen3.6-Max-Preview side
- Explicitly calls out agentic coding ability beyond Qwen3.6-Plus
- Pitches improvements in real-world agents and knowledge reliability
- Implementation-wise, it rides straight on DashScope’s OpenAI-compatible endpoint (I wrote up a minimal setup in DashScope API integration notes)
- Wiring through Qwen-Agent and bundling MCP servers follows the same pattern as 35B-A3B
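Since Max-Preview rides the OpenAI-compatible endpoint, the wiring is just a chat-completions POST. A minimal stdlib sketch follows; the model id `qwen3.6-max-preview` and the exact DashScope base URL are assumptions (check the DashScope console for your region), and `max_tokens` is capped here as a guard against the verbosity tendency noted above:

```python
import json
import os
import urllib.request

# Assumed endpoint and model id; both may differ by region and by GA.
BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"
MODEL = "qwen3.6-max-preview"  # hypothetical id, not confirmed in release notes

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Refactor this function ..."}],
    "max_tokens": 2048,  # cap output to keep the verbosity tendency in check
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('DASHSCOPE_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment with a valid DASHSCOPE_API_KEY set
print(req.full_url)
```

The same payload works unchanged against any OpenAI-compatible server, which is the low-switching-cost point made below.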
Kimi K2.6 side
- Agent Swarm scaling pushed from 100 × 1,500 steps to 300 × 4,000 steps
- Built around 300 parallel sub-agents cooperating under a single orchestrator
- Official examples of long-running coding missions (12 hours, 4,000 tool calls straight): Mac optimization of Qwen3.5-0.8B, exchange-core optimization
- Strengthened animated frontend generation — WebGL, GSAP, Framer Motion, Three.js
- External runtimes called out by name — OpenClaw and Hermes Agent — with “proactive agents” baked in as an assumption
- Claw Groups, a research preview for multi-ownership mode where your agents and someone else’s share the same loop
Qwen’s priority is “let you call a genuinely strong flagship via API,” and the runtime side is left to the user.
Kimi bundles model + runtime + multi-agent-first UX into one push.
That’s a continuation of Moonshot AI’s work on the harness side — the AttnRes research in Kimi Linear, and Cursor picking K2.5 for Composer 2 — which has steadily built up the supporting infrastructure.
Pricing and distribution
The cost profiles look quite different from the user’s side.
| Category | Qwen3.6-Max-Preview | Kimi K2.6 |
|---|---|---|
| Distribution | API only (DashScope), Web UI available | Weights distributed + API (platform.moonshot.ai, OpenAI/Anthropic compatible) |
| Input pricing | $6 / 1M tokens (DataLearner aggregation) | See official pricing page (inherits the K2.5 tier) |
| Output pricing | $24 / 1M tokens (same source) | Same as above |
| Preview free tier | Artificial Analysis logged it at $0.00, so there may be a free window during preview | No API cost at all if you run weights locally |
| Local inference | Not possible | Possible via vLLM, SGLang, KTransformers, etc. (transformers ≥ 4.57.1 and < 5.x) |
Qwen3.6-Max-Preview’s selling point is “call a flagship-class model over an OpenAI-compatible API as a drop-in replacement.” Switching cost is low, but usage-based billing plus the verbosity tendency can run the bill up quickly.
Kimi K2.6 lets you put the weights on your own GPUs, so with existing hardware you can run the flagship at zero API cost. The trade-off is the non-trivial cost of standing up infrastructure that actually runs a 1T MoE.
Things to know before you touch either
Both models share some early-release caveats worth flagging.
- Qwen3.6-Max-Preview is text-only. For multimodal input, use Qwen3-Omni.
- The output cap is 8K, so long-form generation needs a streaming + continuation-prompt split design.
- Model name, pricing, and context length may change by GA.
- For serious Kimi K2.6 runs, 300-parallel × 4,000-step Agent Swarm is the scale at which permissioning, billing, and security design tends to break. If you pair it with external runtimes like OpenClaw, the same issues called out in the post on subscription tiers locking out third-party harnesses come back on the billing and audit side directly.
- MIT-based weight distribution is easy to handle license-wise, but OpenAI/Anthropic-compatible API contract clauses can still differ once you go to production.
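The 8K output cap mentioned above is typically handled with a continuation loop: keep requesting more output while the model stops for length, then stitch the chunks. A minimal sketch, with `generate` as a stand-in for a real chat-completions call returning `(text, finish_reason)`:

```python
# Continuation-loop sketch for a hard output cap: while the model stops with
# finish_reason == "length", feed the partial output back and ask it to
# continue. `generate(messages)` stands in for a real API call.

def generate_long(generate, prompt, max_rounds=8):
    """Stitch a long completion together from length-capped chunks."""
    parts, messages = [], [{"role": "user", "content": prompt}]
    for _ in range(max_rounds):
        text, finish_reason = generate(messages)
        parts.append(text)
        if finish_reason != "length":
            break
        # Feed the partial output back and request a seamless continuation.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Continue exactly where you left off."})
    return "".join(parts)

# Stub generator: two length-capped chunks, then a clean stop.
chunks = iter([("part1 ", "length"), ("part2 ", "length"), ("part3", "stop")])
result = generate_long(lambda msgs: next(chunks), "write a long doc")
print(result)  # part1 part2 part3
```

In production you would also stream each chunk and watch for the cap mid-chunk, but the loop above is the core of the split design.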
Qwen’s shift from “all-in on open” to a two-track stack
The Qwen3 generation had been running a lineage where agent-focused flagship-adjacent models were all distributed as Apache 2.0 open weights — Qwen3-Coder-Next, Qwen3-Omni, Qwen3.6-35B-A3B, all of them.
“Open top to bottom” was for a while the core of what the Qwen brand meant.
Qwen3.6-Max-Preview takes the top tier of that lineage and moves it into an API-only closed slot.
The open-weight side is kept by 35B-A3B, while the flagship goes cloud-only — a two-track layout closer to what Mistral and Cohere ended up with, where the head is closed and the long tail stays open.
Internal architecture (total parameters, active count, MoE layout, quantization) being wholly undisclosed for Max-Preview fits cleanly with that pivot.
Summarizing the shift in a table:
| Series | Top-tier treatment | Mid/lower-tier treatment |
|---|---|---|
| Qwen3.6 | Max-Preview is closed (DashScope API only, architecture not disclosed) | 35B-A3B is Apache 2.0 open, Plus is a 1M cloud specialist |
| Kimi K2 family | K2.6 itself is Modified MIT open-weight, architecture disclosed | No smaller-tier model at all — a single flagship doing both jobs |
Kimi’s answer to all this is to swing the other way, which is the core point of this release.
They drop the 1T MoE flagship onto Hugging Face directly, and even bake in native INT4 QAT so “run it yourself” remains on the table.
On the runtime side, Kimi CLI, Agent Swarm, and the OpenClaw / Hermes Agent integration are presented together. The pitch is a full-stack model + harness + operations package, leaning open on every layer.
Two Chinese 1T-class MoEs, flipped in opposite directions on the public-openness layer. That’s the quietly significant framing here.
If Qwen keeps migrating the “open-weight” label toward cost-focused mid-tier models, the center of gravity for agent implementation (especially local / on-prem expectations) is likely to shift toward Kimi.
Conversely, if Max-Preview’s pricing and rate limits stabilize in production, “call Qwen from the cloud, run Kimi long-running on-prem” might settle in as the split.
What might be going on inside the Qwen team
The decision to lock Max-Preview to a closed API can’t really be explained by the model alone.
Picking signal out of Chinese-language explainer posts and social media since early 2026, a few patterns show up around the Qwen team.
- Some core Qwen developers appear to be moving to different projects, independent work, or other companies, based on recurring reports
- Multiple social media posts and summaries describe Alibaba Cloud shifting strategy — from an all-directions open research posture to clearly putting its generative AI business into a revenue phase
- There are also recurring reports of internal friction between the open-first side and the revenue-first side, though no strong primary sources or official announcement confirm this
This is rumor-layer information, so treat it with a low confidence flag.
On the other hand, the decisive move of “yank the top-tier flagship into a closed API only” and “redact internal architecture and quantization information entirely” lines up well if you assume the internal dynamics and strategic pivot above.
Part of why the contrast with Kimi sticking to open weights is showing up this cleanly may be that the Chinese LLM industry as a whole is starting to get serious about where to actually make money.
You’d want primary sources to verify, but it’s worth not reading Max-Preview’s spec sheet in isolation — the Qwen brand’s publicness strategy may be in the middle of an inflection.
The SWE-Bench Pro and Terminal-Bench 2.0 ranges are close enough that, on numbers alone, you can pick either based on preference.
From here on the contest moves to the operations layer — which harness you put it on, whose permissions and billing, and how long you keep it running.