Qwen3.6-Max-Preview and Kimi K2.6 landed nearly back-to-back — lining up both flagship coding models
Two Chinese 1T-class MoE flagships shipped back-to-back inside a single 24-hour window spanning April 20–21, 2026: Alibaba’s Qwen3.6-Max-Preview and Moonshot AI’s Kimi K2.6.
Both are aimed squarely at coding agents, and the benchmarks overlap enough to make a side-by-side comparison straightforward. So I’m putting them in one post.
Release timing
Both models were announced on X (formerly Twitter) at roughly the same time, and the announcements coincided with the actual public availability.
This is not the “weights quietly landed on Hugging Face first and got discovered later” pattern — the announcement itself was the official release.
| Model | Announced on X (JST) | Availability |
|---|---|---|
| Qwen3.6-Max-Preview | 2026-04-20 evening | Immediately usable via Alibaba Cloud DashScope and the Qwen Studio web UI |
| Kimi K2.6 | 2026-04-21 early morning | Weights on Hugging Face, API on kimi.com and platform.moonshot.ai at the same time |
On the Qwen side, Qwen3.6-35B-A3B had already been released as open weights on April 17, and Max-Preview was slotted in afterwards as an API-only closed tier on top of that.
On the Kimi side, this is the first major update in about three months since Kimi K2.5 in late January. The architecture inherits the K2.5 family, and the new stage is mainly scaled RL and a bigger Agent Swarm.
The result is a clean contrast in release strategy: “Qwen ships the smaller open tier first, flagship API-only” versus “Kimi keeps shipping the flagship itself as open weights.”
Where each model sits in its series
```mermaid
flowchart LR
    Q[Qwen3.6 series] --> Q1[35B-A3B<br/>Open weights<br/>Runs locally]
    Q --> Q2[Plus<br/>Cloud-only<br/>1M context]
    Q --> Q3[Max-Preview<br/>Early flagship<br/>API only]
    K[Kimi series] --> K1[K2.5<br/>1T MoE<br/>Native multimodal]
    K1 --> K2[K2.6<br/>Coding / agent<br/>upgrade]
```
Qwen runs a two-pronged stack: Max-Preview closed, 35B-A3B open. Plus sits separately as a 1M-context specialist.
Kimi ships K2.6 itself as open weights, with no closed-only sibling in the same lineage. The flagship is distributed directly under Modified MIT. That’s the structural difference.
Basic specs
Here are the two lined up at the spec-sheet level. Qwen leaves a lot of internals undisclosed, which is where the information gap shows up.
| Item | Qwen3.6-Max-Preview | Kimi K2.6 |
|---|---|---|
| Total parameters | >1T (exact number not disclosed) | 1T |
| Active per token | Not disclosed | 32B |
| Architecture | Sparse MoE (internals not disclosed) | MLA + SwiGLU, 61 layers, 384 + 1 shared expert, Top-K 8 |
| Context length | 262,144 (256K) | 262,144 (256K) |
| Max output tokens | 8,192 | Designed for long multi-step runs; the practical cap sits on the deployment side |
| Modality | Text only | Text-centered (inherits the K2.5 multimodal backbone) |
| Quantization | Not disclosed | Native INT4 (QAT) |
| License | Closed-source, API only | Modified MIT (weights distributed) |
| Vocabulary size | Not disclosed | 160K |
Context length happens to match exactly at 256K on both sides.
Qwen3.6-Max-Preview is shorter than Plus (1M), looking like a trade-off tuned for inference memory budget and cost.
Kimi K2.6 assumes native INT4 QAT, so even at 1T MoE it is tilted toward actually landing the flagship on local hardware.
Benchmarks
This table covers the benchmarks reported for both models. Numbers come from each vendor’s official release notes and Artificial Analysis; the evaluation conditions (prompts, tool setup, number of runs) are not aligned, so read the table with that in mind.
| Benchmark | Qwen3.6-Max-Preview | Kimi K2.6 | Ref: Kimi K2.5 | Ref: Qwen3.6-35B-A3B |
|---|---|---|---|---|
| SWE-Bench Pro | 57.3 | 58.6 | 50.7 | 49.5 |
| Terminal-Bench 2.0 | 65.4 | 66.7 | 50.8 | 51.5 |
| GDPval-AA | 51.0 | — | — | — |
| AA Intelligence Index | 52.0 | — | — | — |
| HLE Full (with tools) | — | 54.0 | 50.2 | — |
| BrowseComp | — | 83.2 | 74.9 | — |
| Toolathlon | — | 50.0 | 27.8 | — |
| SWE-Bench Multilingual | — | 76.7 | 73.0 | — |
| MathVision (with python) | — | 93.2 | 85.0 | — |
On the two shared axes — SWE-Bench Pro and Terminal-Bench 2.0 — Kimi K2.6 sits slightly above Qwen3.6-Max-Preview.
Both are up by roughly 10 points over the K2.5 generation, so “RL tuned for agentic coding” is clearly firing on both sides at the same time.
Qwen is strongest on the AA Intelligence Index (2nd out of 201 models) for general-purpose ability, while Kimi pulls ahead on tool-driven, Web-autonomy style benchmarks like BrowseComp and Toolathlon. The contrast is general-intelligence headline number vs. agent-specialized scores.
One caveat on Qwen3.6-Max-Preview in Artificial Analysis’s measurement: it produced around 74M tokens across the evaluation tasks, versus an average of 26M for other models. That’s close to 3x the verbosity.
For coding agents that write out steps one by one, verbose output can be an advantage. But for general chat use, you need max_tokens or a system prompt tightening things down, otherwise cost balloons fast.
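The verbosity gap translates to money fairly directly. A back-of-the-envelope sketch using the article’s own numbers (~74M output tokens for Qwen3.6-Max-Preview vs. a ~26M average in the Artificial Analysis run, priced at the $24 / 1M output rate from the pricing table, which itself is a third-party aggregation):

```python
# Rough cost of the verbosity gap. The token counts and the $24/1M output
# price come from the text above; actual billing may differ by GA.
OUTPUT_PRICE_PER_M = 24.0  # USD per 1M output tokens

def output_cost(tokens_m: float, price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Cost in USD for tokens_m million output tokens."""
    return tokens_m * price_per_m

qwen_cost = output_cost(74)  # ~74M tokens across the AA evaluation
avg_cost = output_cost(26)   # ~26M token average for other models
print(f"ratio: {74 / 26:.2f}x, extra cost: ${qwen_cost - avg_cost:.0f}")
# ratio: 2.85x, extra cost: $1152
```

So the "close to 3x" figure is about $1,100 of extra output spend over the same evaluation workload, before any `max_tokens` tightening.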
Agent-side features
Both are aimed at coding agents, but the angle of attack is different.
Qwen3.6-Max-Preview side
- Explicitly calls out agentic coding ability beyond Qwen3.6-Plus
- Pitches improvements in real-world agents and knowledge reliability
- Implementation-wise, it rides straight on DashScope’s OpenAI-compatible endpoint (I wrote up a minimal setup in DashScope API integration notes)
- Wiring through Qwen-Agent and bundling MCP servers follows the same pattern as 35B-A3B
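Since Max-Preview rides the OpenAI-compatible endpoint, the wiring is just a chat-completions POST. A minimal stdlib sketch follows; the model id `qwen3.6-max-preview` and the exact DashScope base URL are assumptions (check the DashScope console for your region), and `max_tokens` is capped here as a guard against the verbosity tendency noted above:

```python
import json
import os
import urllib.request

# Assumed endpoint and model id; both may differ by region and by GA.
BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"
MODEL = "qwen3.6-max-preview"  # hypothetical id, not confirmed in release notes

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Refactor this function ..."}],
    "max_tokens": 2048,  # cap output to keep the verbosity tendency in check
}
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('DASHSCOPE_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment with a valid DASHSCOPE_API_KEY set
print(req.full_url)
```

The same payload works unchanged against any OpenAI-compatible server, which is the low-switching-cost point made below.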
Kimi K2.6 side
- Agent Swarm scaling pushed from 100 × 1,500 steps to 300 × 4,000 steps
- Built around 300 parallel sub-agents cooperating under a single orchestrator
- Official examples of long-running coding missions (12 hours, 4,000 tool calls straight): Mac optimization of Qwen3.5-0.8B, exchange-core optimization
- Strengthened animated frontend generation — WebGL, GSAP, Framer Motion, Three.js
- External runtimes called out by name — OpenClaw and Hermes Agent — with “proactive agents” baked in as an assumption
- Claw Groups, a research preview for multi-ownership mode where your agents and someone else’s share the same loop
Qwen’s priority is “let you call a genuinely strong flagship via API,” and the runtime side is left to the user.
Kimi bundles model + runtime + multi-agent-first UX into one push.
That’s a continuation of Moonshot AI’s work on the harness side — the AttnRes research in Kimi Linear, and Cursor picking K2.5 for Composer 2 — which has steadily built up the supporting infrastructure.
Pricing and distribution
The cost profiles look quite different from the user’s side.
| Category | Qwen3.6-Max-Preview | Kimi K2.6 |
|---|---|---|
| Distribution | API only (DashScope), Web UI available | Weights distributed + API (platform.moonshot.ai, OpenAI/Anthropic compatible) |
| Input pricing | $6 / 1M tokens (DataLearner aggregation) | See official pricing page (inherits the K2.5 tier) |
| Output pricing | $24 / 1M tokens (same source) | Same as above |
| Preview free tier | Artificial Analysis logged it at $0.00, so there may be a free window during preview | No API cost at all if you run weights locally |
| Local inference | Not possible | Possible via vLLM, SGLang, KTransformers, etc. (transformers ≥ 4.57.1 and < 5.x) |
Qwen3.6-Max-Preview’s selling point is “call a flagship-class model over an OpenAI-compatible API as a drop-in replacement.” Switching cost is low, but usage-based billing plus the verbosity tendency can run the bill up quickly.
Kimi K2.6 lets you put the weights on your own GPUs, so with existing hardware you can run the flagship at zero API cost. The trade-off is the non-trivial cost of standing up infrastructure that actually runs a 1T MoE.
Things to know before you touch either
Both models share some early-release caveats worth flagging.
- Qwen3.6-Max-Preview is text-only. For multimodal input, use Qwen3-Omni.
- The output cap is 8K, so long-form generation needs a streaming + continuation-prompt split design.
- Model name, pricing, and context length may change by GA.
- For serious Kimi K2.6 runs, 300-parallel × 4,000-step Agent Swarm is the scale at which permissioning, billing, and security design tends to break. If you pair it with external runtimes like OpenClaw, the same issues called out in the post on subscription tiers locking out third-party harnesses come back on the billing and audit side directly.
- MIT-based weight distribution is easy to handle license-wise, but OpenAI/Anthropic-compatible API contract clauses can still differ once you go to production.
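The 8K output cap mentioned above is typically handled with a continuation loop: keep requesting more output while the model stops for length, then stitch the chunks. A minimal sketch, with `generate` as a stand-in for a real chat-completions call returning `(text, finish_reason)`:

```python
# Continuation-loop sketch for a hard output cap: while the model stops with
# finish_reason == "length", feed the partial output back and ask it to
# continue. `generate(messages)` stands in for a real API call.

def generate_long(generate, prompt, max_rounds=8):
    """Stitch a long completion together from length-capped chunks."""
    parts, messages = [], [{"role": "user", "content": prompt}]
    for _ in range(max_rounds):
        text, finish_reason = generate(messages)
        parts.append(text)
        if finish_reason != "length":
            break
        # Feed the partial output back and request a seamless continuation.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": "Continue exactly where you left off."})
    return "".join(parts)

# Stub generator: two length-capped chunks, then a clean stop.
chunks = iter([("part1 ", "length"), ("part2 ", "length"), ("part3", "stop")])
result = generate_long(lambda msgs: next(chunks), "write a long doc")
print(result)  # part1 part2 part3
```

In production you would also stream each chunk and watch for the cap mid-chunk, but the loop above is the core of the split design.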
Qwen’s shift from “all-in on open” to a two-track stack
The Qwen3 generation had been running a lineage where agent-focused flagship-adjacent models were all distributed as Apache 2.0 open weights — Qwen3-Coder-Next, Qwen3-Omni, Qwen3.6-35B-A3B, all of them.
“Open top to bottom” was for a while the core of what the Qwen brand meant.
Qwen3.6-Max-Preview takes the top tier of that lineage and moves it into an API-only closed slot.
The open-weight side is kept by 35B-A3B, while the flagship goes cloud-only — a two-track layout closer to what Mistral and Cohere ended up with, where the head is closed and the long tail stays open.
Internal architecture (total parameters, active count, MoE layout, quantization) being wholly undisclosed for Max-Preview fits cleanly with that pivot.
Summarizing the shift in a table:
| Series | Top-tier treatment | Mid/lower-tier treatment |
|---|---|---|
| Qwen3.6 | Max-Preview is closed (DashScope API only, architecture not disclosed) | 35B-A3B is Apache 2.0 open, Plus is a 1M cloud specialist |
| Kimi K2 family | K2.6 itself is Modified MIT open-weight, architecture disclosed | No smaller-tier model at all — a single flagship doing both jobs |
Kimi’s answer to all this is to swing the other way, which is the core point of this release.
They drop the 1T MoE flagship onto Hugging Face directly, and even bake in native INT4 QAT so “run it yourself” remains on the table.
On the runtime side, Kimi CLI, Agent Swarm, and the OpenClaw / Hermes Agent integration are presented together. The pitch is a full-stack model + harness + operations package, leaning open on every layer.
Two Chinese 1T-class MoEs, flipped in opposite directions on the public-openness layer. That’s the quietly significant framing here.
If Qwen keeps migrating the “open-weight” label toward cost-focused mid-tier models, the center of gravity for agent implementation (especially local / on-prem expectations) is likely to shift toward Kimi.
Conversely, if Max-Preview’s pricing and rate limits stabilize in production, “call Qwen from the cloud, run Kimi long-running on-prem” might settle in as the split.
What might be going on inside the Qwen team
The decision to lock Max-Preview to a closed API can’t really be explained by the model alone.
Picking signal out of Chinese-language explainer posts and social media since early 2026, a few patterns show up around the Qwen team.
- Some core Qwen developers appear to be moving to different projects, independent work, or other companies, based on recurring reports
- Multiple social media posts and summaries describe Alibaba Cloud shifting strategy — from an all-directions open research posture to clearly putting its generative AI business into a revenue phase
- There are also recurring reports of internal friction between the open-first side and the revenue-first side, though no strong primary sources or official announcement confirm this
This is rumor-layer information, so treat it with a low confidence flag.
On the other hand, the decisive move of “yank the top-tier flagship into a closed API only” and “redact internal architecture and quantization information entirely” lines up well if you assume the internal dynamics and strategic pivot above.
Part of why the contrast with Kimi sticking to open weights is showing up this cleanly may be that the Chinese LLM industry as a whole is starting to get serious about where to actually make money.
You’d want primary sources to verify, but it’s worth not reading Max-Preview’s spec sheet in isolation — the Qwen brand’s publicness strategy may be in the middle of an inflection.
The SWE-Bench Pro and Terminal-Bench 2.0 ranges are close enough that, on numbers alone, you can pick either based on preference.
From here on the contest moves to the operations layer — which harness you put it on, whose permissions and billing, and how long you keep it running.