DeepSeek-V4-Pro-DSpark isn't a new base model. It's the same 1.6T V4-Pro checkpoint plus a DSpark speculative-decoding head (~893GB). What config.json and the DeepSpec repo reveal, and why there's no speed benchmark yet.
After a US order pulled Claude Fable 5, which Chinese models drop into Claude Code? Kimi K2.7 Code, Qwen3.7 Max, DeepSeek V4 and GLM-5.1 — constraints, VRAM, benchmark caveats.
Hands-on with Tencent Hy-MT2 1.8B Q4_K_M (1.08GB) on M1 Max 64GB via llama-server. JSON, SRT, HTML, glossary, and minority-language prompts with full input-output pairs. The 1.25-bit 440MB build does not load on stock llama.cpp 8990, and 30B-A3B (hy_v3) has no working Mac route yet.
After Xiaomi MiMo-V2.5's weights went public, I checked whether it runs on Mac/ROCm or on cloud GPU (RunPod/GCE). It's still rough on local hardware, but RunPod's 4x H200 runs it for ~$14/hr and GCE Spot H100 brings it down to ~$1.6/hr.
Inclusion AI released LLaDA2.0-Uni, a 16B MoE diffusion LLM that handles image understanding, 1024px image generation, image editing, and interleaved text-image generation in a single model.
Hands-on running inclusionAI Ling-flash-2.0 (100B / 6.1B active, MXFP4 quant, 54.7GB) on SwiftLM via mlx-swift-lm on an M1 Max 64GB. Covers the bailing_moe + MXFP4 support check in mlx-swift, the startup surprise, and what --stream-experts actually saves.
A hands-on build and run of the Swift-based LLM inference server SwiftLM on an M1 Max 64GB. Covers Qwen3.6-35B-A3B and Qwen3.5-122B-A10B, with the same BST, BBS, and persona tests used in the existing Ollama and MLX-lm write-ups.
DeepSeek V4 Preview ships V4-Pro (1.6T/49B active) and V4-Flash (284B/13B active) as open weights under MIT, both with 1M context. CSA+HCA hybrid attention, mHC, and the Muon optimizer cut per-token FLOPs at 1M tokens to 27% of V3.2. Day-one API and chat.deepseek.com mode switch covered.
Two open-weight Chinese MoEs landed within 24 hours: Ant Ling-2.6-flash (104B/7.4B active, 7x token-efficiency claim) and Tencent Hy3-preview (295B/21B active, frontier-tier open weights). Specs, licenses, and how they line up against DeepSeek-V3 and GLM-4.5.
Xiaomi launched two MiMo-V2.5 models at once. MiMo-V2.5-Pro hits SWE-bench Pro 57.2, Claw-Eval 63.8, and τ3-Bench 72.9 — frontier-tier — while MiMo-V2.5 brings native omnimodality plus a 1M context. Both are API-only for now; open weights are promised but unscheduled.
Tried Qwen3.6-27B on both Ollama and MLX. Ollama couldn't load the VL-projector-embedded GGUF, while MLX ran it at 11 tok/s. Separately, 35B-A3B ran roughly 2× faster under MLX than as an Ollama GGUF. Also had both models build a BBS to gauge intent handling.