#MoE

16 articles

Tech Apr 30, 2026 9 min

Can Xiaomi MiMo-V2.5 actually run on a Mac or ROCm?

After Xiaomi MiMo-V2.5's weights went public, I checked whether it runs on Mac/ROCm or on cloud GPU (RunPod/GCE). It's still rough on local hardware, but RunPod's 4x H200 runs it for ~$14/hr and GCE Spot H100 brings it down to ~$1.6/hr.

AI LLM Local LLM Xiaomi MoE Apple Silicon ROCm llama.cpp

Tech Apr 27, 2026 7 min

LLaDA2.0-Uni Is an Open-Weight Diffusion LLM That Unifies Image Understanding and Generation

Inclusion AI released LLaDA2.0-Uni, a 16B MoE diffusion LLM that handles image understanding, 1024px image generation, image editing, and interleaved text-image generation in a single model.

AI LLM Image Generation VLM MoE Open Model Multimodal

Tech Apr 25, 2026 updated 11 min

Ling-flash-2.0 MXFP4 (bailing_moe) on SwiftLM + M1 Max 64GB: working config, support check, --stream-experts notes

A hands-on run of inclusionAI Ling-flash-2.0 (100B / 6.1B active, MXFP4 quant, 54.7GB) on SwiftLM via mlx-swift-lm on an M1 Max 64GB. Covers the bailing_moe + MXFP4 support check in mlx-swift, the startup surprise, and what --stream-experts actually saves.

Apple Silicon LLM MLX Local LLM Swift SwiftLM MoE MXFP4 Ant Group Experiment

Tech Apr 24, 2026 13 min

Running SwiftLM on M1 Max 64GB and Comparing It to Ollama and MLX-lm

A hands-on build and run of the Swift-based LLM inference server SwiftLM on an M1 Max 64GB. Covers Qwen3.6-35B-A3B and Qwen3.5-122B-A10B, with the same BST, BBS, and persona tests used in the existing Ollama and MLX-lm write-ups.

Apple Silicon LLM MLX Local LLM Swift SwiftLM MoE Experiment

Tech Apr 24, 2026 updated 11 min

DeepSeek V4 Preview specs: V4-Pro 1.6T and V4-Flash 284B open under MIT, 1M context, 27% inference FLOPs of V3.2

DeepSeek V4 Preview ships V4-Pro (1.6T/49B active) and V4-Flash (284B/13B active) as open weights under MIT, both with 1M context. CSA+HCA hybrid attention, mHC, and the Muon optimizer cut per-token FLOPs at 1M tokens to 27% of V3.2. Day-one API and chat.deepseek.com mode switch covered.

LLM DeepSeek Chinese AI MoE Open Model AI Agent

Tech Apr 24, 2026 updated 14 min

Tencent Hy3-preview (295B) vs Ant Ling-2.6-flash (104B): two open Chinese MoEs released the same week

Two open-weight Chinese MoEs landed within 24 hours: Ant Ling-2.6-flash (104B/7.4B active, 7x token-efficiency claim) and Tencent Hy3-preview (295B/21B active, frontier-tier open weights). Specs, licenses, and how they line up against DeepSeek-V3 and GLM-4.5.

LLM Chinese AI MoE Open Model AI Agent Local LLM OpenRouter

Tech Apr 23, 2026 updated 9 min

Xiaomi ships MiMo-V2.5 and MiMo-V2.5-Pro together — 1M omnimodal and 1,000-tool agent, API only

Xiaomi launched two MiMo-V2.5 models at once. MiMo-V2.5-Pro scores 57.2 on SWE-bench Pro, 63.8 on Claw-Eval, and 72.9 on τ3-Bench, which puts it in the frontier tier, while MiMo-V2.5 brings native omnimodality plus a 1M context. Both are API-only for now; open weights are promised but unscheduled.

AI LLM Chinese AI MoE AI Agents Multimodal Xiaomi

Tech Apr 23, 2026 13 min

Qwen3.6-27B Dense vs Qwen3.6-35B-A3B MoE on M1 Max — MLX Was 2× Faster Than Ollama

Tried Qwen3.6-27B on both Ollama and MLX. Ollama couldn't load the VL-projector-embedded GGUF, while MLX ran it at 11 tok/s. On the side, running 35B-A3B under MLX was roughly 2× faster than the Ollama GGUF. Also had both models build a BBS to gauge intent handling.

LLM Local LLM Qwen Ollama MLX Apple Silicon MoE Experiment

Tech Apr 21, 2026 updated 11 min

I Ran Qwen3.6-35B-A3B on M1 Max via Ollama and Thinking Tokens Ballooned 13×

A hands-on log of Qwen3.6-35B-A3B under Ollama 0.20.6. Generation speed matches Qwen3.5 at 27 tok/s, but thinking tokens grew 13× for the same prompt. Multi-turn, persona, and a three-tier NSFW probe are included.

LLM Local LLM Qwen Ollama Apple Silicon MoE Experiment

Tech Apr 21, 2026 updated 11 min

Qwen3.6-Max-Preview and Kimi K2.6 landed nearly back-to-back — lining up both flagship coding models

Alibaba's Qwen3.6-Max-Preview and Moonshot AI's Kimi K2.6 were released within a 24-hour window on April 20–21, 2026. A side-by-side look at specs, benchmarks, distribution, and agent-side features for the two flagships.

LLM Qwen Kimi Moonshot AI MoE Agent Coding

Tech Apr 17, 2026 updated 10 min

Qwen3.6-35B-A3B pairs Gated DeltaNet with MoE and raises the bar on agentic coding

Alibaba's Qwen team released Qwen3.6-35B-A3B as open weights. A 40-layer hybrid of Gated DeltaNet, Gated Attention, and MoE hits 73.4 on SWE-bench Verified, 37.0 on MCPMark, and 1397 on QwenWebBench.

LLM Local LLM Qwen MoE Agent Coding

Tech Apr 8, 2026 updated 8 min

GLM-5.1 (Zhipu, 744B / 40B MoE, MIT): 58.4% SOTA on SWE-Bench Pro, 8h / 6,000+ tool calls without degradation

Zhipu AI's GLM-5.1 is a 744B MoE (40B active, 200K context, MIT) targeting long-horizon agent tasks. It posts a state-of-the-art 58.4% on SWE-Bench Pro (edging out GPT-5.4 and Claude Opus 4.6) and runs 8-hour sessions with 6,000+ tool calls without degradation.

AI LLM Chinese AI MoE Open Model AI Agent