Three LLMs converted the same 10 Japanese scene briefs into Anima (Qwen-DiT) prompts, which were rendered as 60 fixed-seed images on an M1 Max with a merged 3-character LoRA. The Qwen-to-Qwen affinity hypothesis did not hold up; a strict formatter brief with character-count locks is what actually moved the results, and two failure modes persist regardless of prompt.
Fujitsu's PHOTON claims up to 475x over Transformers, but that's tokens/s/GiB (multi-query memory throughput), not faster single responses. What the 1.2B paper tables, the quality drop, and 9-query integration really show.
DeepSeek-V4-Pro-DSpark isn't a new base model. It's the same 1.6T V4-Pro checkpoint plus a DSpark speculative-decoding head (~893GB). What config.json and the DeepSpec repo reveal, and why there's no speed benchmark yet.
Sakana Fugu trains no base model: a learned conductor routes GPT-5.5/Claude/Gemini. How it compares to PLaMo (scratch, closed) and LLM-jp (fully open), how it differs from OpenRouter, and its biggest risk.
Tested Qwen3.7 Plus on ModelScope: native function calling and parallel tool calls work. I built a tool loop, skills, and error recovery with just the openai SDK, then had it ship a working Flask BBS.
Tested Qwen3.7 Max and Plus proofreading a Japanese novel: both barely fix anything, they disagree on quote punctuation and names, and the one 'typo' turned out to be a character name.
After a US order pulled Claude Fable 5, which Chinese models drop into Claude Code? Kimi K2.7 Code, Qwen3.7 Max, DeepSeek V4 and GLM-5.1: constraints, VRAM, benchmark caveats.
AFM 3 splits into a 20B on-device sparse model (NAND-to-DRAM weight loading) and Cloud Pro on Google Cloud NVIDIA GPUs. Three Google contexts, the Foundation Models API opening, and what's still unreleased.
Tested LFM2.5-1.2B-JP-202606 on M1 Max 64GB. llama.cpp Q4_K_M: 208 tok/s decode, JSON intact, model name hallucinated (LFM→FDM). Q8_0: 157 tok/s, no hallucination. Tool calls broken via GGUF.
A 35M linear projection replaces E4B's 150M 16-layer vision encoder. Bidirectional attention in the 48-layer LLM absorbs patch features. Comparison with Fuyu, EVE, EVEv2, and Mono-InternVL.
Hands-on with Tencent Hy-MT2 1.8B Q4_K_M (1.08GB) on M1 Max 64GB via llama-server. JSON, SRT, HTML, glossary, and minority-language prompts with full input-output pairs. The 1.25-bit 440MB build does not load on stock llama.cpp 8990, and 30B-A3B (hy_v3) is not yet available on the Mac route.