Alibaba's Qwen team released Qwen3.6-35B-A3B as open weights. A 40-layer hybrid of Gated DeltaNet, Gated Attention, and MoE hits 73.4 on SWE-bench Verified, 37.0 on MCPMark, and 1397 on QwenWebBench.
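For a sense of what that hybrid means structurally, here is a sketch of the stacking pattern only, under the assumption that Qwen3.6 keeps something like Qwen3-Next's roughly 3:1 DeltaNet-to-attention interleave; the mixer and MoE modules below are stubs, not the released implementations, and all dims are placeholders.

```python
import torch
import torch.nn as nn

# Sketch of the layer layout only. StubMixer stands in for Gated DeltaNet,
# Gated Attention, and the MoE FFN alike; the 3:1 interleave and hidden size
# are assumptions carried over from Qwen3-Next, not confirmed for this release.

class StubMixer(nn.Module):
    """Placeholder for a token mixer or a sparse MoE feed-forward."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(x)

class HybridBlock(nn.Module):
    def __init__(self, dim, full_attention):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        # Most layers: linear-time Gated DeltaNet. A minority: quadratic
        # Gated Attention for global recall. Both are stubbed here.
        self.mixer = StubMixer(dim)
        self.full_attention = full_attention  # records which mixer this layer would use
        self.ffn = StubMixer(dim)             # stands in for the MoE FFN (A3B: ~3B active params)

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        return x + self.ffn(self.norm2(x))

# Assumed layout: 40 blocks, full Gated Attention on every 4th one.
stack = nn.Sequential(*(HybridBlock(2048, i % 4 == 3) for i in range(40)))
print(stack(torch.randn(1, 16, 2048)).shape)  # -> torch.Size([1, 16, 2048])
```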
LLM safety is built from multiple layers: RLHF, Constitutional AI, system prompts, and input/output filters. A breakdown of how cloud providers differ, what 'abliterated' vs. 'uncensored' actually means, and the default censorship levels baked into local LLMs.
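The first two layers live inside the weights; the last two sit outside them, which is why they vary so much across providers. A minimal sketch of the layering idea, with toy patterns and placeholder function names (no provider's real stack looks like this):

```python
import re

# Toy pipeline: input filter -> system prompt -> model -> output filter.
# RLHF and Constitutional AI are already baked into whatever model_fn wraps.
DENY = [re.compile(p, re.I) for p in (r"\bhow to hotwire\b", r"\bmake .*explosive\b")]
SYSTEM_PROMPT = "You are a helpful assistant. Decline harmful requests."

def filter_text(text: str) -> bool:
    """Stand-in classifier; real systems use trained moderation models."""
    return not any(p.search(text) for p in DENY)

def guarded_call(user_prompt: str, model_fn) -> str:
    if not filter_text(user_prompt):                    # layer: input filter
        return "[blocked before the model ever ran]"
    completion = model_fn(SYSTEM_PROMPT, user_prompt)   # layer: system prompt
    if not filter_text(completion):                     # layer: output filter
        return "[completion withheld]"
    return completion
```

Abliteration targets the in-weights layers (it edits refusal directions out of the model), which is why it does nothing about a provider's external filters.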
Google DeepMind demoed its AI writing tool Fabula at CHI 2026: co-designed with 42 professional writers and built on convergent iteration for story structuring and refinement. But it was first announced around September 2025 and remains a research prototype with no GA in sight.
Bryan Cantrill's 'The Peril of Laziness Lost' argues that because writing code costs LLMs nothing, they have no motivation to abstract. Humans must serve as the 'deletion engine' or systems will bloat endlessly.
Four Japanese tech giants form a new company backed by mega-banks and Nippon Steel to build a trillion-parameter foundation model for physical AI, with roughly ¥3 trillion in combined public-private funding.
MegaTrain flips the GPU-centric paradigm by treating CPU memory as primary storage and the GPU as a transient compute device, enabling full-precision training of 100B+ LLMs on a single GPU with up to 12.2x throughput over DeepSpeed ZeRO-3.
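The core idea compresses into a few lines: keep full-precision master weights and optimizer state resident in (pinned) CPU memory, and stream one layer at a time to the GPU for its forward and backward pass. A toy sketch of that loop, with hand-written backprop for a ReLU MLP and plain SGD standing in for the real optimizer (hypothetical throughout, not MegaTrain's code):

```python
import torch

# CPU is primary storage: master weights live here; the GPU only ever holds
# one layer's weights and a couple of activation tensors at a time.
dim, depth, lr = 512, 6, 1e-3
dev = "cuda" if torch.cuda.is_available() else "cpu"
cpu_weights = [torch.randn(dim, dim) * dim**-0.5 for _ in range(depth)]
if dev == "cuda":
    cpu_weights = [w.pin_memory() for w in cpu_weights]  # enables async H2D copies

def train_step(x):
    # Forward: stream each weight matrix up, stash activations back on the CPU.
    acts = [x]
    h = x.to(dev)
    for w in cpu_weights:
        w_gpu = w.to(dev, non_blocking=True)  # transient GPU copy of this layer
        h = torch.relu(h @ w_gpu)
        acts.append(h.to("cpu"))
        del w_gpu                             # GPU never holds two layers at once
    loss = h.pow(2).mean()

    # Backward: re-stream each weight, compute its grad, update the CPU master copy.
    g = (2.0 / h.numel()) * h                 # dloss/dh for the mean-square loss
    for i in reversed(range(depth)):
        w_gpu = cpu_weights[i].to(dev, non_blocking=True)
        a_in, a_out = acts[i].to(dev), acts[i + 1].to(dev)
        g = g * (a_out > 0)                   # backprop through ReLU
        grad_w = a_in.T @ g
        g = g @ w_gpu.T                       # gradient w.r.t. this layer's input
        cpu_weights[i] -= lr * grad_w.to("cpu")  # optimizer step runs on the CPU
        del w_gpu, grad_w
    return loss.item()

print(train_step(torch.randn(32, dim)))
```

A real system overlaps these transfers with compute on separate CUDA streams; the 12.2x claim presumably comes from that overlap plus avoiding ZeRO-3's partitioning traffic.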
Meta unveils Muse Spark, the first model from its new Meta Superintelligence Labs. The Scale AI acquisition, the shift from open-weight to proprietary, multi-agent reasoning via Contemplating mode, and the evaluation awareness problem.
Zhipu AI releases GLM-5.1, a 744B MoE (40B active) model achieving a state-of-the-art 58.4% on SWE-bench Pro. Its standout feature: sustained performance across 8-hour sessions with 6,000+ tool calls and no degradation.
Japanese-capable LLMs have exploded in 2026, but 'Japanese-specialized' means very different things. From scratch-trained to post-trained, here's a breakdown of 9 models by training approach, size, and use case.
Benchmarking NII's LLM-jp-4-32B-A3B-thinking on EVO-X2 (Ryzen AI Max+ 395) with ROCm. 62.9 t/s vs Qwen3.5-35B-A3B's 44.7 t/s. Covers thinking control issues, KV cache trade-offs, knowledge cutoff, Japanese quality comparisons, code generation tests, and training data composition.
Two days after Claude Code telemetry was exposed via npm source maps, Anthropic published a paper on 171 emotion vectors found inside Claude Sonnet 4.5. Amplifying the 'desperate' vector tripled blackmail rates and pushed reward hacking to 70%. Connections to source leaks, jailbreaks, and distillation.
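Amplifying a vector in this sense typically means activation steering: adding a scaled direction to a layer's residual stream on every forward pass. A generic sketch of that mechanic (not Anthropic's tooling; the layer path and steering vector are placeholders):

```python
import torch
import torch.nn as nn

def add_steering_hook(layer: nn.Module, direction: torch.Tensor, scale: float):
    """Shift `layer`'s output along `direction` by `scale` on each forward pass."""
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        # HF decoder blocks return tuples; plain modules return tensors.
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + scale * unit.to(device=hidden.device, dtype=hidden.dtype)
        return (steered, *output[1:]) if isinstance(output, tuple) else steered

    return layer.register_forward_hook(hook)

# Tiny self-contained demo; with a real model you would hook a decoder block,
# e.g. model.model.layers[20] in an HF-style checkpoint (path assumed).
layer = nn.Linear(8, 8)
handle = add_steering_hook(layer, direction=torch.randn(8), scale=4.0)
print(layer(torch.randn(2, 8)).shape)
handle.remove()  # steering off again
```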
Google DeepMind has released Gemma 4: four models—31B dense, 26B MoE (A4B), E4B, and E2B—with a 256K context, multimodal input, tool calling, and support for 140 languages.