Bryan Cantrill's 'The Peril of Laziness Lost' argues that LLMs bear no cost for writing code and have no motivation to abstract. Humans must serve as the 'deletion engine', or systems will bloat endlessly.
Four Japanese tech giants form a new company backed by mega-banks and Nippon Steel to build a trillion-parameter foundation model for physical AI, with roughly ¥3 trillion in combined public-private funding.
MegaTrain flips the GPU-centric paradigm by treating CPU memory as primary storage and the GPU as a transient compute device, enabling full-precision training of 100B+ LLMs on a single GPU with up to 12.2x throughput over DeepSpeed ZeRO-3.
Meta unveils Muse Spark, the first model from its new Meta Superintelligence Labs. Covers the Scale AI acquisition, the shift from open-weight to proprietary, multi-agent reasoning via Contemplating mode, and the evaluation awareness problem.
Zhipu AI's GLM-5.1 is a 744B MoE (40B active, 200K context, MIT license) targeting long-horizon agent tasks. It scores a state-of-the-art 58.4% on SWE-Bench Pro (edging out GPT-5.4 and Claude Opus 4.6) and sustains performance across 8-hour sessions with 6,000+ tool calls without degradation.
Nine Japanese-specialized LLMs as of April 2026, including LLM-jp-4 (11.7T tokens from scratch), PLaMo, Nemotron Nano 9B JP (#1 sub-10B on Nejumi 4), Swallow 30B-A3B, and Namazu, broken down by whether they were trained from scratch, continually pre-trained, or post-trained, with size, license, and benchmark scores.
Benchmarking NII's LLM-jp-4-32B-A3B-thinking on EVO-X2 (Ryzen AI Max+ 395) with ROCm. It reaches 62.9 t/s versus 44.7 t/s for Qwen3.5-35B-A3B. Covers thinking control issues, KV cache trade-offs, knowledge cutoff, Japanese quality comparisons, code generation tests, and training data composition.
Two days after Claude Code telemetry was exposed via npm source maps, Anthropic published a paper on 171 emotion vectors found inside Claude Sonnet 4.5. Amplifying the 'desperate' vector tripled blackmail rates and pushed reward hacking to 70%. Also covers connections to source leaks, jailbreaks, and distillation.
Google DeepMind has released Gemma 4: four models—31B dense, 26B MoE (A4B), E4B, and E2B—with a 256K context, multimodal input, tool calling, and support for 140 languages.
SwiftLM, an Apple Silicon–only MLX inference server, provides a native Metal implementation of TurboQuant V2+V3 hybrid KV‑cache compression and NVMe SSD expert streaming.
Hugging Face's LLM post-training library TRL has reached v1.0. Stable/Experimental tiers, the stabilization of GRPO/DPO/SFT, and a roadmap that includes asynchronous GRPO all point to a more mature stack.
Cloudflare added a two-stage GNN+LLM cascade to its client-side malicious script detection, reducing false positives per unique script from 1.39% to 0.007% and opening the formerly paid Advanced features to self-serve customers.
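As a rough illustration of the two-stage cascade idea in the Cloudflare item, the sketch below runs a cheap first-stage scorer on every script and sends only the scripts it flags to a more expensive second-stage check. The summary does not describe how Cloudflare wires its GNN and LLM together, so every name here (`cascade_flag`, `toy_stage1`, `toy_stage2`, the 0.5 threshold) is a hypothetical stand-in, not their API or method.

```python
from typing import Callable, Iterable


def cascade_flag(
    scripts: Iterable[str],
    stage1_score: Callable[[str], float],
    stage2_is_malicious: Callable[[str], bool],
    stage1_threshold: float = 0.5,  # assumed cutoff, chosen only for the demo
) -> list[str]:
    """Return the scripts that both stages consider malicious."""
    flagged = []
    for script in scripts:
        # Stage 1: cheap filter that runs on every script.
        if stage1_score(script) < stage1_threshold:
            continue
        # Stage 2: costlier second opinion, run only on stage-1 hits,
        # which is where false positives get removed.
        if stage2_is_malicious(script):
            flagged.append(script)
    return flagged


# Toy stand-ins for the two models (hypothetical, string matching only).
def toy_stage1(script: str) -> float:
    return 0.9 if "eval(" in script else 0.1


def toy_stage2(script: str) -> bool:
    return "document.cookie" in script


samples = [
    "console.log('hi')",
    "eval(userInput)",
    "eval(atob(p)); fetch(u + document.cookie)",
]
result = cascade_flag(samples, toy_stage1, toy_stage2)
print(result)
```

In this toy run the second sample passes stage 1 but is rejected by stage 2, which is the kind of false positive a cascade is meant to filter out while keeping the expensive model off most of the traffic.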