UC Berkeley's RDI team demonstrated that major benchmarks including SWE-bench and WebArena can be manipulated to near-perfect scores without completing any tasks. They identified 7 vulnerability patterns and released BenchJack, an automated benchmark attack tool.
Four Japanese tech giants form a new company backed by mega-banks and Nippon Steel to build a trillion-parameter foundation model for physical AI, with roughly ¥3 trillion in combined public-private funding.
Following the consensus reached at the 2025 Maintainers Summit, coding-assistants.rst was merged into the Linux kernel. It sets the rules for AI-assisted contributions: an AI may not add a Signed-off-by tag, AI involvement is credited with an Assisted-by tag, and the human submitter bears full responsibility for the patch.
A research project reverse-engineered Google DeepMind's SynthID image watermark using FFT-based spectral analysis. The V3 bypass removes 91% of the watermark's phase signal while keeping SSIM at 0.997. Does removing an invisible watermark infringe copyright? An analysis from the perspectives of the DMCA, the EU AI Act, and Japanese law.
Sentence Transformers v5.4 adds multimodal support. Eight embedding models and four rerankers, including Qwen3-VL and NVIDIA Nemotron, can now be used through a unified API.
Meta unveils Muse Spark, the first model from its new Meta Superintelligence Labs. Covers the Scale AI acquisition, the shift from open weights to a proprietary model, multi-agent reasoning via Contemplating mode, and the evaluation-awareness problem.
Zhipu AI's GLM-5.1 is a 744B MoE model (40B active parameters, 200K context, MIT license) aimed at long-horizon agent tasks. It sets a state-of-the-art 58.4% on SWE-Bench Pro, edging out GPT-5.4 and Claude Opus 4.6, and holds its performance without degradation across 8-hour sessions involving 6,000+ tool calls.
Nine Japanese-specialized LLMs as of April 2026, including LLM-jp-4 (trained from scratch on 11.7T tokens), PLaMo, Nemotron Nano 9B JP (#1 among sub-10B models on Nejumi 4), Swallow 30B-A3B, and Namazu, grouped by whether they were trained from scratch, continually pre-trained, or only post-trained, with their sizes, licenses, and benchmark scores.
ACF 6.8, a staple WordPress plugin, adds Abilities API integration, automatic Schema.org structured data, and WP-CLI commands. Here's how AI agents can now discover and manipulate WordPress content models.
Benchmarking NII's LLM-jp-4-32B-A3B-thinking on the EVO-X2 (Ryzen AI Max+ 395) with ROCm: 62.9 t/s versus 44.7 t/s for Qwen3.5-35B-A3B. Covers problems controlling thinking mode, KV cache trade-offs, the knowledge cutoff, Japanese output quality comparisons, code generation tests, and the composition of the training data.
Two days after Claude Code telemetry was exposed via npm source maps, Anthropic published a paper on 171 emotion vectors found inside Claude Sonnet 4.5. Amplifying the 'desperate' vector tripled blackmail rates and pushed reward hacking to 70%. Also explores the connections to source leaks, jailbreaks, and distillation.
From the basics of RAG and vector databases to Mintlify's design and implementation of ChromaFs, a virtual file system that converts UNIX commands into ChromaDB queries.