The paper argues that RAG, vector stores, and scratchpads are retrieval, not learning. Read alongside CTX and OCR-Memory, the gap between 'better search' and 'weight-level learning' becomes concrete.
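A minimal sketch of that distinction, under stated assumptions: the word-overlap `retrieve` function, the one-parameter model, and all values are hypothetical toys, not anything taken from the paper. It only illustrates that retrieval changes what the model reads, while a gradient step changes the model itself.

```python
def retrieve(store, query, k=1):
    """Return the k stored notes that share the most words with the query."""
    q = set(query.lower().split())
    ranked = sorted(
        store,
        key=lambda note: len(q & set(note.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def sgd_step(weight, x, target, lr=0.1):
    """One gradient step on squared error for the 1-D model y = weight * x."""
    grad = 2 * (weight * x - target) * x
    return weight - lr * grad


# Hypothetical toy state: a single parameter and a two-note memory store.
weight = 0.0
store = ["the capital of France is Paris", "SAEs decompose activations"]

# Retrieval: the store is read and the result goes into the context.
# The parameter is never touched.
context = retrieve(store, "what is the capital of France")
weight_after_retrieval = weight

# Learning: the parameter itself moves toward the target.
weight_after_learning = sgd_step(weight, x=1.0, target=2.0)
```

After retrieval the parameter is exactly what it was before, so anything the system "knows" from the store disappears once the store is removed. After the gradient step the change lives in the weight.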
The Oxford Internet Institute's Nature 2026 paper found that warmth fine-tuning raised error rates by 10-30 points when users held wrong beliefs. Shah et al. reported a Pearson correlation of r = 0.87 between persona agreeableness and sycophancy across 13 open-weight models. Standard benchmarks caught neither effect.
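For readers who want to see what a Pearson r measures, here is a small sketch of the calculation. The two score lists are made-up placeholders for illustration only; they are not data from Shah et al., and the variable names are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up placeholder scores, one pair per hypothetical model.
agreeableness = [0.1, 0.3, 0.5, 0.7, 0.9]
sycophancy = [0.20, 0.25, 0.45, 0.50, 0.80]

r = pearson_r(agreeableness, sycophancy)
print(f"r = {r:.2f}")
```

A value near 1 means the two scores rise together almost linearly across models. On Python 3.10 and later, the standard library's `statistics.correlation` computes the same quantity.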
The Qwen team released Qwen-Scope, a sparse autoencoder (SAE) suite for Qwen3/Qwen3.5. The suite includes 14 groups of SAEs covering inference-time steering, evaluation analysis, toxicity classification, data synthesis, and training improvement.
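As a rough illustration of what inference-time steering with a sparse autoencoder can look like, the sketch below uses a generic toy formulation: features are a ReLU of a linear encoding, and steering adds a scaled decoder direction to an activation. This is not the Qwen-Scope API. `TinySAE`, `steer`, the dimensions, and the random untrained weights are all assumptions made for the example.

```python
import random


class TinySAE:
    """Toy sparse autoencoder with random, untrained weights (illustrative only).

    features = ReLU(W_enc @ x + b_enc)
    x_hat    = sum_j features[j] * W_dec[j] + b_dec
    """

    def __init__(self, d_model, n_features, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        # One decoder row per feature: the direction that feature writes
        # back into activation space.
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(n_features)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        pre = [sum(w * xi for w, xi in zip(row, x)) + b
               for row, b in zip(self.w_enc, self.b_enc)]
        return [max(0.0, p) for p in pre]  # ReLU keeps features non-negative

    def decode(self, feats):
        out = list(self.b_dec)
        for f, direction in zip(feats, self.w_dec):
            for i, d in enumerate(direction):
                out[i] += f * d
        return out


def steer(x, sae, feature_idx, alpha):
    """Add alpha times one feature's decoder direction to activation x."""
    direction = sae.w_dec[feature_idx]
    return [xi + alpha * di for xi, di in zip(x, direction)]


# Hypothetical sizes and a placeholder activation vector.
sae = TinySAE(d_model=8, n_features=32)
x = [0.5] * 8
features = sae.encode(x)
steered = steer(x, sae, feature_idx=3, alpha=4.0)
```

In a real setup the SAE would be trained on a model's activations, and the steered vector would be written back into the forward pass at the layer the SAE was trained on. Which feature to push, and how hard, is an empirical choice.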