#AI Safety

4 articles

Tech May 7, 2026 11 min

Agent memory is just lookup: reading arXiv:2604.27707 with CTX and OCR-Memory in mind

The paper argues that RAG, vector stores, and scratchpads are retrieval, not learning. Read alongside CTX and OCR-Memory, the gap between 'better search' and 'weight-level learning' becomes concrete.

AI, AI Agents, RAG, Token Management, AI Safety, Papers

Tech May 7, 2026 11 min

Human-LLM text segmentation on M1 Max: WCP works, raw log-likelihood doesn't

Tested arXiv:2605.03723 on M1 Max + Qwen3-8B-Base: WCP runs in pure Python, but raw log-likelihood floods boundaries even on human-only text.

AI, LLM, AI Safety, Papers, Python, Experiments, Qwen

Tech May 6, 2026 14 min

Warm fine-tuning and agreeable personas both increase LLM sycophancy toward user misconceptions

Oxford Internet Institute's Nature 2026 paper found warmth fine-tuning raised error rates 10-30 points when users held wrong beliefs. Shah et al. showed Pearson r = 0.87 between persona agreeableness and sycophancy across 13 open-weight models. Standard benchmarks caught neither effect.

AI, LLM, AI Safety, Paper Review, OpenAI

Tech May 1, 2026 updated 13 min

Qwen-Scope: An SAE Suite for Steering and Data Synthesis Using Qwen's Internal Features

The Qwen team released Qwen-Scope, a Sparse Autoencoder suite for Qwen3/Qwen3.5. The suite includes 14 groups of SAEs covering inference-time steering, evaluation analysis, toxicity classification, data synthesis, and training improvement.

AI, LLM, Qwen, Interpretability, AI Safety